Quick function to drop duplicated columns in Pandas DataFrame

Update: If the data frame is purely numeric, an easier way to drop duplicated columns is to use numpy.unique.
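A minimal sketch of the numpy.unique approach, assuming a NumPy version that supports the `axis` argument and a data frame whose columns are all numeric. The helper name `drop_duplicate_columns_numeric` and the sample data are made up for illustration:

```python
import numpy as np
import pandas as pd


def drop_duplicate_columns_numeric(df):
    """Keep the first occurrence of each distinct column (numeric data only)."""
    values = df.to_numpy()
    # With axis=1, np.unique treats every column as one item. return_index
    # gives the position of the first occurrence of each distinct column.
    _, first_idx = np.unique(values, axis=1, return_index=True)
    # np.unique returns its result in sorted order, so sort the positions
    # to keep the surviving columns in their original order.
    return df.iloc[:, np.sort(first_idx)]


# Hypothetical sample data: "b" and "d" repeat "a".
df = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3], "c": [4, 5, 6], "d": [1, 2, 3]})
result = drop_duplicate_columns_numeric(df)
print(list(result.columns))
```

Selecting by position with `iloc` keeps the original column labels and dtypes, which would be lost if the data frame were rebuilt from the array returned by np.unique.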

Beginning of the original post:

Pandas has a nice function that checks for and drops duplicated rows in a given data frame, but it cannot drop duplicated columns directly. A quick workaround is to transpose the data frame, drop the duplicated rows, and then transpose again. However, as the data frame gets larger, this method not only takes a long time but eventually breaks. I tried it on a data frame of size 100,000 by 41 and got a runtime error.
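For reference, the transpose workaround described above can be sketched in one line. The sample data here is made up and small enough that the approach works without trouble:

```python
import pandas as pd

# Hypothetical sample data: "b" repeats "a".
df = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3], "c": [4, 5, 6]})

# Transpose so columns become rows, drop the duplicated rows,
# then transpose back to the original orientation.
deduped = df.T.drop_duplicates().T
print(list(deduped.columns))
```

Note that transposing a data frame with mixed column types upcasts everything to a common dtype (often object), so the round trip can also change the dtypes of the result.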

I wrote a little function to find and drop duplicated columns of a Pandas data frame.
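A minimal sketch of one way such a function could look, not necessarily the original implementation. It avoids transposing by comparing columns directly, and it assumes that two columns count as duplicates only when they share a dtype and hold equal values. The names `find_duplicate_columns` and `drop_duplicate_columns` are hypothetical:

```python
from collections import defaultdict

import pandas as pd


def find_duplicate_columns(df):
    """Return positions of columns that repeat an earlier column of the same dtype."""
    # Group column positions by dtype so only same-typed columns are compared.
    by_dtype = defaultdict(list)
    for pos, dtype in enumerate(df.dtypes):
        by_dtype[str(dtype)].append(pos)

    duplicate_positions = []
    for positions in by_dtype.values():
        kept = []  # positions of distinct columns seen so far in this group
        for pos in positions:
            col = df.iloc[:, pos]
            # Series.equals compares values element-wise and treats NaNs
            # in the same location as equal.
            if any(col.equals(df.iloc[:, k]) for k in kept):
                duplicate_positions.append(pos)
            else:
                kept.append(pos)
    return sorted(duplicate_positions)


def drop_duplicate_columns(df):
    """Return a copy of df without the duplicated columns."""
    duplicates = set(find_duplicate_columns(df))
    keep = [pos for pos in range(df.shape[1]) if pos not in duplicates]
    return df.iloc[:, keep].copy()


# Hypothetical sample data: "b" repeats "a" and "d" repeats "c".
# "e" holds the same numbers as "a" but as floats, so it is kept.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [1, 2, 3],
    "c": ["x", "y", "z"],
    "d": ["x", "y", "z"],
    "e": [1.0, 2.0, 3.0],
})
duplicate_positions = find_duplicate_columns(df)
cleaned = drop_duplicate_columns(df)
print(list(cleaned.columns))
```

Working by position rather than by label means the sketch also copes with data frames whose column labels are themselves repeated. The pairwise comparison is quadratic in the number of columns within each dtype group, which is usually fine for a few dozen columns.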
