Skip to content Skip to sidebar Skip to footer

Remove/sum Duplicate Row With Pandas

I have this dataframe, How can i make condition that if i have a duplicate row if that they are exactly the same(Mercedes exp) I keep only one (without making the sum) Or make the

Solution 1:

Use DataFrame.drop_duplicates before aggregate sum - this looking for duplciates together in all columns:

df1 = df.drop_duplicates().groupby('cars', sort=False, as_index=False).sum()
print(df1)
       cars  rent  sale
0       Kia     571       Bmw     142  Mercedes     213      Ford     11

If need specify columns for check duplicates:

df1 = (df.drop_duplicates(['cars','rent','sale'])
         .groupby('cars', sort=False, as_index=False)
         .sum())

But if need remove duplciates separately for each column use lambda function with np.unique and sum:

df=pd.DataFrame({'cars':['Kia','Bmw','Mercedes','Ford','Kia','Mercedes'],
                'rent':[1,1,2,1,4,2],
                'sale':[2,4,1,1,5,5]})
print(df)
       cars  rent  sale
0       Kia     1     2
1       Bmw     1     4
2  Mercedes     2     1
3      Ford     1     1
4       Kia     4     5
5  Mercedes     2     5 <- changed 5

df2 = df.groupby('cars', sort=False, as_index=False).agg(lambda x: np.unique(x).sum())
print(df2)
       cars  rent  sale
0       Kia     5     7
1       Bmw     1     4
2  Mercedes     2     6
3      Ford     1     1

Solution 2:

df['duplicated']=df.duplicated()  # create a column with the info of duplicating 
row or not.
df = df[~df['duplicated'].isin([True])] # delete duplicated row.
df.drop('duplicated', inplace=True, axis=1) # delete the column that we added.df=df.groupby(['cars'], sort=False).sum().reset_index() # group the dataframe.

you can do like this too

Post a Comment for "Remove/sum Duplicate Row With Pandas"