Remove/sum Duplicate Row With Pandas
I have this dataframe. How can I add a condition so that, if rows are exactly the same (the Mercedes example), I keep only one of them (without summing)? Or make the
Solution 1:
Use DataFrame.drop_duplicates before the aggregate sum. This looks for duplicates across all columns together:
df1 = df.drop_duplicates().groupby('cars', sort=False, as_index=False).sum()
print(df1)
cars rent sale
0 Kia 5 7
1 Bmw 1 4
2 Mercedes 2 1
3 Ford 1 1
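To see what drop_duplicates changes, here is a minimal sketch that compares the sum with and without it. The sample frame is an assumption, since the question's original data is not shown in full; it contains one exact duplicate Mercedes row.

```python
import pandas as pd

# Assumed sample data: the Mercedes row (rent=2, sale=1) appears twice.
df = pd.DataFrame({'cars': ['Kia', 'Bmw', 'Mercedes', 'Ford', 'Kia', 'Mercedes'],
                   'rent': [1, 1, 2, 1, 4, 2],
                   'sale': [2, 4, 1, 1, 5, 1]})

# Plain groupby sum counts the duplicated Mercedes row twice.
plain = df.groupby('cars', sort=False, as_index=False).sum()

# Dropping exact duplicate rows first counts it only once.
deduped = df.drop_duplicates().groupby('cars', sort=False, as_index=False).sum()

print(plain)
print(deduped)
```

The two Kia rows differ, so they are still summed in both results; only the Mercedes totals change.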
If you need to specify which columns are checked for duplicates:
df1 = (df.drop_duplicates(['cars','rent','sale'])
.groupby('cars', sort=False, as_index=False)
.sum())
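The choice of columns matters when rows agree on some columns but not others. The sketch below uses assumed sample data in which the two Mercedes rows share rent but differ in sale, and checks only cars and rent.

```python
import pandas as pd

# Assumed sample data: the Mercedes rows match on rent but not on sale.
df = pd.DataFrame({'cars': ['Kia', 'Bmw', 'Mercedes', 'Ford', 'Kia', 'Mercedes'],
                   'rent': [1, 1, 2, 1, 4, 2],
                   'sale': [2, 4, 1, 1, 5, 5]})

# Checking every column: the Mercedes rows differ, so nothing is dropped.
all_cols = (df.drop_duplicates()
              .groupby('cars', sort=False, as_index=False)
              .sum())

# Checking only cars and rent: the second Mercedes row is dropped,
# and its sale value (5) is lost along with it.
some_cols = (df.drop_duplicates(['cars', 'rent'])
               .groupby('cars', sort=False, as_index=False)
               .sum())

print(all_cols)
print(some_cols)
```

By default drop_duplicates keeps the first occurrence, so the row that survives here is the earlier Mercedes row.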
But if you need to remove duplicates separately for each column, use a lambda function with np.unique and sum:
import numpy as np
import pandas as pd

df = pd.DataFrame({'cars': ['Kia', 'Bmw', 'Mercedes', 'Ford', 'Kia', 'Mercedes'],
                   'rent': [1, 1, 2, 1, 4, 2],
                   'sale': [2, 4, 1, 1, 5, 5]})
print(df)
cars rent sale
0 Kia 1 2
1 Bmw 1 4
2 Mercedes 2 1
3 Ford 1 1
4 Kia 4 5
5 Mercedes 2 5 <- changed 5
df2 = df.groupby('cars', sort=False, as_index=False).agg(lambda x: np.unique(x).sum())
print(df2)
cars rent sale
0 Kia 5 7
1 Bmw 1 4
2 Mercedes 2 6
3 Ford 1 1
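If you would rather not import NumPy for this, one possible pandas-only variant is to call Series.drop_duplicates inside the lambda. This is a sketch of an alternative, not part of the original answer, and it uses the same sample data as above.

```python
import pandas as pd

df = pd.DataFrame({'cars': ['Kia', 'Bmw', 'Mercedes', 'Ford', 'Kia', 'Mercedes'],
                   'rent': [1, 1, 2, 1, 4, 2],
                   'sale': [2, 4, 1, 1, 5, 5]})

# Within each car group, drop repeated values per column, then sum what is left.
df3 = (df.groupby('cars', sort=False, as_index=False)
         .agg(lambda x: x.drop_duplicates().sum()))
print(df3)
```

For Mercedes, rent holds the value 2 twice and contributes 2, while sale holds 1 and 5 and contributes 6.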
Solution 2:
df['duplicated'] = df.duplicated()  # flag rows that exactly repeat an earlier row
df = df[~df['duplicated']]  # keep only the rows that are not flagged
df = df.drop(columns='duplicated')  # remove the helper column again
df = df.groupby(['cars'], sort=False).sum().reset_index()  # sum per car
You can also do it this way.
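As a quick sanity check, the mask-based approach and the drop_duplicates approach should give the same frame. The sketch below assumes sample data with one exact duplicate Mercedes row and compares the two with pandas.testing.assert_frame_equal.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Assumed sample data: the Mercedes row (rent=2, sale=1) appears twice.
df = pd.DataFrame({'cars': ['Kia', 'Bmw', 'Mercedes', 'Ford', 'Kia', 'Mercedes'],
                   'rent': [1, 1, 2, 1, 4, 2],
                   'sale': [2, 4, 1, 1, 5, 1]})

# Solution 1 style: drop exact duplicates, then sum per car.
via_drop = df.drop_duplicates().groupby('cars', sort=False, as_index=False).sum()

# Solution 2 style: filter with a boolean mask, without a helper column.
mask = df.duplicated()
via_mask = df[~mask].groupby(['cars'], sort=False).sum().reset_index()

# Raises AssertionError if the two results differ.
assert_frame_equal(via_drop, via_mask)
print(via_mask)
```

Using the mask directly avoids adding and then dropping a temporary column on the original frame.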