
Get Lowest Value After Groupby - Pandas

I have a table with the following format:

data = {'City': ['London', 'Paris', 'Paris', 'NY', 'London'],
        'Distance': [5, 1, 7, 2, 6]}
df = pd.DataFrame(data)
df
City Distanc

Solution 1:

Use DataFrameGroupBy.idxmin to get the index label of the minimal Distance in each group, then select those rows with loc:

df1 = df.loc[df.groupby('City', sort=False)['Distance'].idxmin()]
print (df1)
     City  Distance
0  London         5
1   Paris         1
3      NY         2

Detail:

print (df.groupby('City', sort=False)['Distance'].idxmin())
City
London    0
Paris     1
NY        3
Name: Distance, dtype: int64
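
For anyone who wants to run this end to end, here is a self-contained sketch of the same idxmin approach. The data is the question's sample with a comma added after 'NY', which is an assumption about what the asker intended; the names min_idx and closest are only illustrative.

```python
import pandas as pd

data = {
    'City': ['London', 'Paris', 'Paris', 'NY', 'London'],
    'Distance': [5, 1, 7, 2, 6],
}
df = pd.DataFrame(data)

# Index label of the smallest Distance within each City group.
# sort=False keeps the groups in order of first appearance.
min_idx = df.groupby('City', sort=False)['Distance'].idxmin()

# Use those labels to pull the complete rows from the original frame.
closest = df.loc[min_idx]
print(closest)
```

If a city has several rows tied for the minimum, idxmin returns the label of the first one, so this yields exactly one row per city.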

Solution 2:

Sometimes groupby is unnecessary; try sort_values followed by drop_duplicates:

df.sort_values('Distance').drop_duplicates('City')

Out[377]: 
     City  Distance
0  London         5
1   Paris         1
3      NY         2
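
A runnable sketch of this approach, using the question's sample data with the comma after 'NY' restored (an assumption). Note that sort_values leaves the rows ordered by Distance; the final sort_index() call is my addition, not part of the answer above, and is only needed if you want the rows back in their original index order.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['London', 'Paris', 'Paris', 'NY', 'London'],
    'Distance': [5, 1, 7, 2, 6],
})

# After sorting by Distance, the first row seen for each City is its minimum,
# and drop_duplicates keeps the first occurrence by default.
nearest = df.sort_values('Distance').drop_duplicates('City')

# Optional: put the surviving rows back in original index order.
nearest_in_order = nearest.sort_index()
print(nearest_in_order)
```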

Solution 3:

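You can use min on the grouped column. Note that this returns only the minimum Distance per city as a Series, not the full rows:

>>> df.groupby(['City'], sort=False)['Distance'].min()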
City
London    5
Paris     1
NY        2
Name: Distance, dtype: int64
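
If you would rather have City as a regular column than as the index, a reset_index() on the result is one way to get there. This is a small sketch on the question's sample data (comma after 'NY' assumed); min_series and min_frame are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['London', 'Paris', 'Paris', 'NY', 'London'],
    'Distance': [5, 1, 7, 2, 6],
})

# Series of per-city minimums, indexed by City.
min_series = df.groupby('City', sort=False)['Distance'].min()

# Same values as a two-column DataFrame (City, Distance).
min_frame = min_series.reset_index()
print(min_frame)
```

Keep in mind that any other columns in df are not carried along here, since only Distance is aggregated.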

Solution 4:

My opinion is that @jezrael offers the most idiomatic approach within a groupby. I've offered the same solution myself on other answers. However, here are some other alternatives.

Option 1
Use pd.DataFrame.nsmallest within an apply

This offers clean logic even if the API is a bit clumsy. I think this version of nsmallest should be available to the groupby object, but as of pandas 0.20.3 it is not. So we use it within the general-purpose apply method. Make sure to pass group_keys=False in the call to groupby in order to avoid awkward additional index levels.

df.groupby('City', group_keys=False).apply(
    lambda d: d.nsmallest(1, columns='Distance'))

     City  Distance
0  London         5
3      NY         2
1   Paris         1
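
As a related sketch, not part of the answer above: the column-level groupby (a SeriesGroupBy) does have its own nsmallest, assuming a pandas version that provides it. It returns the smallest values together with their original row labels, which can then be passed to loc to recover the full rows without apply. The data is the question's sample with the comma after 'NY' assumed; smallest, row_labels and rows are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['London', 'Paris', 'Paris', 'NY', 'London'],
    'Distance': [5, 1, 7, 2, 6],
})

# Smallest Distance per City; the result's innermost index level
# holds the original row labels.
smallest = df.groupby('City')['Distance'].nsmallest(1)

# Take the innermost level and look the rows up in the original frame.
row_labels = smallest.index.get_level_values(-1)
rows = df.loc[row_labels]
print(rows)
```

Changing the argument to nsmallest, for example to 2, would keep the two closest rows per city instead of one.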

Option 2
Was taken by @Wen, so I deleted it.
