
Collapsing Rows With Nan Entries In Pandas Dataframe

I have a pandas DataFrame with rows of data:

# objectID        grade  OS   method
object_id_0001    AAA    Mac  organic
object_id_0001    AAA    Mac  NA
object_id_0001    AA

Solution 1:

Quick and Dirty

This works and has for a long time, although some argue that this behavior is a bug that may be fixed. As currently implemented, first returns, for each column within a group, the first non-null value if one exists.

df.groupby('objectID', as_index=False).first()

         objectID grade   OS   method
0  object_id_0001   AAA  Mac  organic
1  object_id_0002   ABC  Win      NaN
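To see the per-column behavior of first, here is a minimal sketch on hypothetical sample data (the question's table is cut off, so these rows are assumed). The first row of object_id_0001 is deliberately missing grade and method, so the collapsed row combines values taken from different input rows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the rows are assumed, not taken from the question
df = pd.DataFrame({
    'objectID': ['object_id_0001'] * 3 + ['object_id_0002'],
    'grade': [np.nan, 'AAA', 'AAA', 'ABC'],
    'OS': ['Mac', np.nan, 'Mac', 'Win'],
    'method': [np.nan, 'organic', np.nan, np.nan],
})

# first() skips NaN independently in each column, so one output row can
# mix values from several input rows; an all-NaN column stays NaN
out = df.groupby('objectID', as_index=False).first()
print(out)
```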

pd.concat

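pd.concat([
    # For each column, take the cell from the first row where it is non-null
    # (DataFrame.lookup, used originally, was removed in pandas 2.0)
    pd.DataFrame([[d.at[i, c] for c, i in d.notna().idxmax().items()]], columns=d.columns)
    for _, d in df.groupby('objectID')
], ignore_index=True)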

         objectID grade   OS   method
0  object_id_0001   AAA  Mac  organic
1  object_id_0002   ABC  Win      NaN
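The pd.concat version hinges on d.notna().idxmax(): on a boolean mask, idxmax returns, per column, the label of the first True, i.e. the first non-null row. A small sketch on one hypothetical group (rows assumed for illustration) makes that step visible. It picks the cells with DataFrame.at because DataFrame.lookup no longer exists in pandas 2.x.

```python
import numpy as np
import pandas as pd

# Hypothetical single group; the values are assumed for illustration
d = pd.DataFrame({
    'grade': [np.nan, 'AAA', 'AAA'],
    'OS': ['Mac', np.nan, 'Mac'],
    'method': [np.nan, np.nan, np.nan],
})

# Label of the first non-null row per column; an all-null column
# has no True, so idxmax falls back to the first label
first_valid = d.notna().idxmax()

# One cell per column, read with .at (row label, column label)
row = {c: d.at[i, c] for c, i in first_valid.items()}
print(first_valid.to_dict(), row)
```

Note that the all-null method column still yields a row label; the cell read there is simply NaN, which is what the collapsed row should contain.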

stack

# .dropna() guards against pandas versions whose stack() keeps NaN cells
df.set_index('objectID').stack().dropna().groupby(level=[0, 1]).head(1).unstack()

               grade   OS   method
objectID                          
object_id_0001   AAA  Mac  organic
object_id_0002   ABC  Win     None
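The stack version works because stack() has long dropped NaN cells by default, but the newer stack implementation that later pandas versions adopt keeps them. A minimal sketch of the same long-format idea using melt with an explicit dropna, so it does not depend on that default; melt is swapped in here for illustration and the sample rows are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the rows are assumed, not taken from the question
df = pd.DataFrame({
    'objectID': ['object_id_0001'] * 3 + ['object_id_0002'],
    'grade': [np.nan, 'AAA', 'AAA', 'ABC'],
    'OS': ['Mac', np.nan, 'Mac', 'Win'],
    'method': [np.nan, 'organic', np.nan, np.nan],
})
value_cols = ['grade', 'OS', 'method']

# One row per (objectID, column, value); missing cells are dropped explicitly
long = (
    df.melt(id_vars='objectID', value_vars=value_cols, var_name='column')
      .dropna(subset=['value'])
)

# melt keeps the original row order within each column, so first() picks
# the earliest non-null value for every (objectID, column) pair
out = (
    long.groupby(['objectID', 'column'])['value'].first()
        .unstack()                     # back to one row per objectID
        .reindex(columns=value_cols)   # restore the original column order
)
print(out)
```

Pairs with no surviving value, such as object_id_0002 / method here, simply do not appear in the long frame and come back as NaN after unstack.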

If the missing values are actually the string 'NA' rather than real NaN, mask them first:

df.mask(df.astype(str).eq('NA')).groupby('objectID', as_index=False).first()
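Why the mask matters can be sketched on hypothetical data where the gaps arrived as the literal string 'NA': without masking, first() treats 'NA' as an ordinary value and returns it.

```python
import pandas as pd

# Hypothetical data with missing values stored as the string 'NA'
df = pd.DataFrame({
    'objectID': ['object_id_0001', 'object_id_0001', 'object_id_0002'],
    'grade': ['AAA', 'AAA', 'ABC'],
    'OS': ['Mac', 'Mac', 'Win'],
    'method': ['NA', 'organic', 'NA'],
})

# Unmasked: 'NA' is a non-null string, so first() keeps it
naive = df.groupby('objectID', as_index=False).first()

# mask() replaces every cell where the condition is True with NaN
cleaned = df.mask(df.astype(str).eq('NA'))
out = cleaned.groupby('objectID', as_index=False).first()
print(naive, out, sep='\n\n')
```

If the data comes from pandas.read_csv, 'NA' is already in the default na_values, so this mostly comes up with hand-built frames or when reading with keep_default_na=False.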

Solution 2:

An alternative, more mechanical approach:

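import numpy as np

def aggregate(s):
    # Unique non-null values of this column within the group
    u = s[s.notnull()].unique()
    if not u.size:
        return np.nan  # the group has no value for this column
    # A single agreed value collapses to a scalar; conflicting values are kept
    return u[0] if u.size == 1 else u

df.groupby('objectID').agg(aggregate)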

                grade   OS      method
objectID            
object_id_0001  AAA     Mac     organic
object_id_0002  ABC     Win     NaN
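The mechanical approach is useful because it exposes groups that disagree on a column, which first() would silently resolve by picking one value. A quick way to find such groups up front is DataFrameGroupBy.nunique, which ignores NaN by default. The sketch below uses hypothetical data with a deliberate grade conflict.

```python
import numpy as np
import pandas as pd

# Hypothetical data: object_id_0001 has two different grades
df = pd.DataFrame({
    'objectID': ['object_id_0001', 'object_id_0001', 'object_id_0002'],
    'grade': ['AAA', 'AA', 'ABC'],
    'OS': ['Mac', np.nan, 'Win'],
    'method': [np.nan, 'organic', np.nan],
})
value_cols = ['grade', 'OS', 'method']

# Distinct non-null values per group and column (NaN is not counted)
counts = df.groupby('objectID')[value_cols].nunique()

# Any count above 1 means collapsing that group would lose information
conflicts = counts[counts.gt(1).any(axis=1)]
print(conflicts)
```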

Solution 3:

This works using bfill + drop_duplicates:

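cols = ['grade', 'OS', 'method']  # value columns; depending on the pandas version, groupby bfill() may omit 'objectID'
df[['objectID']].join(df.groupby('objectID')[cols].bfill()).drop_duplicates('objectID')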
         objectID grade   OS   method
0  object_id_0001   AAA  Mac  organic
3  object_id_0002   ABC  Win      NaN
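Backfilling within each group pulls the next non-null value upward, so the first row of every group ends up holding that group's first non-null value per column, which is what groupby(...).first() returns. A sketch comparing the two on hypothetical data (rows assumed for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the rows are assumed, not taken from the question
df = pd.DataFrame({
    'objectID': ['object_id_0001'] * 3 + ['object_id_0002'],
    'grade': [np.nan, 'AAA', 'AAA', 'ABC'],
    'OS': ['Mac', np.nan, 'Mac', 'Win'],
    'method': [np.nan, 'organic', np.nan, np.nan],
})
value_cols = ['grade', 'OS', 'method']

# Backfill inside each group, then keep the first row of each objectID
filled = df[['objectID']].join(df.groupby('objectID')[value_cols].bfill())
via_bfill = filled.drop_duplicates('objectID').reset_index(drop=True)

# The aggregation route from Solution 1, for comparison
via_first = df.groupby('objectID', as_index=False).first()
print(via_bfill.equals(via_first))
```

One difference in design: drop_duplicates keeps groups in order of first appearance, while groupby sorts by objectID by default, so the two can differ in row order when the input is not sorted.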
