
What Does This Pandas Group By Statement Do?

I am following a tutorial on how to build a recommender system and came upon this line:

users_interactions_count_df = interactions_df.groupby(['personId', 'contentId']).size().group

Solution 1:

Consider this sample data:

import pandas as pd

interactions_df = pd.DataFrame({
    'personId': list('XXYYWZWZ'),
    'contentId': list('aaaabbaa')
})

print(interactions_df)
  personId contentId
0        X         a
1        X         a
2        Y         a
3        Y         a
4        W         b
5        Z         b
6        W         a
7        Z         a

First, count the rows for each (personId, contentId) pair:

print(interactions_df.groupby(['personId', 'contentId']).size())
personId  contentId
W         a            1
          b            1
X         a            2
Y         a            2
Z         a            1
          b            1
dtype: int64
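
The intermediate result is a Series with a two-level MultiIndex, which is why it can be grouped again by level name. Here is a small sketch for inspecting it, using the same sample data. The names pair_counts, pair_counts_df and the column name 'count' are only illustrative and do not come from the tutorial:

```python
import pandas as pd

# Same sample data as above
interactions_df = pd.DataFrame({
    'personId': list('XXYYWZWZ'),
    'contentId': list('aaaabbaa'),
})

# Number of rows for each (personId, contentId) pair
pair_counts = interactions_df.groupby(['personId', 'contentId']).size()

# The index levels are named after the grouping columns
print(list(pair_counts.index.names))

# Look up a single pair by its (personId, contentId) tuple
print(pair_counts.loc[('X', 'a')])

# Flatten into an ordinary DataFrame; 'count' is a hypothetical column name
pair_counts_df = pair_counts.reset_index(name='count')
print(pair_counts_df)
```

The flattened form is often easier to read, with one row per pair and the count in its own column.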

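Then group that result by personId, the first level of the MultiIndex, and count the entries in each group. Each entry is one distinct contentId, so this gives the number of different items each person interacted with:

print(interactions_df.groupby(['personId', 'contentId']).size().groupby('personId').size())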
personId
W    2
X    1
Y    1
Z    2
dtype: int64
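
As a cross-check, and assuming there are no missing values in either column, the same numbers should come out of a single groupby with nunique. This is an alternative way to write it, not what the tutorial does, and the names chained and direct are only illustrative:

```python
import pandas as pd

# Same sample data as above
interactions_df = pd.DataFrame({
    'personId': list('XXYYWZWZ'),
    'contentId': list('aaaabbaa'),
})

# Two-step version from the answer: count pairs, then count pairs per person
chained = (
    interactions_df.groupby(['personId', 'contentId']).size()
    .groupby('personId').size()
)

# One-step alternative: number of distinct contentId values per person
direct = interactions_df.groupby('personId')['contentId'].nunique()

print(direct)

# Both approaches agree on this sample data
print((chained == direct).all())
```

With missing values the two can differ, because the two-column groupby drops rows whose contentId is NaN, so check on your real data before swapping one for the other.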
