Pandas: Get String Value With Most Occurrence In Group
I have the following DataFrame:
item response
1 A
1 A
1 B
2 A
2 A
I want to add a column with the most given respon
Solution 1:
There is pd.Series.mode:
df.groupby('item').response.transform(pd.Series.mode)
Out[28]:
0    A
1    A
2    A
3    C
4    C
Name: response, dtype: object
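One caveat with this approach: pd.Series.mode returns every tied value, so passing it straight to transform may not broadcast cleanly when a group has more than one mode. A minimal sketch, assuming it is acceptable to keep the first mode in sorted order (the helper name first_mode and the sample data are hypothetical, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: group 2 has a tie between "C" and "D".
df = pd.DataFrame({
    "item": [1, 1, 1, 2, 2],
    "response": ["A", "A", "B", "C", "D"],
})

def first_mode(x):
    # Series.mode drops NaN by default and returns all tied values, sorted.
    m = x.mode()
    # Keep the first mode, or NaN when the group has no non-null values.
    return m.iloc[0] if not m.empty else np.nan

df["mode"] = df.groupby("item")["response"].transform(first_mode)
print(df)
```

Picking the first sorted value is only one possible tie-break; a different rule may suit your data better.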
Solution 2:
Use value_counts and return the first index value:
df["responseCount"] = (df.groupby("item")["response"]
.transform(lambda x: x.value_counts().index[0]))
print (df)
item response responseCount
0 1 A A
1 1 A A
2 1 B A
3 2 C C
4 2 C C
Or collections.Counter.most_common:
from collections import Counter
df["responseCount"] = (df.groupby("item")["response"]
.transform(lambda x: Counter(x).most_common(1)[0][0]))
print (df)
item response responseCount
0 1 A A
1 1 A A
2 1 B A
3 2 C C
4 2 C C
EDIT:
The problem arises with groups that contain only NaN values (one or more). The solution is to filter with if-else:
print (df)
item response
0 1 A
1 1 A
2 2 NaN
3 2 NaN
4 3 NaN
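import numpy as np

def f(x):
    # value_counts drops NaN by default, so all-NaN groups give an empty Series
    s = x.value_counts()
    print (s)
    # Output of print (s), one block per group:
    # A 2
    # Name: 1, dtype: int64
    # Series([], Name: 2, dtype: int64)
    # Series([], Name: 3, dtype: int64)

    # Alternative: return np.nan if s.empty else s.index[0]
    return np.nan if len(s) == 0 else s.index[0]
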
df["responseCount"] = df.groupby("item")["response"].transform(f)
print (df)
item response responseCount
0 1 A A
1 1 A A
2 2 NaN NaN
3 2 NaN NaN
4 3 NaN NaN
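For anyone who wants to reproduce the NaN case end to end, here is a self-contained sketch. It rebuilds the DataFrame printed above and uses the s.empty variant of the check; the function name most_common_or_nan is hypothetical.

```python
import numpy as np
import pandas as pd

# Same data as printed above: groups 2 and 3 contain only NaN.
df = pd.DataFrame({
    "item": [1, 1, 2, 2, 3],
    "response": ["A", "A", np.nan, np.nan, np.nan],
})

def most_common_or_nan(x):
    # value_counts ignores NaN, so an all-NaN group yields an empty Series.
    s = x.value_counts()
    return np.nan if s.empty else s.index[0]

df["responseCount"] = (df.groupby("item")["response"]
                         .transform(most_common_or_nan))
print(df)
```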
Solution 3:
You can use statistics.mode from the standard library:
from statistics import mode
df['mode'] = df.groupby('item')['response'].transform(mode)
print(df)
item response mode
0 1 A A
1 1 A A
2 1 B A
3 2 C C
4 2 C C
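Note that statistics.mode does not skip NaN and raises StatisticsError on empty input, so groups with missing responses may need extra care. A minimal sketch, assuming NaN should be ignored and all-NaN groups should yield NaN (the helper name safe_mode and the sample data are hypothetical):

```python
import numpy as np
import pandas as pd
from statistics import mode

# Hypothetical sample data: group 2 mixes NaN with "B", group 3 is all NaN.
df = pd.DataFrame({
    "item": [1, 1, 2, 2, 3],
    "response": ["A", "A", np.nan, "B", np.nan],
})

def safe_mode(x):
    # Drop missing values first so NaN is never counted as a response.
    valid = x.dropna()
    # statistics.mode raises StatisticsError on empty data, so guard it.
    return mode(valid) if len(valid) else np.nan

df["mode"] = df.groupby("item")["response"].transform(safe_mode)
print(df)
```

On Python 3.8 and later, statistics.mode returns the first value encountered when several values tie; earlier versions raise StatisticsError in that case.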