Merging Crosstabs In Python
Solution 1:
I just decided to give you a better way of getting you what you want:
I use df.groupby([col1, col2]).size().unstack()
to proxy as my pd.crosstab
as a general rule. You were trying to do a crosstab for every group of source
. I can fit that in nicely with my existing groupby with df.groupby([col1, col2, col3]).size().unstack([2, 1])
The sort_index(1).fillna(0).astype(int)
is just to pretty things up.
If you want to understand even better. Try the following things and look what you get:
df.groupby(['word', 'gender']).size()
df.groupby(['word', 'gender', 'source']).size()
unstack
and stack
are convenient ways to get things that were in the index into the columns instead and vice versa. unstack([2, 1])
is specifying the order in which index levels get unstacked.
Finally, I take my xtabs
and stack
again and sum across the rows and unstack
to prep to pd.concat
. Voilà !
xtabs = df.groupby(df.columns.tolist()).size() \
.unstack([2, 1]).sort_index(1).fillna(0).astype(int)
pd.concat([xtabs.stack().sum(1).rename('total').to_frame().unstack(), xtabs], axis=1)
Your Code should now look like this:
import pandas as pd
import numpy as np
import functools as ft
def main():
# Create dataframe
df = pd.DataFrame(data=np.zeros((0, 3)), columns=['word','gender','source'])
df["word"] = ('banana', 'banana', 'elephant', 'mouse', 'mouse', 'elephant', 'banana', 'mouse', 'mouse', 'elephant', 'ostrich', 'ostrich')
df["gender"] = ('a', 'the', 'the', 'a', 'the', 'the', 'a', 'the', 'a', 'a', 'a', 'the')
df["source"] = ('BE', 'BE', 'BE', 'NL', 'NL', 'NL', 'FR', 'FR', 'FR', 'FR', 'FR', 'FR')
return create_frequency_list(df)
def create_frequency_list(df):
xtabs = df.groupby(df.columns.tolist()).size() \
.unstack([2, 1]).sort_index(1).fillna(0).astype(int)
total = xtabs.stack().sum(1)
total.name = 'total'
total = total.to_frame().unstack()
return pd.concat([total, xtabs], axis=1)
main()
Post a Comment for "Merging Crosstabs In Python"