
Merging Crosstabs In Python

I am trying to merge multiple crosstabs into a single one. Note that the data provided is obviously only for test purposes. The actual data is much larger so efficiency is quite im

Solution 1:

Here is a simpler way to get what you want:

As a general rule, I use df.groupby([col1, col2]).size().unstack() as a stand-in for pd.crosstab. You were trying to build a crosstab for every group of source. That fits nicely into the same groupby by adding a third column: df.groupby([col1, col2, col3]).size().unstack([2, 1])
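To make the stand-in concrete, here is a minimal sketch comparing the two approaches on a small made-up frame (the data below is a shortened, illustrative version of the question's sample, not the real data). Note that fill_value=0 is used here in place of a later fillna(0) so the counts stay integers.

```python
import pandas as pd

# Illustrative sample only; column names follow the question.
df = pd.DataFrame({
    "word":   ["banana", "banana", "elephant", "mouse", "mouse", "elephant"],
    "gender": ["a", "the", "the", "a", "the", "the"],
})

# Count rows per (word, gender) pair, then pivot gender into the columns.
via_groupby = df.groupby(["word", "gender"]).size().unstack(fill_value=0)

# The same table built with pd.crosstab.
via_crosstab = pd.crosstab(df["word"], df["gender"])

print(via_groupby)
print(via_crosstab)
```

Both tables should hold the same counts; the groupby form is just easier to extend with a third column.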

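The sort_index(axis=1).fillna(0).astype(int) chain is just to pretty things up: it orders the columns, replaces missing combinations with 0, and casts the counts back to integers. (Older pandas accepted the positional form sort_index(1); pandas 2.0 and later require axis=1 as a keyword.)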

To understand this better, try the following and look at what you get:

  • df.groupby(['word', 'gender']).size()
  • df.groupby(['word', 'gender', 'source']).size()

unstack and stack are convenient ways to move things from the index into the columns and vice versa. unstack([2, 1]) specifies the order in which the index levels get unstacked, and therefore the order of the resulting column levels.
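As a rough illustration of how the order passed to unstack affects the result, the sketch below unstacks the same counts both ways and prints the column level names. The four-row frame is made up for the example.

```python
import pandas as pd

# Illustrative sample only; column names follow the question.
df = pd.DataFrame({
    "word":   ["banana", "banana", "elephant", "mouse"],
    "gender": ["a", "the", "the", "a"],
    "source": ["BE", "BE", "NL", "NL"],
})

# Index levels are numbered 0 = word, 1 = gender, 2 = source.
counts = df.groupby(["word", "gender", "source"]).size()

# Level 2 (source) goes first, so it becomes the outer column level.
source_on_top = counts.unstack([2, 1])

# Level 1 (gender) goes first, so it becomes the outer column level.
gender_on_top = counts.unstack([1, 2])

print(list(source_on_top.columns.names))
print(list(gender_on_top.columns.names))
```

With source as the outer level, each source gets its own block of gender columns, which is the "one crosstab per source" layout the question was after.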

Finally, I stack xtabs again (moving gender back into the index), sum across the rows (that is, over the sources), and unstack the result so it is ready for pd.concat. Voilà!

xtabs = df.groupby(df.columns.tolist()).size() \
          .unstack([2, 1]).sort_index(axis=1).fillna(0).astype(int)

pd.concat([xtabs.stack().sum(1).rename('total').to_frame().unstack(), xtabs], axis=1)
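One way to sanity-check the total block is to compare it with a plain pd.crosstab of word against gender over all sources. This is only a sketch using the sample data from the question; the variable names totals, expected and matches are my own.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "word":   ["banana", "banana", "elephant", "mouse", "mouse", "elephant",
               "banana", "mouse", "mouse", "elephant", "ostrich", "ostrich"],
    "gender": ["a", "the", "the", "a", "the", "the",
               "a", "the", "a", "a", "a", "the"],
    "source": ["BE", "BE", "BE", "NL", "NL", "NL",
               "FR", "FR", "FR", "FR", "FR", "FR"],
})

# Per-source crosstabs side by side: columns are (source, gender).
xtabs = (df.groupby(["word", "gender", "source"]).size()
           .unstack([2, 1]).sort_index(axis=1).fillna(0).astype(int))

# Move gender back into the index, sum over the source columns,
# then pivot gender out again: one row per word, one column per gender.
totals = xtabs.stack().sum(axis=1).unstack()

# The same totals computed directly, ignoring source.
expected = pd.crosstab(df["word"], df["gender"])

matches = bool((totals.reindex_like(expected) == expected).all().all())
print(matches)
```

If the check prints True, the totals are just the ordinary crosstab with the source dimension summed away.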


Your code should now look like this:

import pandas as pd

def main():
    # Create the sample dataframe directly from a dict of columns
    df = pd.DataFrame({
        'word': ['banana', 'banana', 'elephant', 'mouse', 'mouse', 'elephant', 'banana', 'mouse', 'mouse', 'elephant', 'ostrich', 'ostrich'],
        'gender': ['a', 'the', 'the', 'a', 'the', 'the', 'a', 'the', 'a', 'a', 'a', 'the'],
        'source': ['BE', 'BE', 'BE', 'NL', 'NL', 'NL', 'FR', 'FR', 'FR', 'FR', 'FR', 'FR'],
    })

    return create_frequency_list(df)

def create_frequency_list(df):
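    # Count every (word, gender, source) combination, then move
    # source and gender into the columns, with source on top
    xtabs = df.groupby(df.columns.tolist()).size() \
              .unstack([2, 1]).sort_index(axis=1).fillna(0).astype(int)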

    total = xtabs.stack().sum(1)
    total.name = 'total'
    total = total.to_frame().unstack()

    return pd.concat([total, xtabs], axis=1)

main()
