
Wide To Long Dataset Using Pandas

There are a lot of questions out there with similar titles but I'm unable to solve the issues that I'm having with my dataset. Dataset: ID Country Type Region Gender IA01_Raw I

Solution 1:

This will get you started. The essence is to call set_index on the columns that should stay fixed, convert the remaining column headers to a MultiIndex, then stack. Better solutions may exist, but I would do it this way because it gets you to your output in one easy step.

import pandas as pd

# Set the index with columns that we don't want to "transpose"
df2 = df.set_index([
   'ID', 'Country', 'Type', 'Region', 'Gender', 'QA_Include', 'QA_Comments'])
# Convert headers to MultiIndex -- this is so we can melt IA values
df2.columns = pd.MultiIndex.from_tuples(map(tuple, df2.columns.str.split('_')))
# Call stack to replicate data, then reset the index
out = df2.stack(level=0).reset_index().rename({'level_7': 'IA'}, axis=1)

out

     ID Country Type  Region  Gender QA_Include  QA_Comments    IA Class1 Class2 Raw
0   SC1  France    A  Europe    Male        yes          NaN  IA01      8      1   4
1   SC1  France    A  Europe    Male        yes          NaN  IA02      4      1   J
2   SC2  France    A  Europe  Female        yes          NaN  IA01      7      2   2
3   SC2  France    A  Europe  Female        yes          NaN  IA02      6      4   Q
4   SC3  France    B  Europe    Male        yes          NaN  IA01      7      2   3
5   SC3  France    B  Europe    Male        yes          NaN  IA02      8      2   K
6   SC4  France    A  Europe    Male        yes          NaN  IA01      8      2   4
7   SC4  France    A  Europe    Male        yes          NaN  IA02      2      1   A
8   SC5  France    B  Europe    Male        yes          NaN  IA01      7      1   1
9   SC5  France    B  Europe    Male        yes          NaN  IA02      1      3   F
10  ID6  France    A  Europe    Male        yes          NaN  IA01      8      1   2
11  ID6  France    A  Europe    Male        yes          NaN  IA02      3      7   R
12  ID7  France    B  Europe    Male        yes          NaN  IA01      8      1   2
13  ID7  France    B  Europe    Male        yes          NaN  IA02      4      6   Q
14  UC8  France    B  Europe    Male        yes          NaN  IA01      8      2   4
15  UC8  France    B  Europe    Male        yes          NaN  IA02      4      2   P
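
The set_index / MultiIndex / stack steps above can be tried end to end on a small frame. What follows is only a minimal sketch: the two-row sample and the reduced list of ID columns are hypothetical, and naming the first column level 'IA' is a variation of mine (it avoids the rename of 'level_7'), not part of the original answer.

```python
import pandas as pd

# Hypothetical two-row sample using the same <IA>_<measure> naming pattern.
df = pd.DataFrame({
    'ID': ['SC1', 'SC2'],
    'Gender': ['Male', 'Female'],
    'IA01_Raw': ['4', '2'], 'IA01_Class1': [8, 7], 'IA01_Class2': [1, 2],
    'IA02_Raw': ['J', 'Q'], 'IA02_Class1': [4, 6], 'IA02_Class2': [1, 4],
})

# Columns that identify a row and should not be reshaped (assumed subset).
id_cols = ['ID', 'Gender']
df2 = df.set_index(id_cols)

# Split 'IA01_Raw' into ('IA01', 'Raw'). Naming the first level 'IA' means
# the stacked level is already labelled after reset_index.
df2.columns = pd.MultiIndex.from_tuples(
    [tuple(c.split('_')) for c in df2.columns], names=['IA', None])

# Move the 'IA' level from the columns into the rows, then flatten the index.
out = df2.stack(level='IA').reset_index()
print(out)
```

Each input row becomes one output row per IA prefix, with Raw, Class1 and Class2 as ordinary columns. Recent pandas versions may emit a FutureWarning about the stack implementation; the result for data without missing values should be the same apart from column order.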

Solution 2:

You can use pd.lreshape:

pd.lreshape(df.assign(IA01=['01']*len(df), IA02=['02']*len(df),IA09=['09']*len(df)), 
            {'IA': ['IA01', 'IA02','IA09'],
             'Raw': ['IA01_Raw','IA02_Raw','IA09_Raw'], 
             'Class1': ['IA01_Class1','IA02_Class1','IA09_Class1'], 
             'Class2': ['IA01_Class2', 'IA02_Class2','IA09_Class2']
             })


Edit:

pd.lreshape(df.assign(IA01=['01']*len(df), IA02=['02']*len(df),IA09=['09']*len(df)), 
            {'IA': ['IA01', 'IA02','IA09'],
             'Raw': ['IA01_Raw_baseline','IA02_Raw_midline','IA09_Raw_whatever'], 
             'Class1': ['IA01_Class1_baseline','IA02_Class1_midline','IA09_Class1_whatever'], 
             'Class2': ['IA01_Class2_baseline', 'IA02_Class2_midline','IA09_Class2_whatever']
             })

Edit: Just add the names of whichever input columns you want in the Raw/Class1/Class2 columns of the output to the corresponding lists inside the dictionary.
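
Typing those lists by hand gets tedious once there are many IA prefixes, so the dictionary can also be built from the column names. This is only a sketch under assumptions: the sample frame, the `prefixes` and `measures` lists and the helper `cols_for` are hypothetical names introduced here, and the column pattern is assumed to be `<prefix>_<measure>` with an optional trailing suffix.

```python
import pandas as pd

# Hypothetical sample; one column name carries an extra suffix on purpose.
df = pd.DataFrame({
    'ID': ['SC1', 'SC2'],
    'IA01_Raw': ['4', '2'], 'IA01_Class1': [8, 7],
    'IA02_Raw_midline': ['J', 'Q'], 'IA02_Class1': [4, 6],
})

prefixes = ['IA01', 'IA02']   # assumed known in advance
measures = ['Raw', 'Class1']  # assumed known in advance

def cols_for(measure):
    """Return one column per prefix, in prefix order, for the given measure."""
    picked = []
    for p in prefixes:
        # Matches '<prefix>_<measure>' and '<prefix>_<measure>_<suffix>'.
        matches = [c for c in df.columns if c.split('_')[:2] == [p, measure]]
        picked.append(matches[0])
    return picked

# One helper column per prefix holding its label ('01', '02'), as in the answer.
labelled = df.assign(**{p: p[2:] for p in prefixes})

# Every list must have the same length and the same prefix order.
groups = {'IA': prefixes}
for m in measures:
    groups[m] = cols_for(m)

long_df = pd.lreshape(labelled, groups)
print(long_df)
```

Note that pd.lreshape drops rows with missing values in the reshaped columns by default (`dropna=True`), so pass `dropna=False` if those rows should be kept.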

Documentation for this is not available. Use help(pd.lreshape) or refer here.

Output:

   Country  Gender   ID  QA_Comments QA_Include  Region Type  IA Raw Class1 Class2
0   France    Male  SC1          NaN        yes  Europe    A  01   4      8      1
1   France  Female  SC2          NaN        yes  Europe    A  01   2      7      2
2   France    Male  SC3          NaN        yes  Europe    B  01   3      7      2
3   France    Male  SC4          NaN        yes  Europe    A  01   4      8      2
4   France    Male  SC5          NaN        yes  Europe    B  01   1      7      1
5   France    Male  ID6          NaN        yes  Europe    A  01   2      8      1
6   France    Male  ID7          NaN        yes  Europe    B  01   2      8      1
7   France    Male  UC8          NaN        yes  Europe    B  01   4      8      2
8   France    Male  SC1          NaN        yes  Europe    A  02   J      4      1
9   France  Female  SC2          NaN        yes  Europe    A  02   Q      6      4
10  France    Male  SC3          NaN        yes  Europe    B  02   K      8      2
11  France    Male  SC4          NaN        yes  Europe    A  02   A      2      1
12  France    Male  SC5          NaN        yes  Europe    B  02   F      1      3
13  France    Male  ID6          NaN        yes  Europe    A  02   R      3      7
14  France    Male  ID7          NaN        yes  Europe    B  02   Q      4      6
15  France    Male  UC8          NaN        yes  Europe    B  02   P      4      2
16  France    Male  SC1          NaN        yes  Europe    A  09   W      6      3
17  France  Female  SC2          NaN        yes  Europe    A  09   X      5      2
18  France    Male  SC3          NaN        yes  Europe    B  09   Y      5      5
19  France    Male  SC4          NaN        yes  Europe    A  09   P      5      2
20  France    Male  SC5          NaN        yes  Europe    B  09   T      5      2
21  France    Male  ID6          NaN        yes  Europe    A  09   I      5      2
22  France    Male  ID7          NaN        yes  Europe    B  09   A      8      2
23  France    Male  UC8          NaN        yes  Europe    B  09   K      7      5
