Wide To Long Dataset Using Pandas
There are a lot of questions out there with similar titles but I'm unable to solve the issues that I'm having with my dataset. Dataset: ID Country Type Region Gender IA01_Raw I
Solution 1:
This will get you started. The essence is using set_index, converting the columns to a MultiIndex, then stack. Better solutions may exist, but I would do it this way because it is an easy step to your output.
# Set the index with columns that we don't want to "transpose"
df2 = df.set_index([
    'ID', 'Country', 'Type', 'Region', 'Gender', 'QA_Include', 'QA_Comments'])
# Convert headers to MultiIndex -- this is so we can melt IA values
df2.columns = pd.MultiIndex.from_tuples(map(tuple, df2.columns.str.split('_')))
# Call stack to replicate data, then reset the index
out = df2.stack(level=0).reset_index().rename({'level_7': 'IA'}, axis=1)
out
     ID  Country  Type  Region  Gender  QA_Include  QA_Comments    IA  Class1  Class2  Raw
 0  SC1   France     A  Europe    Male         yes          NaN  IA01       8       1    4
 1  SC1   France     A  Europe    Male         yes          NaN  IA02       4       1    J
 2  SC2   France     A  Europe  Female         yes          NaN  IA01       7       2    2
 3  SC2   France     A  Europe  Female         yes          NaN  IA02       6       4    Q
 4  SC3   France     B  Europe    Male         yes          NaN  IA01       7       2    3
 5  SC3   France     B  Europe    Male         yes          NaN  IA02       8       2    K
 6  SC4   France     A  Europe    Male         yes          NaN  IA01       8       2    4
 7  SC4   France     A  Europe    Male         yes          NaN  IA02       2       1    A
 8  SC5   France     B  Europe    Male         yes          NaN  IA01       7       1    1
 9  SC5   France     B  Europe    Male         yes          NaN  IA02       1       3    F
10  ID6   France     A  Europe    Male         yes          NaN  IA01       8       1    2
11  ID6   France     A  Europe    Male         yes          NaN  IA02       3       7    R
12  ID7   France     B  Europe    Male         yes          NaN  IA01       8       1    2
13  ID7   France     B  Europe    Male         yes          NaN  IA02       4       6    Q
14  UC8   France     B  Europe    Male         yes          NaN  IA01       8       2    4
15  UC8   France     B  Europe    Male         yes          NaN  IA02       4       2    P
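For anyone who wants to try the same idea without the full dataset, here is a minimal sketch on a made-up two-row frame. The sample values and the shortened column list are assumptions for illustration only, not the asker's data. It also names the first header level IA up front, so the stacked level arrives already labelled and the rename of level_7 is not needed.

```python
import pandas as pd

# Hypothetical sample: two ID columns plus IAxx_<field> measure columns.
df = pd.DataFrame({
    "ID": ["SC1", "SC2"],
    "Country": ["France", "France"],
    "IA01_Raw": [4, 2],
    "IA01_Class1": [8, 7],
    "IA02_Raw": [41, 64],
    "IA02_Class1": [4, 6],
})

# Columns that should be repeated on every output row.
id_cols = ["ID", "Country"]
wide = df.set_index(id_cols)

# Split "IA01_Raw" into ("IA01", "Raw") and name the first level "IA".
wide.columns = pd.MultiIndex.from_tuples(
    [tuple(name.split("_")) for name in wide.columns],
    names=["IA", None],
)

# Move the IA level from the columns into the index, then flatten the index.
out = wide.stack(level="IA").reset_index()
print(out)
```

Each input row becomes one output row per IA prefix, with Raw and Class1 as ordinary columns. The order of those two columns can differ between pandas versions, so select them by name rather than by position.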
Solution 2:
You can use pd.lreshape:
pd.lreshape(df.assign(IA01=['01']*len(df), IA02=['02']*len(df),IA09=['09']*len(df)),
{'IA': ['IA01', 'IA02','IA09'],
'Raw': ['IA01_Raw','IA02_Raw','IA09_Raw'],
'Class1': ['IA01_Class1','IA02_Class1','IA09_Class1'],
'Class2': ['IA01_Class2', 'IA02_Class2','IA09_Class2']
})
Edit: the same call works when the column names carry extra suffixes:
pd.lreshape(df.assign(IA01=['01']*len(df), IA02=['02']*len(df),IA09=['09']*len(df)),
{'IA': ['IA01', 'IA02','IA09'],
'Raw': ['IA01_Raw_baseline','IA02_Raw_midline','IA09_Raw_whatever'],
'Class1': ['IA01_Class1_baseline','IA02_Class1_midline','IA09_Class1_whatever'],
'Class2': ['IA01_Class2_baseline', 'IA02_Class2_midline','IA09_Class2_whatever']
})
Edit: For each output column (Raw, Class1, Class2), add the names of whichever input columns you want in it to the corresponding list inside the dictionary. There is little documentation for this; use help(pd.lreshape) or refer here.
Output:
    Country  Gender   ID  QA_Comments  QA_Include  Region  Type  IA  Raw  Class1  Class2
 0   France    Male  SC1          NaN         yes  Europe     A  01    4       8       1
 1   France  Female  SC2          NaN         yes  Europe     A  01    2       7       2
 2   France    Male  SC3          NaN         yes  Europe     B  01    3       7       2
 3   France    Male  SC4          NaN         yes  Europe     A  01    4       8       2
 4   France    Male  SC5          NaN         yes  Europe     B  01    1       7       1
 5   France    Male  ID6          NaN         yes  Europe     A  01    2       8       1
 6   France    Male  ID7          NaN         yes  Europe     B  01    2       8       1
 7   France    Male  UC8          NaN         yes  Europe     B  01    4       8       2
 8   France    Male  SC1          NaN         yes  Europe     A  02    J       4       1
 9   France  Female  SC2          NaN         yes  Europe     A  02    Q       6       4
10   France    Male  SC3          NaN         yes  Europe     B  02    K       8       2
11   France    Male  SC4          NaN         yes  Europe     A  02    A       2       1
12   France    Male  SC5          NaN         yes  Europe     B  02    F       1       3
13   France    Male  ID6          NaN         yes  Europe     A  02    R       3       7
14   France    Male  ID7          NaN         yes  Europe     B  02    Q       4       6
15   France    Male  UC8          NaN         yes  Europe     B  02    P       4       2
16   France    Male  SC1          NaN         yes  Europe     A  09    W       6       3
17   France  Female  SC2          NaN         yes  Europe     A  09    X       5       2
18   France    Male  SC3          NaN         yes  Europe     B  09    Y       5       5
19   France    Male  SC4          NaN         yes  Europe     A  09    P       5       2
20   France    Male  SC5          NaN         yes  Europe     B  09    T       5       2
21   France    Male  ID6          NaN         yes  Europe     A  09    I       5       2
22   France    Male  ID7          NaN         yes  Europe     B  09    A       8       2
23   France    Male  UC8          NaN         yes  Europe     B  09    K       7       5
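With many IAxx prefixes, typing every list by hand gets tedious. One possible way to build the dictionary programmatically is sketched below, assuming every measure column is named prefix_field. The prefixes and fields lists, the variable names, and the sample values are hypothetical and not taken from the question.

```python
import pandas as pd

# Hypothetical sample with the same naming pattern as the question.
df = pd.DataFrame({
    "ID": ["SC1", "SC2"],
    "IA01_Raw": [4, 2],
    "IA01_Class1": [8, 7],
    "IA02_Raw": [5, 6],
    "IA02_Class1": [4, 6],
})

prefixes = ["IA01", "IA02"]   # assumed: the IA prefixes present in the frame
fields = ["Raw", "Class1"]    # assumed: the suffixes to collect

# Add one constant label column per prefix ("01", "02", ...), as in the answer.
labelled = df.assign(**{p: p[2:] for p in prefixes})

# Map each output column to the list of input columns that feed it.
groups = {"IA": prefixes}
groups.update({f: [f"{p}_{f}" for p in prefixes] for f in fields})

long_df = pd.lreshape(labelled, groups)
print(long_df)
```

All lists in the dictionary must have the same length. By default pd.lreshape drops any output row that has a missing value in one of the reshaped columns; pass dropna=False to keep those rows.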