
Merging Multiple Pandas Datasets With Non-unique Index

I have several similarly structured pandas dataframes stored in a dictionary. I access a dataframe in the following way. ex_dict[df1] date df1price1 df1price2 10-20-2015

Solution 1:

You can use pd.concat to stack the dataframes, followed by groupby('date') with max() to flatten the result into one row per date.

In [22]: pd.concat([df1,df2,df3]).groupby('date').max()
Out[22]:
            df1price1  df1price2  df2price1  df2price2  df3price1  df3price2
date
10-20-2015        100        150        110        140        100        150
10-21-2015         90        100         90        110        NaN        NaN
10-22-2015        100        140        NaN        NaN         90        100
10-23-2015        NaN        NaN        110        120         80        130
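To make the step above reproducible, here is a minimal sketch. The three sample frames are an assumption: they are reconstructed from the output shown above, since the question's original data is truncated.

```python
import pandas as pd

# Sample frames reconstructed from the output above (an assumption, since
# the question's original data is truncated).
df1 = pd.DataFrame({
    "date": ["10-20-2015", "10-21-2015", "10-22-2015"],
    "df1price1": [100, 90, 100],
    "df1price2": [150, 100, 140],
})
df2 = pd.DataFrame({
    "date": ["10-20-2015", "10-21-2015", "10-23-2015"],
    "df2price1": [110, 90, 110],
    "df2price2": [140, 110, 120],
})
df3 = pd.DataFrame({
    "date": ["10-20-2015", "10-22-2015", "10-23-2015"],
    "df3price1": [100, 90, 80],
    "df3price2": [150, 100, 130],
})

# Stacking row-wise leaves one row per (frame, date) pair, with NaN in the
# price columns that belong to the other frames.
stacked = pd.concat([df1, df2, df3])

# Each price column has at most one non-missing value per date here, so
# max() (which skips NaN) picks that value and yields one row per date.
flat = stacked.groupby("date").max()
print(flat)
```

Note that max() only acts as a "pick the single value" step because no frame repeats a date. If a frame did contain duplicate dates, max() would silently keep the larger price.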

Edit: As BrenBarn points out in the comments, you can use concat(axis=1) if you set the join column as the index of your dataframes:

df1.index = df1.date
df2.index = df2.date
df3.index = df3.date

In [44]: pd.concat([df1,df2,df3],axis=1)
Out[44]:
                  date  df1price1  df1price2        date  df2price1  \
10-20-2015  10-20-2015        100        150  10-20-2015        110
10-21-2015  10-21-2015         90        100  10-21-2015         90
10-22-2015  10-22-2015        100        140         NaN        NaN
10-23-2015         NaN        NaN        NaN  10-23-2015        110

            df2price2        date  df3price1  df3price2
10-20-2015        140  10-20-2015        100        150
10-21-2015        110         NaN        NaN        NaN
10-22-2015        NaN  10-22-2015         90        100
10-23-2015        120  10-23-2015         80        130
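The output above repeats the date column once per frame, because assigning df1.index = df1.date keeps date as a regular column as well. A possible variant, sketched below, uses set_index('date') so the column moves into the index instead. The sample data is again reconstructed from the outputs in this answer, and the sketch assumes dates are unique within each frame; as far as I know, concat(axis=1) cannot align frames whose index contains duplicate labels.

```python
import pandas as pd

# Sample frames reconstructed from the outputs in this answer (assumption).
df1 = pd.DataFrame({
    "date": ["10-20-2015", "10-21-2015", "10-22-2015"],
    "df1price1": [100, 90, 100],
    "df1price2": [150, 100, 140],
})
df2 = pd.DataFrame({
    "date": ["10-20-2015", "10-21-2015", "10-23-2015"],
    "df2price1": [110, 90, 110],
    "df2price2": [140, 110, 120],
})
df3 = pd.DataFrame({
    "date": ["10-20-2015", "10-22-2015", "10-23-2015"],
    "df3price1": [100, 90, 80],
    "df3price2": [150, 100, 130],
})

# set_index moves 'date' out of the columns, so the side-by-side concat
# aligns on it without duplicating it. sort_index makes the row order
# explicit rather than relying on concat's default ordering.
wide = pd.concat(
    [frame.set_index("date") for frame in (df1, df2, df3)],
    axis=1,
).sort_index()
print(wide)
```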

Solution 2:

You could chain multiple merge calls on the date column:

df1.merge(df2, on='date', how='outer').merge(df3, on='date', how='outer').set_index('date')

In [107]: df1.merge(df2, on='date', how='outer').merge(df3, on='date', how='outer').set_index('date')
Out[107]:
            df1price1  df1price2  df2price1  df2price2  df3price1  df3price2
date
10-20-2015        100        150        110        140        100        150
10-21-2015         90        100         90        110        NaN        NaN
10-22-2015        100        140        NaN        NaN         90        100
10-23-2015        NaN        NaN        110        120         80        130

Some explanation: first, df1 and df2 are merged on the date column with an outer join. The resulting dataframe is then merged with df3 in the same way. Finally, date is set as the index of the result. If your dataframes have the date as their index, you could first call reset_index on each of them and then merge on the name of the column containing the date.
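Since the question keeps the frames in a dictionary, the chained merge can be generalized to any number of frames with functools.reduce. This is only a sketch: the string keys of ex_dict, the sample data (reconstructed from the outputs above), and the helper with_date_column are assumptions made for illustration. One frame is deliberately indexed by date to show the reset_index case.

```python
from functools import reduce

import pandas as pd

# Hypothetical dictionary of frames; keys and data are assumptions
# reconstructed from the outputs shown in this answer.
ex_dict = {
    "df1": pd.DataFrame({
        "date": ["10-20-2015", "10-21-2015", "10-22-2015"],
        "df1price1": [100, 90, 100],
        "df1price2": [150, 100, 140],
    }),
    "df2": pd.DataFrame({
        "date": ["10-20-2015", "10-21-2015", "10-23-2015"],
        "df2price1": [110, 90, 110],
        "df2price2": [140, 110, 120],
    }),
    # This frame has 'date' as its index to exercise the reset_index path.
    "df3": pd.DataFrame({
        "date": ["10-20-2015", "10-22-2015", "10-23-2015"],
        "df3price1": [100, 90, 80],
        "df3price2": [150, 100, 130],
    }).set_index("date"),
}


def with_date_column(frame):
    """Hypothetical helper: return the frame with 'date' as a regular column."""
    return frame if "date" in frame.columns else frame.reset_index()


# Fold the frames together pairwise with an outer merge on 'date', then
# use 'date' as the index of the final result.
merged = reduce(
    lambda left, right: left.merge(right, on="date", how="outer"),
    (with_date_column(frame) for frame in ex_dict.values()),
).set_index("date").sort_index()
print(merged)
```

Unlike concat(axis=1), merge does not need unique dates, but duplicate dates in several frames would multiply rows in the result, so it is worth checking for them first.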
