
Groupby And Fill Missing Months In Multiple Columns Data Frame In Python

For a data frame like this, how can I group by id and fill in the missing months while keeping price as NaN for those months? The expected date range is from 2015/1/1 to 2019/8/1.

Solution 1:

EDIT:

In the real data, each combination of city, district, id and date must be unique, so aggregate any duplicates first:

df = df.groupby(['city','district','id', 'date'], as_index=False)['price'].sum()
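The aggregation above matters because reindexing a date index that contains duplicate labels raises a ValueError. A minimal sketch of the deduplication step follows; the three sample rows are hypothetical and only stand in for the real data:

```python
import pandas as pd

# Hypothetical sample: id 20101 has two rows for the same month.
df = pd.DataFrame({
    'city': ['hz', 'hz', 'hz'],
    'district': ['sn', 'sn', 'sn'],
    'id': [20101, 20101, 20101],
    'date': ['2015-01-01', '2015-01-01', '2015-03-01'],
    'price': [1.0, 1.5, 2.0],
})

# Collapse duplicates so each (city, district, id, date) combination appears once.
dedup = df.groupby(['city', 'district', 'id', 'date'], as_index=False)['price'].sum()
print(dedup)
```

Summing is only one choice; mean() or first() may fit better depending on what a duplicated month means in the data.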

To group by the id column only:

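import pandas as pd

# Month-start dates covering the expected range
rng = pd.date_range('2015-01-01','2019-08-01', freq='MS')
df['date'] = pd.to_datetime(df['date'])

# Reindex each id group onto the full monthly range; missing months become NaN
df1 = (df.set_index('date')
         .groupby('id')
         .apply(lambda x: x.reindex(rng))
         .rename_axis(('id','date'))
         # the grouping column may not be kept inside each group on newer pandas
         .drop(columns='id', errors='ignore')
         .reset_index()
        )
print (df1)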

        id       date city district  price
0    20101 2015-01-01  NaN      NaN    NaN
1    20101 2015-02-01  NaN      NaN    NaN
2    20101 2015-03-01  NaN      NaN    NaN
3    20101 2015-04-01  NaN      NaN    NaN
4    20101 2015-05-01  NaN      NaN    NaN
..     ...        ...  ...      ...    ...
163  20103 2019-04-01  NaN      NaN    NaN
164  20103 2019-05-01  NaN      NaN    NaN
165  20103 2019-06-01  NaN      NaN    NaN
166  20103 2019-07-01  NaN      NaN    NaN
167  20103 2019-08-01  NaN      NaN    NaN

[168 rows x 5 columns]
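As a cross-check that does not rely on groupby().apply(), the same id-by-month grid can be built with pd.MultiIndex.from_product and a single reindex. This is an alternative sketch, not the method used above, and the sample rows are hypothetical:

```python
import pandas as pd

# Hypothetical sample data: two ids with a handful of observed months.
df = pd.DataFrame({
    'city': ['hz', 'hz', 'xz'],
    'district': ['sn', 'sn', 'pd'],
    'id': [20101, 20101, 20103],
    'date': ['2015-02-01', '2015-04-01', '2019-08-01'],
    'price': [2.2, 2.5, 3.1],
})
df['date'] = pd.to_datetime(df['date'])
rng = pd.date_range('2015-01-01', '2019-08-01', freq='MS')

# Every (id, month) pair in the expected range.
full_index = pd.MultiIndex.from_product([df['id'].unique(), rng],
                                        names=['id', 'date'])

# Requires (id, date) to be unique; rows missing from df come back as NaN.
df1 = df.set_index(['id', 'date']).reindex(full_index).reset_index()
print(df1)
```

With 56 month starts in the range, each id contributes 56 rows, and price stays NaN wherever no observation existed.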

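To group by several columns, select the price column and reindex each group. Note that fill_value=0 writes 0 for the missing months; omit it to keep them as NaN: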

rng = pd.date_range('2015-01-01','2019-08-01', freq='MS')
df['date'] = pd.to_datetime(df['date'])

df2 = (df.set_index('date')
         .groupby(['city','district','id'])['price']
         .apply(lambda x: x.reindex(rng, fill_value=0))
         .rename_axis(('city','district','id','date'))
         .reset_index()
        )
print (df2)

    city district     id       date  price
0     hz       sn  20101 2015-01-01    0.0
1     hz       sn  20101 2015-02-01    0.0
2     hz       sn  20101 2015-03-01    0.0
3     hz       sn  20101 2015-04-01    0.0
4     hz       sn  20101 2015-05-01    0.0
..   ...      ...    ...        ...    ...
219   xz       pd  20103 2019-04-01    0.0
220   xz       pd  20103 2019-05-01    0.0
221   xz       pd  20103 2019-06-01    0.0
222   xz       pd  20103 2019-07-01    0.0
223   xz       pd  20103 2019-08-01    0.0

[224 rows x 5 columns]
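If city and district should stay populated while price remains NaN for the missing months, one option is to build the full grid with a cross join and then left-merge the observed prices onto it. The sketch below assumes pandas 1.2 or later (for how='cross') and uses hypothetical sample rows:

```python
import pandas as pd

# Hypothetical sample data standing in for the real frame.
df = pd.DataFrame({
    'city': ['hz', 'hz', 'xz'],
    'district': ['sn', 'sn', 'pd'],
    'id': [20101, 20101, 20103],
    'date': ['2015-02-01', '2015-04-01', '2019-08-01'],
    'price': [2.2, 2.5, 3.1],
})
df['date'] = pd.to_datetime(df['date'])
rng = pd.date_range('2015-01-01', '2019-08-01', freq='MS')

# One row per distinct (city, district, id) combination.
keys = df[['city', 'district', 'id']].drop_duplicates()

# Cross join: every key combination paired with every month start.
grid = keys.merge(pd.DataFrame({'date': rng}), how='cross')

# Left merge keeps the whole grid; unmatched months get NaN in price.
df3 = grid.merge(df, on=['city', 'district', 'id', 'date'], how='left')
print(df3)
```

Because the grouping columns come from the grid rather than from the reindexed rows, they are never NaN and id keeps its integer dtype.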

Solution 2:

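Use reindex with freq='MS' (month start) on each id group, then combine the groups with pd.concat. This assumes the date column is already datetime. The final ffill().bfill() fills every column, price included, so skip it or restrict it to city, district and id if price should stay NaN: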

dates = pd.date_range('2015-01-01','2019-08-01', freq='MS')

new = pd.concat([
    d.set_index('date')
     .reindex(dates)
     .reset_index()
     .rename(columns={'index':'date'})
    for _, d in df.groupby('id')
], ignore_index=True)

new = new.ffill().bfill()

Output

          date city district       id  price
0   2015-01-01   hz       sn  20101.0    2.2
1   2015-02-01   hz       sn  20101.0    2.2
2   2015-03-01   hz       sn  20101.0    2.2
3   2015-04-01   hz       sn  20101.0    2.2
4   2015-05-01   hz       sn  20101.0    2.2
..         ...  ...      ...      ...    ...
163 2019-04-01   xz       pd  20103.0    3.1
164 2019-05-01   xz       pd  20103.0    3.1
165 2019-06-01   xz       pd  20103.0    3.1
166 2019-07-01   xz       pd  20103.0    3.1
167 2019-08-01   xz       pd  20103.0    3.1

[168 rows x 5 columns]
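The output above has price filled in as well, because ffill().bfill() runs over the whole frame and can also carry values across id boundaries. A variant that fills only the descriptive columns, and does so inside each group, might look like the sketch below; the sample rows are hypothetical:

```python
import pandas as pd

# Hypothetical sample data standing in for the real frame.
df = pd.DataFrame({
    'city': ['hz', 'hz', 'xz'],
    'district': ['sn', 'sn', 'pd'],
    'id': [20101, 20101, 20103],
    'date': pd.to_datetime(['2015-02-01', '2015-04-01', '2019-08-01']),
    'price': [2.2, 2.5, 3.1],
})
dates = pd.date_range('2015-01-01', '2019-08-01', freq='MS')
fill_cols = ['city', 'district', 'id']

parts = []
for _, d in df.groupby('id'):
    g = d.set_index('date').reindex(dates)
    # Fill only the descriptive columns, within this group; price is left alone.
    g[fill_cols] = g[fill_cols].ffill().bfill()
    parts.append(g.rename_axis('date').reset_index())

new = pd.concat(parts, ignore_index=True)
new['id'] = new['id'].astype(int)  # reindexing introduced NaN, which made id float
print(new)
```

Filling per group keeps one id's city and district from leaking into the leading or trailing months of another.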
