
Resampling Boolean Values In Pandas

I have run into a property which I find peculiar about resampling Booleans in pandas. Here is some time series data:

    import pandas as pd
    import numpy as np
    dr = pd.date_range('01

Solution 1:

Tracing the two calls through the pandas source shows that:

df.resample('5H')['Bools'].sum == GroupBy.sum (in pandas.core.groupby.generic.SeriesGroupBy)
df.resample('5H').sum == sum (in pandas.core.resample.DatetimeIndexResampler)

Following groupby_function in groupby.py shows that the first call is equivalent to r.agg(lambda x: np.sum(x, axis=r.axis)), where r = df.resample('5H'), which outputs:

                         Bools  Nums  Nums2
    2020-01-01 05:00:00      2    10     10
    2020-01-01 10:00:00      2    35     35

Strictly speaking, r should have been df.resample('5H')['Bools'] to match the first call above.

Following the _downsample function in resample.py shows that the second call is equivalent to df.groupby(r.grouper, axis=r.axis).agg(np.sum), which outputs:

                         Nums  Nums2
    2020-01-01 05:00:00    10     10
    2020-01-01 10:00:00    35     35
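The groupby equivalence can be sketched using only public API. This is a rough illustration, not the internal code path: pd.Grouper(freq=...) stands in for the internal r.grouper, the frame is hypothetical numeric data (the original frame is not reproduced here), and '5h' is the lowercase spelling of '5H' that newer pandas versions prefer.

```python
import pandas as pd

# Hypothetical numeric data; the original frame is not reproduced here.
idx = pd.date_range("2020-01-01", periods=10, freq="h")
df = pd.DataFrame({"Nums": range(10), "Nums2": range(10)}, index=idx)

# Resample into 5-hour bins and sum each bin.
via_resample = df.resample("5h").sum()

# Same binning through groupby; pd.Grouper is a public stand-in
# for the internal r.grouper (an assumption of this sketch).
via_groupby = df.groupby(pd.Grouper(freq="5h")).sum()

print(via_resample)
print(via_groupby)
```

For purely numeric columns the two results should agree, which is why the difference discussed here only shows up on the Bools column.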

Solution 2:

df.resample('5H').sum() skips the Bools column because the column holds mixed data types, which pandas stores as object dtype. When sum() is called on a resample or groupby object, object-typed columns are ignored.
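One way to act on this is to give the column a numeric dtype before resampling. The sketch below uses hypothetical data and assumes that missing values should count as False; that choice is an assumption made here for illustration and may not suit every dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data: True/False mixed with NaN, which pandas stores as object.
idx = pd.date_range("2020-01-01", periods=10, freq="h")
df = pd.DataFrame(
    {
        "Bools": [True, False, np.nan, True, True,
                  False, True, np.nan, False, True],
        "Nums": range(10),
    },
    index=idx,
)

print(df["Bools"].dtype)  # object

# Compare against True so NaN becomes False (an assumption), then cast to int.
clean = df["Bools"].eq(True).astype(int)

# With an integer dtype, the per-bin sum counts the True values.
counts = clean.resample("5h").sum()
print(counts)
```

Checking df.dtypes first is a quick way to see whether a column that looks Boolean is actually stored as object.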
