Resampling Boolean Values In Pandas
I have run into a property which I find peculiar about resampling Booleans in pandas. Here is some time series data:
import pandas as pd
import numpy as np
dr = pd.date_range('01
Solution 1:
Tracing through the pandas source shows that:
df.resample('5H')['Bools'].sum resolves to GroupBy.sum (in pandas.core.groupby.generic.SeriesGroupBy)
df.resample('5H').sum resolves to sum (in pandas.core.resample.DatetimeIndexResampler)
Following groupby_function in groupby.py shows that the first one is equivalent to
r.agg(lambda x: np.sum(x, axis=r.axis))
where r = df.resample('5H')
which outputs:
                     Bools  Nums  Nums2
2020-01-01 05:00:00      2    10     10
2020-01-01 10:00:00      2    35     35
Strictly speaking, for the ['Bools'] case above, r should be df.resample('5H')['Bools'].
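A minimal sketch of the first path on a single column. The question's setup is truncated, so the data below is a hypothetical stand-in, and the lowercase '5h' alias (which recent pandas prefers over '5H') is an assumption about the reader's pandas version. Passing a lambda to agg hands each 5-hour bin to np.sum separately, which copes with an object-dtype column of booleans.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: hourly timestamps and a column mixing
# booleans with NaN, which pandas stores as dtype object.
index = pd.date_range("2020-01-01 05:00", periods=10, freq="h")
bools = pd.Series(
    [True, True, False, False, False, True, True, np.nan, np.nan, False],
    index=index,
    name="Bools",
)

# The lambda is called once per 5-hour bin with that bin's values,
# mirroring r.agg(lambda x: np.sum(x, ...)) from the answer.
# np.sum on a Series defers to Series.sum, which skips NaN.
per_bin = bools.resample("5h").agg(lambda x: np.sum(x))

print(bools.dtype)  # object
print(per_bin)
```

Each bin comes back as a count of True values instead of the column being dropped.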
Tracing the _downsample function in resample.py, on the other hand, shows that the second one, df.resample('5H').sum(), is equivalent to:
df.groupby(r.grouper, axis=r.axis).agg(np.sum)
which outputs:
                     Nums  Nums2
2020-01-01 05:00:00    10     10
2020-01-01 10:00:00    35     35
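The frame-level path can be approximated with public API only. This sketch swaps the internal r.grouper for pd.Grouper(freq="5h") and passes numeric_only=True explicitly, since newer pandas releases (2.0 onward, as far as I know) default numeric_only to False and no longer drop object columns silently. The data is a hypothetical stand-in.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in frame: one object-dtype column and two integer columns.
index = pd.date_range("2020-01-01 05:00", periods=10, freq="h")
df = pd.DataFrame(
    {
        "Bools": [True, True, False, False, False, True, True, np.nan, np.nan, False],
        "Nums": range(10),
    },
    index=index,
)
df["Nums2"] = df["Nums"]

# Group into 5-hour bins (pd.Grouper stands in for the internal r.grouper)
# and sum only numeric columns, so the object column Bools is left out.
binned = df.groupby(pd.Grouper(freq="5h")).sum(numeric_only=True)
print(binned)
```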
Solution 2:
df.resample('5H').sum() doesn't work on the Bools column because that column holds mixed data types, which pandas stores as dtype object. When sum() is called on a resample or groupby object, object-typed columns are ignored.
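One way to check the dtype and keep the column in the result (a workaround not taken from the answers above) is to cast Bools to float before resampling: True becomes 1.0, False becomes 0.0 and NaN stays NaN, so each bin's sum counts the True values. The data is a hypothetical stand-in and '5h' is assumed as the hourly alias.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in frame.
index = pd.date_range("2020-01-01 05:00", periods=10, freq="h")
df = pd.DataFrame(
    {
        "Bools": [True, True, False, False, False, True, True, np.nan, np.nan, False],
        "Nums": range(10),
    },
    index=index,
)

# Mixing booleans with NaN leaves the column as dtype object.
print(df["Bools"].dtype)  # object

# Cast to float so the column is numeric; resample sum() skips NaN by default.
fixed = df.assign(Bools=df["Bools"].astype(float)).resample("5h").sum()
print(fixed)
```

Casting to float rather than bool keeps the missing values as NaN instead of silently turning them into True.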