Pandas Apply Function To Data Grouped By Day
I have a dataset that looks like this:
date,value1,value2
2016-01-01 00:00:00,3,0
2016-01-01 01:00:00,0,0
2016-01-01 02:00:00,0,0
2016-01-01 03:00:00,0,0
2016-01-01 04:00:00,0,0
20
Solution 1:
Use sklearn's mean_squared_error and take the square root of the result for each day:
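from sklearn.metrics import mean_squared_error
# df.date must be a datetime64 column (e.g. after pd.to_datetime); .dt.date groups rows by calendar day
# RMSE per day = square root of the mean squared error between value1 and value2
df.groupby(df.date.dt.date).apply(
    lambda x: mean_squared_error(x.value1, x.value2) ** .5)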
date
2016-01-01    3.494043
2016-01-02    0.377964
dtype: float64
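The same per-day RMSE can also be computed without sklearn and without a per-group Python lambda, by squaring the row-wise errors once and averaging them within each day. This is a minimal sketch, assuming `date` is already a datetime64 column; the helper name `rmse_by_day` and the four sample rows are hypothetical, not the asker's data.

```python
import numpy as np
import pandas as pd


def rmse_by_day(df: pd.DataFrame) -> pd.Series:
    """Per-calendar-day RMSE between value1 and value2 (hypothetical helper)."""
    # Squared error for every row
    sq_err = (df["value1"] - df["value2"]) ** 2
    # Average the squared errors within each calendar day, then take the square root
    return np.sqrt(sq_err.groupby(df["date"].dt.date).mean())


# Hypothetical sample data: two readings on each of two days
df = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-01 00:00", "2016-01-01 01:00",
                            "2016-01-02 00:00", "2016-01-02 01:00"]),
    "value1": [3, 1, 0, 2],
    "value2": [0, 1, 0, 0],
})

result = rmse_by_day(df)
print(result)
```

For this sample, the first day's errors are 3 and 0, so its RMSE is sqrt(4.5); the second day's errors are 0 and 2, giving sqrt(2).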
Solution 2:
You do not need to keep redoing the groupby, and you need to compute the RMSE on the elements of each group, not on the sequence of means:
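# assumes df has a DatetimeIndex; .date groups rows by calendar day
gb = df.groupby(df.index.date)
mean_by_day = gb.mean()
# population standard deviation (divide by N, i.e. ddof=0) of each column within each day
rmse_by_day = gb.std(ddof=0)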
I suspect that the RMSE formula you are applying is exactly equivalent to the standard deviation normalized by the number of elements N, not by N - 1 (the Pandas default, ddof=1), which is why ddof=0 is passed above.
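That equivalence can be checked numerically: the RMSE of each value around its own daily mean, computed by hand, should match `std(ddof=0)` per day. This is a small sketch on hypothetical data with a DatetimeIndex, as the code above assumes; it does not reproduce the asker's exact formula, which is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly data indexed by timestamp
idx = pd.to_datetime(["2016-01-01 00:00", "2016-01-01 01:00", "2016-01-01 02:00",
                      "2016-01-02 00:00", "2016-01-02 01:00"])
df = pd.DataFrame({"value1": [3, 0, 0, 1, 0], "value2": [0, 0, 0, 0, 0]}, index=idx)

gb = df.groupby(df.index.date)

# Explicit RMSE around the daily mean: deviation from the group mean,
# squared, averaged over N (not N - 1), then square-rooted
deviations = df - gb.transform("mean")
explicit = np.sqrt((deviations ** 2).groupby(df.index.date).mean())

# Population standard deviation per day
pop_std = gb.std(ddof=0)

print(np.allclose(explicit, pop_std))  # True
```

With the default `gb.std()` (ddof=1) the two no longer agree, because the sum of squared deviations is divided by N - 1 instead of N.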
You should now be able to access mean_by_day.value1 and rmse_by_day.value1 to get the values that you want.
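If you would rather have both statistics for one column side by side in a single table, pandas' named aggregation on the grouped column is one option. This is a minimal sketch on hypothetical data; the column names `mean` and `rmse` in the result are illustrative choices, not anything pandas requires.

```python
import pandas as pd

# Hypothetical hourly data indexed by timestamp
idx = pd.to_datetime(["2016-01-01 00:00", "2016-01-01 01:00",
                      "2016-01-02 00:00", "2016-01-02 01:00"])
df = pd.DataFrame({"value1": [4, 2, 1, 1], "value2": [0, 0, 0, 0]}, index=idx)

gb = df.groupby(df.index.date)

# Named aggregation: one row per day, one column per statistic for value1
summary = gb["value1"].agg(
    mean="mean",
    rmse=lambda s: s.std(ddof=0),  # population std, as in the answer's rmse_by_day
)
print(summary)
```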
The value I get for mean_by_day is:
              value1    value2
2016-01-01  5.416667  6.541667
2016-01-02  0.125000  0.000000
Similarly, for rmse_by_day I get:
              value1    value2
2016-01-01  5.139039  6.422481
2016-01-02  0.330719  0.000000
Note that the date attribute of the index is used rather than day, because day-of-month values repeat (and would be merged into one group) if your data went on for multiple months.
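The difference is easy to see with two readings exactly one month apart. In this hypothetical sketch, grouping by `index.day` lumps January 1 and February 1 together, while grouping by `index.date` keeps them separate.

```python
import pandas as pd

# Hypothetical data: one reading on Jan 1 and one on Feb 1, 2016
idx = pd.to_datetime(["2016-01-01 00:00", "2016-02-01 00:00"])
df = pd.DataFrame({"value1": [2, 4]}, index=idx)

by_day = df.groupby(df.index.day).mean()    # day-of-month: both rows fall into group 1
by_date = df.groupby(df.index.date).mean()  # calendar date: two separate groups

print(len(by_day), len(by_date))  # 1 2
```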