
Pandas Apply Function To Data Grouped By Day

I have a dataset that looks like this:

date,value1,value2
2016-01-01 00:00:00,3,0
2016-01-01 01:00:00,0,0
2016-01-01 02:00:00,0,0
2016-01-01 03:00:00,0,0
2016-01-01 04:00:00,0,0
20

Solution 1:

Use sklearn's mean_squared_error and take the square root to get the RMSE for each day:

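import pandas as pd
from sklearn.metrics import mean_squared_error

# 'date' must be a datetime column, e.g. pd.read_csv(..., parse_dates=['date'])
# RMSE between value1 and value2 for each calendar day
rmse_by_day = df.groupby(df.date.dt.date).apply(
    lambda x: mean_squared_error(x.value1, x.value2) ** .5)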

date
2016-01-01    3.494043
2016-01-02    0.377964
dtype: float64
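If scikit-learn is not available, the same per-day RMSE can be computed directly with pandas and NumPy. This is a minimal sketch on a small made-up frame (not the question's data, which is truncated above); it assumes the date column already holds datetimes, and the helper name rmse_per_day is hypothetical.

```python
import numpy as np
import pandas as pd


def rmse_per_day(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: RMSE between value1 and value2 for each calendar day.

    Assumes df has a datetime 'date' column and numeric 'value1'/'value2' columns.
    """
    # Squared error for every row, computed once for the whole frame
    sq_err = (df["value1"] - df["value2"]) ** 2
    # Mean squared error per calendar day, then the square root gives the RMSE
    return np.sqrt(sq_err.groupby(df["date"].dt.date).mean())


# Made-up example: two hourly readings on each of two days
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2016-01-01 00:00:00", "2016-01-01 01:00:00",
        "2016-01-02 00:00:00", "2016-01-02 01:00:00",
    ]),
    "value1": [3, 0, 1, 1],
    "value2": [0, 0, 1, 1],
})
print(rmse_per_day(df))
```

Avoiding apply with a lambda keeps the work vectorized: the squared errors are computed once for all rows, and only the mean runs per group.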

Solution 2:

You do not need to keep redoing the groupby, and you need to compute the RMSE on each group, not on the sequence of means:

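import pandas as pd

# Assumes the timestamps are the index (a DatetimeIndex), e.g. df = df.set_index('date')
gb = df.groupby(df.index.date)   # group once, by calendar date
mean_by_day = gb.mean()
rmse_by_day = gb.std(ddof=0)     # ddof=0: divide by N, not N - 1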

I suspect that the RMSE formula you are applying is exactly equivalent to the standard deviation normalized by the number of elements N (not N - 1, which is the Pandas default). Passing ddof=0 selects the divide-by-N version.
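To check that equivalence, the sketch below compares std(ddof=0) with the root-mean-square deviation from the mean, sqrt(mean((x - mean(x))**2)), computed by hand, and with the default std(). The readings are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up hourly readings for a single day
x = pd.Series([3.0, 0.0, 0.0, 0.0, 9.0, 12.0])

# RMSE of each value about the group mean (sum of squares divided by N)
rmse_about_mean = np.sqrt(((x - x.mean()) ** 2).mean())

pop_std = x.std(ddof=0)   # divides by N, so it matches rmse_about_mean
sample_std = x.std()      # pandas default ddof=1 divides by N - 1, so it is larger

print(rmse_about_mean, pop_std, sample_std)
```

This only matches if the RMSE you want is each column's spread about its own daily mean; an RMSE between value1 and value2, as in Solution 1, is a different quantity.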

You should now be able to access mean_by_day.value1 and rmse_by_day.value1 to get the values that you want.

The values I get for mean_by_day are:

            value1    value2
2016-01-01  5.416667  6.541667
2016-01-02  0.125000  0.000000

Similarly, for rmse_by_day I get

            value1    value2
2016-01-01  5.139039  6.422481
2016-01-02  0.330719  0.000000

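Note that the index's date attribute (index.date) is used rather than index.day: the day of the month repeats if your data spans multiple months, so grouping by it would merge, for example, January 1 and February 1 into one group.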
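The difference is easy to see on a small made-up series spanning two months: grouping by index.day puts January 1 and February 1 in the same group, while index.date keeps them apart.

```python
import pandas as pd

# Made-up readings on Jan 1 and Feb 1 (same day of month, different dates)
idx = pd.to_datetime(["2016-01-01 00:00", "2016-01-01 01:00",
                      "2016-02-01 00:00", "2016-02-01 01:00"])
df = pd.DataFrame({"value1": [1.0, 3.0, 10.0, 20.0]}, index=idx)

by_day = df.groupby(df.index.day).mean()    # key is day of month: one group (1)
by_date = df.groupby(df.index.date).mean()  # key is calendar date: two groups

print(by_day)
print(by_date)
```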
