Pandas Apply Function To Data Grouped By Day

April 05, 2024

I have a dataset that looks like this:

date,value1,value2
2016-01-01 00:00:00,3,0
2016-01-01 01:00:00,0,0
2016-01-01 02:00:00,0,0
2016-01-01 03:00:00,0,0
2016-01-01 04:00:00,0,0
20

Solution 1:

Use sklearn's mean_squared_error and take the square root of the result for each day:

from sklearn.metrics import mean_squared_error

df.groupby(df.date.dt.date).apply(
    lambda x: mean_squared_error(x.value1, x.value2) ** .5)

date
2016-01-01    3.494043
2016-01-02    0.377964
dtype: float64
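For readers who would rather not depend on scikit-learn, the same per-day RMSE can be sketched with NumPy alone. This is a minimal illustration, not the answer's own code: the four-row DataFrame and the helper name rmse are assumptions made up for the example, and the date column is assumed to already be of datetime type.

```python
import numpy as np
import pandas as pd

# Hypothetical two-day sample; the values are made up for illustration.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2016-01-01 00:00:00", "2016-01-01 01:00:00",
        "2016-01-02 00:00:00", "2016-01-02 01:00:00",
    ]),
    "value1": [3.0, 0.0, 1.0, 0.0],
    "value2": [0.0, 4.0, 0.0, 0.0],
})


def rmse(group):
    # Square root of the mean squared difference between the two columns.
    return np.sqrt(((group["value1"] - group["value2"]) ** 2).mean())


# Group rows by calendar date and compute one RMSE value per day.
rmse_by_day = df.groupby(df["date"].dt.date)[["value1", "value2"]].apply(rmse)
print(rmse_by_day)
```

The result is a Series indexed by date with one RMSE per day, the same shape as the output of the sklearn version.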

Solution 2:

You do not need to redo the groupby for every statistic; build it once and reuse it. You also need to compute the RMSE within each group, not on the sequence of daily means:

gb = df.groupby(df.index.date)
mean_by_day = gb.mean()
rmse_by_day = gb.std(ddof=0)

I suspect that the RMSE formula you are applying is exactly equivalent to the standard deviation normalized by the number of elements (not by the number of elements - 1, which is the default in pandas).
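A quick way to check what std(ddof=0) computes is to compare it with a hand-written root-mean-square deviation from each day's mean. The sketch below uses a made-up four-row frame (an assumption for illustration). Note that it measures the spread of one column around its own daily mean, so whether that matches the RMSE you intend depends on your formula.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a DatetimeIndex; values are made up.
idx = pd.to_datetime([
    "2016-01-01 00:00:00", "2016-01-01 01:00:00",
    "2016-01-02 00:00:00", "2016-01-02 01:00:00",
])
df = pd.DataFrame({"value1": [3.0, 0.0, 1.0, 0.0]}, index=idx)

gb = df.groupby(df.index.date)

# Population standard deviation: divides by n instead of n - 1.
std_pop = gb["value1"].std(ddof=0)

# The same quantity written out: sqrt(mean((x - mean(x)) ** 2)) per day.
manual = gb["value1"].apply(lambda s: np.sqrt(((s - s.mean()) ** 2).mean()))

print(np.allclose(std_pop.values, manual.values))
```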

You should now be able to access mean_by_day.value1 and rmse_by_day.value1 to get the values that you want.

The value I get for mean_by_day is

            value1    value2
2016-01-01  5.416667  6.541667
2016-01-02  0.125000  0.000000

Similarly, for rmse_by_day I get

            value1    value2
2016-01-01  5.139039  6.422481
2016-01-02  0.330719  0.000000

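Note that the grouping key is the date field of the index (df.index.date) rather than the day (df.index.day). The day of the month repeats once your data spans multiple months, so grouping by it would merge rows from different months.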
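The difference between the two keys shows up as soon as the data crosses a month boundary. Here is a small sketch with made-up values (the frame is an assumption for illustration):

```python
import pandas as pd

# Hypothetical sample: two rows on January 1 and one row on February 1.
idx = pd.to_datetime([
    "2016-01-01 00:00", "2016-01-01 12:00", "2016-02-01 00:00",
])
df = pd.DataFrame({"value1": [1.0, 3.0, 10.0]}, index=idx)

# Grouping by day of month merges January 1 and February 1 into one group.
by_day = df.groupby(df.index.day).mean()

# Grouping by calendar date keeps them as two separate groups.
by_date = df.groupby(df.index.date).mean()

print(len(by_day), len(by_date))
```

With this sample, the day-of-month grouping collapses all three rows into a single group, while the date grouping yields one row per calendar day.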
