Skip to content Skip to sidebar Skip to footer

Python Pandas - Find Consecutive Group With Max Aggregate Values

I have a data frame with datetimes and integers import numpy as np import pandas as pd df = pd.DataFrame() df['dt'] = pd.date_range('2017-01-01 12:00', '2017-01-01 12:30', freq='1

Solution 1:

You could use np.convolve to get the correct starting index and go from there.

defcons_max(df, N):
    max_loc = np.convolve(df.val, np.ones(N, dtype=int), mode='valid').argmax()
    return df.loc[max_loc:max_loc+N-1]

Demo

>>>cons_max(df,3)dtval52017-01-01 12:05:00   9462017-01-01 12:06:00   4272017-01-01 12:07:00   97>>>cons_max(df,5)dtval42017-01-01 12:04:00   7052017-01-01 12:05:00   9462017-01-01 12:06:00   4272017-01-01 12:07:00   9782017-01-01 12:08:00   12

This works be effectively "sliding" the kernel (array of ones) across our input and multiply-accumulating the elements in our window of size N together.

Solution 2:

You could use rolling/sum and np.nanargmax to find the index associated with the first occurrence of the maximum value:

import numpy as np
import pandas as pd

df = pd.DataFrame({'dt': ['2017-01-01 12:00:00', '2017-01-01 12:01:00', '2017-01-01 12:02:00', '2017-01-01 12:03:00', '2017-01-01 12:04:00', '2017-01-01 12:05:00', '2017-01-01 12:06:00', '2017-01-01 12:07:00', '2017-01-01 12:08:00', '2017-01-01 12:09:00', '2017-01-01 12:10:00', '2017-01-01 12:11:00', '2017-01-01 12:12:00', '2017-01-01 12:13:00', '2017-01-01 12:14:00', '2017-01-01 12:15:00'], 'val': [33, 42, 44, 6, 70, 94, 42, 97, 12, 11, 66, 71, 25, 23, 39, 25]})
df.index = df.index*10

N = 3
idx = df['val'].rolling(window=N).sum()
i = np.nanargmax(idx) + 1print(df.iloc[i-N : i])

prints

dtval502017-01-01 12:05:00   94602017-01-01 12:06:00   42702017-01-01 12:07:00   97

iloc uses ordinal indexing. loc uses label-based indexing. Provided that both i-N and i are valid indices, df.iloc[i-N : i] will grab a window (sub-DataFrame) of length N. In contrast, df.loc[i-N, i] will only grab a window of length N if the index uses consecutive integers. The example above shows a DataFrame where df.loc would not work since df.index has non-consecutive integer values.

Solution 3:

For simple single values, you can use something like:

df['total'] = df.val + df.val.shift(-1) + df.val.shift(-2)
first = df.dropna().sort('total').index[-1]
df.iloc[first:first+3]

Not sure how to generalize this... with most things pandas, there is probably an easier way, but this does work.

Edit: After a little more work, it looks like rolling is what you want:

last = df.val.rolling(3).sum().dropna().sort_values().index[-1]

This is a slightly different, in that the index you get here is the end, so after doing the above you want to do

df.iloc[last-2:last+1]

I think that could be generalized.

Post a Comment for "Python Pandas - Find Consecutive Group With Max Aggregate Values"