Python Pandas - Find Consecutive Group With Max Aggregate Values
Solution 1:
You could use np.convolve
to get the correct starting index and go from there.
defcons_max(df, N):
max_loc = np.convolve(df.val, np.ones(N, dtype=int), mode='valid').argmax()
return df.loc[max_loc:max_loc+N-1]
Demo
>>>cons_max(df,3)dtval52017-01-01 12:05:00 9462017-01-01 12:06:00 4272017-01-01 12:07:00 97>>>cons_max(df,5)dtval42017-01-01 12:04:00 7052017-01-01 12:05:00 9462017-01-01 12:06:00 4272017-01-01 12:07:00 9782017-01-01 12:08:00 12
This works be effectively "sliding" the kernel (array of ones) across our input and multiply-accumulating the elements in our window of size N
together.
Solution 2:
You could use rolling/sum
and np.nanargmax
to find the index associated with the first occurrence of the maximum value:
import numpy as np
import pandas as pd
df = pd.DataFrame({'dt': ['2017-01-01 12:00:00', '2017-01-01 12:01:00', '2017-01-01 12:02:00', '2017-01-01 12:03:00', '2017-01-01 12:04:00', '2017-01-01 12:05:00', '2017-01-01 12:06:00', '2017-01-01 12:07:00', '2017-01-01 12:08:00', '2017-01-01 12:09:00', '2017-01-01 12:10:00', '2017-01-01 12:11:00', '2017-01-01 12:12:00', '2017-01-01 12:13:00', '2017-01-01 12:14:00', '2017-01-01 12:15:00'], 'val': [33, 42, 44, 6, 70, 94, 42, 97, 12, 11, 66, 71, 25, 23, 39, 25]})
df.index = df.index*10
N = 3
idx = df['val'].rolling(window=N).sum()
i = np.nanargmax(idx) + 1print(df.iloc[i-N : i])
prints
dtval502017-01-01 12:05:00 94602017-01-01 12:06:00 42702017-01-01 12:07:00 97
iloc
uses ordinal indexing. loc
uses label-based indexing. Provided that
both i-N
and i
are valid indices, df.iloc[i-N : i]
will grab a window
(sub-DataFrame) of length N
. In contrast, df.loc[i-N, i]
will only grab a
window of length N
if the index uses consecutive integers. The example above
shows a DataFrame where df.loc
would not work since df.index
has
non-consecutive integer values.
Solution 3:
For simple single values, you can use something like:
df['total'] = df.val + df.val.shift(-1) + df.val.shift(-2)
first = df.dropna().sort('total').index[-1]
df.iloc[first:first+3]
Not sure how to generalize this... with most things pandas, there is probably an easier way, but this does work.
Edit: After a little more work, it looks like rolling is what you want:
last = df.val.rolling(3).sum().dropna().sort_values().index[-1]
This is a slightly different, in that the index you get here is the end, so after doing the above you want to do
df.iloc[last-2:last+1]
I think that could be generalized.
Post a Comment for "Python Pandas - Find Consecutive Group With Max Aggregate Values"