Pandas: Filter Rows Of Dataframe With Operator Chaining

Most operations in pandas can be accomplished with operator chaining (groupby, aggregate, apply, etc), but the only way I've found to filter rows is via normal bracket indexing df_

Solution 1:

I'm not entirely sure what you want, and your last line of code does not help either, but anyway:

"Chained" filtering is done by "chaining" the criteria in the boolean index.

In [96]: df
Out[96]:
   A  B  C  D
a  1  4  9  1
b  4  5  0  2
c  5  5  1  0
d  1  3  9  6

In [99]: df[(df.A == 1) & (df.D == 6)]
Out[99]:
   A  B  C  D
d  1  3  9  6

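If you want to chain methods, you can add your own mask method and use that one. Be aware that assigning to pandas.DataFrame.mask replaces the DataFrame.mask method that pandas already ships, so a different name is safer outside of a quick experiment.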

In [90]: def mask(df, key, value):
   ....:     return df[df[key] == value]
   ....:

In [92]: pandas.DataFrame.mask = mask

In [93]: df = pandas.DataFrame(np.random.randint(0, 10, (4,4)), index=list('abcd'), columns=list('ABCD'))

In [95]: df.loc['d', 'A'] = df.loc['a', 'A']  # .ix was removed in pandas 1.0; .loc does label-based access

In [96]: df
Out[96]:
   A  B  C  D
a  1  4  9  1
b  4  5  0  2
c  5  5  1  0
d  1  3  9  6

In [97]: df.mask('A', 1)
Out[97]:
   A  B  C  D
a  1  4  9  1
d  1  3  9  6

In [98]: df.mask('A', 1).mask('D', 6)
Out[98]:
   A  B  C  D
d  1  3  9  6

Solution 2:

Filters can be chained using a Pandas query:

df = pd.DataFrame(np.random.randn(30, 3), columns=['a','b','c'])
df_filtered = df.query('a > 0').query('0 < b < 2')

Filters can also be combined in a single query:

df_filtered = df.query('a > 0 and 0 < b < 2')
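
As a quick check, the sketch below compares the chained form, the combined form, and plain bracket indexing on the same data. The seed and the names chained, combined and bracket are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Seeded so the example is reproducible; the seed value is arbitrary.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((30, 3)), columns=['a', 'b', 'c'])

# Each .query() call returns a new DataFrame, so calls can be chained.
chained = df.query('a > 0').query('0 < b < 2')

# The same conditions joined with `and` in a single expression.
combined = df.query('a > 0 and 0 < b < 2')

# Plain boolean indexing, for comparison.
bracket = df[(df.a > 0) & (df.b > 0) & (df.b < 2)]

# All three approaches should select the same rows.
print(chained.equals(combined), chained.equals(bracket))
```

The chained form is handy when each condition is added at a different point in a longer pipeline; the combined form evaluates everything in one pass.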

Solution 3:

The answer from @lodagro is great. I would extend it by generalizing the mask function as:

def mask(df, f):
    return df[f(df)]

Then you can do stuff like:

df.mask(lambda x: x[0] < 0).mask(lambda x: x[1] > 0)
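
Here is a minimal sketch of the same idea that leaves pandas' own DataFrame.mask untouched: the function is kept as a plain helper and called through .pipe(). The name filter_rows and the sample values are hypothetical, chosen only to mirror the A and D columns from the first answer.

```python
import pandas as pd


def filter_rows(df, f):
    """Keep the rows of df for which the callable f(df) is True."""
    return df[f(df)]


# Small fixed frame so the result is predictable.
df = pd.DataFrame({'A': [1, 4, 5, 1], 'D': [1, 2, 0, 6]}, index=list('abcd'))

# .pipe(func, arg) calls func(df, arg), so the helper chains like a method.
result = (
    df.pipe(filter_rows, lambda x: x['A'] == 1)
      .pipe(filter_rows, lambda x: x['D'] == 6)
)
print(result)
```

Only row d satisfies both conditions, so that is the single row left in result.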

Solution 4:

Since version 0.18.1 the .loc method accepts a callable for selection. Together with lambda functions you can create very flexible chainable filters:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df.loc[lambda df: df.A == 80]  # equivalent to df[df.A == 80] but chainable

df.sort_values('A').loc[lambda df: df.A > 80].loc[lambda df: df.B > df.A]

If all you're doing is filtering, you can also omit the .loc.
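
A small sketch of that shortcut, assuming a hand-written frame so the outcome is easy to follow. Plain bracket indexing also accepts a callable, so the two chains below should pick the same rows; the names with_loc and without_loc are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'A': [90, 85, 10, 95], 'B': [95, 20, 50, 99]})

# Chained filters using .loc with callables.
with_loc = df.sort_values('A').loc[lambda d: d.A > 80].loc[lambda d: d.B > d.A]

# The same chain with .loc omitted: df[callable] filters rows too.
without_loc = df.sort_values('A')[lambda d: d.A > 80][lambda d: d.B > d.A]

print(with_loc.equals(without_loc))
```

Keeping .loc is still useful when you also want to select columns in the same step, for example .loc[lambda d: d.A > 80, ['B']].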

Solution 5:

pandas provides two alternatives to Wouter Overmeire's answer which do not require any overriding. One is .loc[.] with a callable, as in

df_filtered = df.loc[lambda x: x['column'] == value]

the other is .pipe(), as in

df_filtered = df.pipe(lambda x: x.loc[x['column'] == value])
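
The sketch below runs both forms on a toy frame to show that they select the same rows, and then uses .pipe() in the middle of a longer chain. The data, the 'other' and 'doubled' columns, and the value 'x' are placeholders invented for the example.

```python
import pandas as pd

df = pd.DataFrame({'column': ['x', 'y', 'x'], 'other': [1, 2, 3]})
value = 'x'

# Form 1: .loc with a callable that receives the frame being indexed.
via_loc = df.loc[lambda x: x['column'] == value]

# Form 2: .pipe passes the frame to an arbitrary function.
via_pipe = df.pipe(lambda x: x.loc[x['column'] == value])

# Either form slots into a longer method chain without a temporary variable.
total = (
    df.assign(doubled=lambda x: x['other'] * 2)
      .pipe(lambda x: x.loc[x['column'] == value])
      .doubled.sum()
)
print(via_loc.equals(via_pipe), total)
```

The .loc form is shorter for simple row filters; .pipe() is the more general tool, since the function it calls can do anything with the intermediate frame.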
