Python - Pickup Data Based On A List Of Values Or Conditions

I have a data set that has 9 columns, and I managed to extract two of the columns using pandas (Thank you Stack members for your help before!). Now, my question is: I have a list o

Solution 1:

One solution is to build a new DataFrame with the masses from the pickup list as the index and the rows of plist as the columns:

import pandas as pd

matches = pd.DataFrame(index=pickup['mass'], columns=plist.set_index(list(plist.columns)).index, dtype=bool)

Then populate this DataFrame as needed. If, for example, a match may be at most 150 ppm from the target, you can use abs to make the comparison two-sided:

ppm = 150
for index, exp_mass, intensity in plist.itertuples():
    matches[exp_mass] = abs(matches.index - exp_mass) / matches.index < ppm / 1e6

This gives something like this

Exp. m/z    1000    2000    3000    4000
Intensity   2000    3000    4000    5000
mass
1000        True    False   False   False
1200        False   False   False   False
1300        False   False   False   False
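The same two-sided ppm test can also be written without a Python loop. The following is a minimal sketch using NumPy broadcasting, not the answer's own code; the function name ppm_match_matrix and the sample masses are hypothetical and chosen only for illustration.

```python
import numpy as np


def ppm_match_matrix(targets, observed, ppm):
    """Return a boolean array of shape (len(targets), len(observed)).

    Entry [i, j] is True when observed[j] lies within `ppm` parts per
    million of targets[i], measured relative to the target mass.
    """
    targets = np.asarray(targets, dtype=float)[:, None]    # column vector
    observed = np.asarray(observed, dtype=float)[None, :]  # row vector
    relative_error = np.abs(observed - targets) / targets
    return relative_error < ppm / 1e6


# Hypothetical sample values (not taken from the question's data set)
targets = [1000.0, 1200.0, 1300.0]
observed = [1000.1, 2000.0, 1200.5]

# 1000.1 is 100 ppm away from 1000 (inside 150 ppm);
# 1200.5 is about 417 ppm away from 1200 (outside 150 ppm).
hits = ppm_match_matrix(targets, observed, ppm=150)
```

Each row of hits corresponds to one target mass and each column to one observed peak, which is the same layout as the matches table above.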

You can easily condense the matches DataFrame with a dict comprehension:

results = {i: list(s.index[s]) for i, s in matches.iterrows()}

This returns a dict with one entry per row of the pickup list; each value lists the matches in plist as (Exp. m/z, Intensity) tuples, like this:

{1000: [(1000, 2000)], 1200: [], 1300: []}

If you only want the (Exp. m/z, Intensity) tuples, you can do this

results2 = {key for key, value in matches.any().items() if value}

This gives the following set:

{(1000, 2000)}
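For reference, here is a self-contained sketch that produces the same kind of result dict from sample data. It is a variation on the approach above, not the original code: it loops over the pickup masses and filters plist directly instead of building the intermediate matches DataFrame. The sample values mirror the example output shown earlier, and the column names are assumed to be 'mass', 'Exp. m/z' and 'Intensity'.

```python
import pandas as pd

# Sample data mirroring the example output above (assumed column names)
pickup = pd.DataFrame({"mass": [1000, 1200, 1300]})
plist = pd.DataFrame({
    "Exp. m/z": [1000, 2000, 3000, 4000],
    "Intensity": [2000, 3000, 4000, 5000],
})
ppm = 150

results = {}
for mass in pickup["mass"]:
    # Two-sided tolerance: relative distance from the target mass
    close = (plist["Exp. m/z"] - mass).abs() / mass < ppm / 1e6
    # Plain (Exp. m/z, Intensity) tuples for every row that matched
    results[mass] = list(plist.loc[close].itertuples(index=False, name=None))

# Set of all peaks that matched at least one pickup mass
matched_peaks = {peak for peaks in results.values() for peak in peaks}
```

Looping over the masses keeps memory use proportional to the size of plist, which may matter if both lists are long.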

Solution 2:

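If you have more than one condition when indexing a DataFrame, each condition has to be in its own parentheses (the & operator binds more tightly than comparison operators such as > and <), and the combined expression can then be wrapped in another pair of brackets.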

plistcollect[(plistcollect['Exp. m/z']>peak1lower) & (plistcollect['Exp. m/z'] < peak1upper)]

should be

plistcollect[((plistcollect['Exp. m/z']>peak1lower) & (plistcollect['Exp. m/z'] < peak1upper))]
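As a quick illustration of combining two conditions, the sketch below filters a small frame twice: once with a parenthesised condition on each side of &, and once with Series.between. The sample values and the scalar limits peak1lower and peak1upper are hypothetical, and the string form of the inclusive argument assumes pandas 1.3 or newer.

```python
import pandas as pd

# Hypothetical sample data and scalar limits
plistcollect = pd.DataFrame({
    "Exp. m/z": [999.0, 1000.05, 1001.0],
    "Intensity": [10, 20, 30],
})
peak1lower, peak1upper = 999.85, 1000.15

mz = plistcollect["Exp. m/z"]

# Each comparison is parenthesised because & has higher precedence than > and <
with_parens = plistcollect[(mz > peak1lower) & (mz < peak1upper)]

# Equivalent range test; "neither" makes both bounds exclusive
with_between = plistcollect[mz.between(peak1lower, peak1upper, inclusive="neither")]
```

Both selections keep only the row whose Exp. m/z lies strictly between the two limits.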

Edit: Since you need to perform it on every element, you have to do something like this:

limit_df = pd.DataFrame([peak1lower['Exp. m/z'],peak1upper['Exp. m/z']], index=['lower','upper']).T
filtered_df = limit_df.apply(lambda x: ((plistcollect['Exp. m/z'] > x.lower) & (plistcollect['Exp. m/z'] < x.upper)), axis=1)

filtered_df is a boolean DataFrame: each row corresponds to one element of the mass list, and each value is True where the matching plistcollect entry falls within that element's limits, False otherwise.
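To make the shape of that boolean DataFrame concrete, here is a small runnable sketch. The limits and m/z values are invented for illustration, and limit_df is built directly from hypothetical lower and upper bounds rather than from peak1lower and peak1upper as in the answer.

```python
import pandas as pd

# Hypothetical peak list and per-mass limits (index = target mass)
plistcollect = pd.DataFrame({"Exp. m/z": [999.95, 1200.0, 1300.1, 1500.0]})
limit_df = pd.DataFrame(
    {"lower": [999.85, 1299.8], "upper": [1000.15, 1300.2]},
    index=[1000, 1300],
)

mz = plistcollect["Exp. m/z"]

# One row per target mass, one column per row of plistcollect
filtered_df = limit_df.apply(
    lambda row: (mz > row["lower"]) & (mz < row["upper"]), axis=1
)

# Collect the matching m/z values for each target mass
matches_by_mass = {
    mass: plistcollect.loc[mask.to_numpy(dtype=bool), "Exp. m/z"].tolist()
    for mass, mask in filtered_df.iterrows()
}
```

Here the row labelled 1000 is True only in the first column and the row labelled 1300 only in the third, so matches_by_mass maps each target mass to the peaks inside its window.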

A simpler approach may be to save the matches for each element of the mass list to a separate file:

def filter_df(x):
    # x is one row of limit_df; x.name is that row's index label
    in_range = (plistcollect['Exp. m/z'] > x.lower) & (plistcollect['Exp. m/z'] < x.upper)
    plistcollect[in_range].to_csv("test_%s.csv" % x.name)

limit_df.apply(filter_df, axis=1)
