Skip to content Skip to sidebar Skip to footer

Python New Column Based On Nan In Other Columns

I'm quite new to Python and this is my first ever question so please be gentle with me! I have tried out answers to other similar questions but am still quite stuck. I am using Pa

Solution 1:

Use any and pass param axis=1 which tests row-wise this will produce a boolean array which when converted to int will convert all True values to 1 and False values to 0, this will be much faster than calling apply which is going to iterate row-wise and will be very slow:

In [30]:

df['Col_5'] = any(df[df.columns[1:]].notnull(), axis=1).astype(int)
df
Out[30]:
   Col_1 Col_2 Col_3 Col_4  Col_5
0      1   NaN   NaN   NaN      0
1      2     Y   NaN   NaN      1
2      3     Z     C     S      1
3      4   NaN     B     W      1

In [31]:

df = df[['Col_1', 'Col_5']]
df
Out[31]:
   Col_1  Col_5
0      1      0
1      2      1
2      3      1
3      4      1

Here is the output from any:

In [34]:

any(df[df.columns[1:]].notnull(), axis=1)
Out[34]:
array([False,  True,  True,  True], dtype=bool)

Timings

In [35]:

%timeit df[df.columns[1:]].apply(lambda x: all(x.isnull()) , axis=1).astype(int)
%timeit any(df[df.columns[1:]].notnull(), axis=1).astype(int)
100 loops, best of 3: 2.46 ms per loop
1000 loops, best of 3: 1.4 ms per loop

So on your test data for a df this size my method is over 2x faster than the other answer

Update

As you are running pandas version 0.12.0 then you need to call the top level notnull version as that method is not available at df level:

any(pd.notnull(df[df.columns[1:]]), axis=1).astype(int)

I suggest you upgrade as you'll get lots more features and bug fixes.

Solution 2:

using a function:

df['col_5'] =df.apply(lambda x: all(x.isnull()) , axis=1)

for my money is a bit easier to read. Not sure which is quicker.

Post a Comment for "Python New Column Based On Nan In Other Columns"