Python New Column Based On Nan In Other Columns
I'm quite new to Python and this is my first ever question so please be gentle with me! I have tried out answers to other similar questions but am still quite stuck. I am using Pa
Solution 1:
Use NumPy's any (np.any) and pass axis=1 so that it tests row-wise. This produces a boolean array which, when converted to int, turns every True into 1 and every False into 0. It will be much faster than calling apply, which iterates row by row and is very slow:
In [30]:
import numpy as np
df['Col_5'] = np.any(df[df.columns[1:]].notnull(), axis=1).astype(int)
df
Out[30]:
   Col_1 Col_2 Col_3 Col_4  Col_5
0      1   NaN   NaN   NaN      0
1      2     Y   NaN   NaN      1
2      3     Z     C     S      1
3      4   NaN     B     W      1
In [31]:
df = df[['Col_1', 'Col_5']]
df
Out[31]:
   Col_1  Col_5
0      1      0
1      2      1
2      3      1
3      4      1
Here is the output from np.any:
In [34]:
np.any(df[df.columns[1:]].notnull(), axis=1)
Out[34]:
array([False, True, True, True], dtype=bool)
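For anyone who wants to reproduce this from scratch, here is a minimal sketch. It assumes the frame looks like the one printed above (the question's original data is not shown in full, so the values are rebuilt from the output), and it uses the DataFrame.any method rather than np.any, which should give the same flags on a recent pandas.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the printed output; np.nan marks missing cells.
df = pd.DataFrame({
    "Col_1": [1, 2, 3, 4],
    "Col_2": [np.nan, "Y", "Z", np.nan],
    "Col_3": [np.nan, np.nan, "C", "B"],
    "Col_4": [np.nan, np.nan, "S", "W"],
})

# Columns to inspect: everything except the first (identifier) column.
check_cols = df.columns[1:]

# Boolean mask: True where at least one inspected column holds a value.
has_value = df[check_cols].notnull().any(axis=1)

# Cast the booleans to 0/1 and store them as the new column.
df["Col_5"] = has_value.astype(int)

print(df[["Col_1", "Col_5"]])
```

The mask is computed before Col_5 is added, so the new column never ends up among the columns being checked.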
Timings
In [35]:
%timeit df[df.columns[1:]].apply(lambda x: all(x.isnull()) , axis=1).astype(int)
%timeit np.any(df[df.columns[1:]].notnull(), axis=1).astype(int)
100 loops, best of 3: 2.46 ms per loop
1000 loops, best of 3: 1.4 ms per loop
So on your test data, for a df of this size, my method is roughly 1.75x faster than the other answer (2.46 ms vs 1.4 ms).
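If you want to check the timings outside IPython, something like the sketch below should work with the standard timeit module. The frame here is a made-up 1,000-row test frame, not the data from the question, and the helper names with_apply and with_any are only illustrative. Actual numbers will depend on your machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test frame: 1,000 rows, roughly half of the cells missing.
rng = np.random.default_rng(0)
values = rng.random((1000, 3))
values[values < 0.5] = np.nan
big = pd.DataFrame(values, columns=["Col_2", "Col_3", "Col_4"])


def with_apply(frame):
    # Row-wise Python callback: 1 where the row has at least one value.
    return frame.apply(lambda row: not row.isnull().all(), axis=1).astype(int)


def with_any(frame):
    # Vectorised: the whole boolean mask is reduced in a single call.
    return frame.notnull().any(axis=1).astype(int)


# Confirm both approaches agree before comparing their speed.
same_result = with_apply(big).equals(with_any(big))

apply_seconds = timeit.timeit(lambda: with_apply(big), number=5)
any_seconds = timeit.timeit(lambda: with_any(big), number=5)
print(f"same result: {same_result}")
print(f"apply: {apply_seconds:.4f}s  any: {any_seconds:.4f}s")
```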
Update
As you are running pandas version 0.12.0, you need to call the top-level notnull function, as that method is not available at the df level:
np.any(pd.notnull(df[df.columns[1:]]), axis=1).astype(int)
I suggest you upgrade as you'll get lots more features and bug fixes.
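To illustrate the difference between the two spellings, here is a small sketch on a made-up two-row frame. On a current pandas both forms exist and should produce the same mask; I can't run 0.12.0 here, so treat this as a demonstration of the equivalence only.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (not the data from the question).
df = pd.DataFrame({
    "Col_1": [1, 2],
    "Col_2": [np.nan, "Y"],
    "Col_3": [np.nan, np.nan],
})

# Top-level function form: takes the frame as an argument.
mask_function = pd.notnull(df[df.columns[1:]])

# Method form, available on DataFrame in current pandas releases.
mask_method = df[df.columns[1:]].notnull()

# Reduce the underlying NumPy array row-wise and cast to 0/1.
flags = np.any(mask_function.values, axis=1).astype(int)
print(flags)
```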
Solution 2:
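Using a function:
df['col_5'] = df.apply(lambda x: all(x.isnull()), axis=1)
For my money this is a bit easier to read, though I'm not sure which is quicker. Note that as written it checks every column, including Col_1, and returns True where the whole row is null, so restrict it to the columns of interest and invert the result to get the same flag as Solution 1.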
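If readability is the priority, one option is to give the row check a name. This is only a sketch: the frame is rebuilt from the output shown in Solution 1, and row_has_value is a hypothetical helper, not something from the question or from pandas.

```python
import numpy as np
import pandas as pd

# Example frame rebuilt from the output shown in Solution 1.
df = pd.DataFrame({
    "Col_1": [1, 2, 3, 4],
    "Col_2": [np.nan, "Y", "Z", np.nan],
    "Col_3": [np.nan, np.nan, "C", "B"],
    "Col_4": [np.nan, np.nan, "S", "W"],
})


def row_has_value(row):
    """Return 1 if any cell in the row is non-null, else 0."""
    return int(row.notnull().any())


# Apply only to the columns being checked, so Col_1 does not affect the result.
df["Col_5"] = df[["Col_2", "Col_3", "Col_4"]].apply(row_has_value, axis=1)
print(df[["Col_1", "Col_5"]])
```

This still calls a Python function once per row, so it is likely to be slower than the vectorised any on large frames.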