Np.isnan On Arrays Of Dtype "object"
Solution 1:
If you are willing to use the pandas library, a handy function that covers this case is pd.isnull:
pandas.isnull(obj)
Detect missing values (NaN in numeric arrays, None/NaN in object arrays)
Here is an example:
$ python
>>> import numpy
>>> import pandas
>>> array = numpy.asarray(['a', float('nan')], dtype=object)
>>> pandas.isnull(array)
array([False, True])
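As a small extension of the example above, here is a sketch that also includes None and uses the resulting mask to filter the array. The sample values and the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Object array mixing a string, a float NaN, None, and a number (illustrative data).
values = np.array(["a", float("nan"), None, 1.5], dtype=object)

# pd.isnull flags both NaN and None in object arrays.
missing = pd.isnull(values)
print(missing)

# The boolean mask can index the original array directly.
present = values[~missing]
print(present)
```

Note that this treats None as missing too, which may or may not be what you want if you only care about NaN.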
Solution 2:
You could just use a list comprehension to get the indexes of any NaNs, which may be faster in this case:
obj_arr = np.array([1, 2, np.nan, "A"], dtype=object)
inds = [i for i,n in enumerate(obj_arr) if str(n) == "nan"]
Or if you want a boolean mask:
mask = [str(n) == "nan" for n in obj_arr]
Using is np.nan also seems to work without needing to cast to str:
In [29]: obj_arr = np.array([1, 2, np.nan, "A"], dtype=object)
In [30]: [x is np.nan for x in obj_arr]
Out[30]: [False, False, True, False]
For flat and multidimensional arrays you could check the shape:
def masks(a):
    if len(a.shape) > 1:
        return [[x is np.nan for x in sub] for sub in a]
    return [x is np.nan for x in a]
If x is np.nan can fail (for example, for a NaN that is not the np.nan object itself), check the type and then use np.isnan:
def masks(a):
    if len(a.shape) > 1:
        return [[isinstance(x, float) and np.isnan(x) for x in sub] for sub in a]
    return [isinstance(x, float) and np.isnan(x) for x in a]
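To make the difference between the two tests concrete, here is a minimal sketch with a NaN created via float("nan"), which is a different object from np.nan. The names nan_mask, is_nan and other_nan are illustrative, not from the answer.

```python
import numpy as np

def nan_mask(a):
    """Boolean mask of NaN entries in a 1-D or 2-D object array."""
    def is_nan(x):
        # Only floats can be NaN; bool() turns numpy.bool_ into a plain bool.
        return isinstance(x, float) and bool(np.isnan(x))
    if a.ndim > 1:
        return [[is_nan(x) for x in sub] for sub in a]
    return [is_nan(x) for x in a]

# A NaN that is not the np.nan singleton.
other_nan = float("nan")
arr = np.array([np.nan, other_nan, "A", 1], dtype=object)

identity = [x is np.nan for x in arr]  # misses other_nan
typed = nan_mask(arr)                  # catches both NaNs
print(identity)
print(typed)
```

The identity test only finds entries that are literally the np.nan object, so the type-plus-np.isnan check is the safer choice when the NaNs may come from elsewhere.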
Interestingly, x is np.nan seems to work fine when the data type is object:
In [76]: arr = np.array([np.nan,np.nan,"3"],dtype=object)
In [77]: [x is np.nan for x in arr]
Out[77]: [True, True, False]
In [78]: arr = np.array([np.nan,np.nan,"3"])
In [79]: [x is np.nan for x in arr]
Out[79]: [False, False, False]
Depending on the dtype, different things happen:
In [90]: arr = np.array([np.nan,np.nan,"3"])
In [91]: arr.dtype
Out[91]: dtype('S32')
In [92]: arr
Out[92]:
array(['nan', 'nan', '3'],
dtype='|S32')
In [93]: [x == "nan" for x in arr]
Out[93]: [True, True, False]
In [94]: arr = np.array([np.nan,np.nan,"3"],dtype=object)
In [95]: arr.dtype
Out[95]: dtype('O')
In [96]: arr
Out[96]: array([nan, nan, '3'], dtype=object)
In [97]: [x == "nan" for x in arr]
Out[97]: [False, False, False]
Evidently the NaNs get coerced to numpy.string_ values when there are strings in the array, so x == "nan" works in that case. When you pass dtype=object the elements remain floats, so if you always use object dtype the behaviour should be consistent.
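The session above looks like Python 2 (dtype 'S32'). As a sketch of the same check on Python 3, where I would expect a unicode string dtype instead, you can inspect the dtype kind and the element types rather than relying on the exact dtype name:

```python
import numpy as np

# Without dtype=object, NumPy picks a common string dtype and stringifies the NaNs.
as_strings = np.array([np.nan, np.nan, "3"])
print(as_strings.dtype.kind)               # 'U' on Python 3
string_hits = [x == "nan" for x in as_strings]

# With dtype=object, the original Python objects are stored untouched.
as_objects = np.array([np.nan, np.nan, "3"], dtype=object)
element_types = [type(x) for x in as_objects]
object_hits = [x is np.nan for x in as_objects]
print(element_types)
```

So the string comparison only makes sense for the coerced array, and the identity or isnan tests only make sense for the object array.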
Solution 3:
Define a couple of test arrays, one small and one bigger:
In [21]: x=np.array([1,23.3, np.nan, 'str'],dtype=object)
In [22]: xb=np.tile(x,300)
Your function:
In [23]: isnan(x)
Out[23]: array([False, False, True, False], dtype=bool)
The straightforward list comprehension, returning an array:
In [24]: np.array([i is np.nan for i in x])
Out[24]: array([False, False, True, False], dtype=bool)
np.frompyfunc has similar vectorizing power to np.vectorize, but for some reason is underutilized (and in my experience faster):
In [25]: def myisnan(x):
   ....:     return x is np.nan
In [26]: visnan=np.frompyfunc(myisnan,1,1)
In [27]: visnan(x)
Out[27]: array([False, False, True, False], dtype=object)
Since it returns dtype object, we may want to cast its values:
In [28]: visnan(x).astype(bool)
Out[28]: array([False, False, True, False], dtype=bool)
It can handle multidim arrays nicely:
In [29]: visnan(x.reshape(2,2)).astype(bool)
Out[29]:
array([[False, False],
       [ True, False]], dtype=bool)
Now for some timings:
In [30]: timeit isnan(xb)
1000 loops, best of 3: 1.03 ms per loop
In [31]: timeit np.array([i is np.nan for i in xb])
1000 loops, best of 3: 393 us per loop
In [32]: timeit visnan(xb).astype(bool)
1000 loops, best of 3: 382 us per loop
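If you want to rerun the comparison outside IPython, something like the following sketch with the standard timeit module should work. The function names list_comp and via_frompyfunc are mine, the asker's isnan function is not included, and the numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

x = np.array([1, 23.3, np.nan, "str"], dtype=object)
xb = np.tile(x, 300)

def list_comp(a):
    # Plain list comprehension, converted to a boolean array.
    return np.array([i is np.nan for i in a])

# frompyfunc wraps the scalar test so it applies element-wise.
visnan = np.frompyfunc(lambda v: v is np.nan, 1, 1)

def via_frompyfunc(a):
    # The ufunc returns dtype=object, so cast to bool.
    return visnan(a).astype(bool)

for name, fn in [("list comprehension", list_comp), ("frompyfunc", via_frompyfunc)]:
    seconds = timeit.timeit(lambda: fn(xb), number=50)
    print(f"{name}: {seconds / 50 * 1e6:.0f} us per call")
```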
An important point about the i is np.nan test: it only applies to Python scalars. If the array has dtype object, iteration produces the stored Python objects. But for an array of dtype float, iteration yields numpy.float64 values. For those, np.isnan(i) is the correct test.
In [61]: [(i is np.nan) for i in np.array([np.nan,np.nan,1.3])]
Out[61]: [False, False, False]
In [62]: [np.isnan(i) for i in np.array([np.nan,np.nan,1.3])]
Out[62]: [True, True, False]
In [63]: [(i is np.nan) for i in np.array([np.nan,np.nan,1.3], dtype=object)]
Out[63]: [True, True, False]
In [64]: [np.isnan(i) for i in np.array([np.nan,np.nan,1.3], dtype=object)]
...
TypeError: Not implemented for this type
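Since the right test depends on the dtype, one option is to dispatch on it. This is only a sketch under my own assumptions: isnan_by_dtype and _obj_isnan are hypothetical names, and only float and object dtypes get special treatment (everything else is reported as "no NaN").

```python
import numpy as np

# Hypothetical helper: type check plus self-inequality, wrapped as a ufunc.
_obj_isnan = np.frompyfunc(lambda v: isinstance(v, float) and v != v, 1, 1)

def isnan_by_dtype(a):
    """Return a boolean NaN mask, choosing the test from the array dtype."""
    a = np.asarray(a)
    if np.issubdtype(a.dtype, np.floating):
        # Float arrays: the native ufunc works directly.
        return np.isnan(a)
    if a.dtype == object:
        # Object arrays: test each stored Python object.
        return np.asarray(_obj_isnan(a), dtype=bool)
    # Other dtypes (ints, strings, ...) are treated as containing no NaN.
    return np.zeros(a.shape, dtype=bool)

print(isnan_by_dtype(np.array([np.nan, np.nan, 1.3])))
print(isnan_by_dtype(np.array([np.nan, "x", 1.3], dtype=object)))
```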
Solution 4:
I would use np.vectorize and a custom function that tests for NaN elementwise.
So,
def _isnan(x):
    if isinstance(x, type(np.nan)):
        return np.isnan(x)
    else:
        return False
my_isnan = np.vectorize(_isnan)
Then
X = np.array([[1, 2, np.nan, "A"], [np.nan, True, [], ""]], dtype=object)
my_isnan(X)
returns
array([[False, False,  True, False],
       [ True, False, False, False]], dtype=bool)
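One variation that may be worth considering is passing otypes=[bool] to np.vectorize. Without it, np.vectorize infers the output type by calling the function on the first element, which does not work for empty inputs. The sketch below uses a slightly different sample array (None instead of the empty list) purely for illustration.

```python
import numpy as np

def _isnan(x):
    # Only floats can be NaN; everything else is reported as False.
    return isinstance(x, float) and bool(np.isnan(x))

# otypes fixes the output dtype up front instead of inferring it.
my_isnan = np.vectorize(_isnan, otypes=[bool])

X = np.array([[1, 2, np.nan, "A"], [np.nan, True, None, ""]], dtype=object)
mask = my_isnan(X)
print(mask)

# With otypes set, an empty array is handled as well.
empty = my_isnan(np.array([], dtype=object))
print(empty.shape, empty.dtype)
```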
Solution 5:
A way to do this without converting to string or leaving the NumPy environment (also very important IMO) is to use the defining property of np.nan, namely that it is not equal to itself:
In [1]: x = np.nan
In [2]: x == x
Out[2]: False
This holds only when x is NaN. Therefore, for a NumPy array, the element-wise check
x != x
returns True for each element that is NaN.
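Here is a minimal sketch of the self-inequality check. For the float array the comparison is applied to the array directly. For the object array I apply Python's != per element through np.frompyfunc and operator.ne, which is my own choice to make the per-element behaviour explicit. The variable names are illustrative.

```python
import operator
import numpy as np

# Float array: NaN is the only value that differs from itself.
floats = np.array([1.0, np.nan, 3.5])
float_mask = floats != floats
print(float_mask)

# Object array: apply Python's != to each element paired with itself.
obj = np.array([1, 2, float("nan"), "A"], dtype=object)
ne = np.frompyfunc(operator.ne, 2, 1)
obj_mask = ne(obj, obj).astype(bool)
print(obj_mask)
```

This assumes every element's != returns something that converts cleanly to a bool, which holds for numbers and strings but not necessarily for arbitrary objects such as nested arrays.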