np.isnan on Arrays of Dtype "object"
Solution 1:
If you are willing to use the pandas library, a handy function that covers this case is pd.isnull:
pandas.isnull(obj)
Detect missing values (NaN in numeric arrays, None/NaN in object arrays)
Here is an example:
$ python
>>> import numpy
>>> import pandas
>>> array = numpy.asarray(['a', float('nan')], dtype=object)
>>> pandas.isnull(array)
array([False, True])
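As a further illustration, the sketch below applies the same function to a two-dimensional object array that mixes strings, numbers, None and NaN. The array contents and the names grid and mask are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical 2-D object array mixing strings, numbers, None and NaN.
grid = np.array([["a", np.nan], [None, 1.5]], dtype=object)

# pd.isnull works element-wise and returns a boolean ndarray of the same shape.
mask = pd.isnull(grid)
print(mask)
```

Both None and NaN are reported as missing, which may or may not be what you want if you need to tell them apart.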
Solution 2:
You could just use a list comprehension to get the indexes of any NaNs, which may be faster in this case:
obj_arr = np.array([1, 2, np.nan, "A"], dtype=object)
inds = [i for i,n in enumerate(obj_arr) if str(n) == "nan"]
Or if you want a boolean mask:
mask = [str(n) == "nan" for n in obj_arr]
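One caveat with the string comparison is worth spelling out: it cannot distinguish a float NaN from the literal string "nan". A minimal sketch, using a made-up array for illustration:

```python
import numpy as np

# Hypothetical array: the last element is the *string* "nan", not a float NaN.
obj_arr = np.array([1, np.nan, "nan"], dtype=object)

# Comparing string representations also flags the literal string "nan".
str_mask = [str(n) == "nan" for n in obj_arr]
print(str_mask)
```

If your data can contain such strings, one of the type-aware checks is probably the safer choice.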
Using is np.nan also seems to work without needing to cast to str:
In [29]: obj_arr = np.array([1, 2, np.nan, "A"], dtype=object)
In [30]: [x is np.nan for x in obj_arr]
Out[30]: [False, False, True, False]
For flat and multidimensional arrays you could check the shape:
def masks(a):
    if len(a.shape) > 1:
        return [[x is np.nan for x in sub] for sub in a]
    return [x is np.nan for x in a]
If is np.nan can fail, you could check the type and then use np.isnan:
def masks(a):
    if len(a.shape) > 1:
        return [[isinstance(x, float) and np.isnan(x) for x in sub] for sub in a]
    return [isinstance(x, float) and np.isnan(x) for x in a]
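To see when the identity test can fail, here is a small sketch comparing the two checks on NaN values that were created in different ways. The array contents are chosen purely for illustration.

```python
import numpy as np

# Three different NaN objects plus a string (values chosen for illustration).
arr = np.array([np.nan, float("nan"), np.float64("nan"), "A"], dtype=object)

# Identity only matches the np.nan object itself.
by_identity = [x is np.nan for x in arr]

# The type check plus np.isnan matches any float NaN.
by_value = [isinstance(x, float) and bool(np.isnan(x)) for x in arr]

print(by_identity)
print(by_value)
```

The identity test only finds NaNs that are the very same object as np.nan, so a NaN produced by float("nan") or by a computation would be missed.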
Interestingly x is np.nan seems to work fine when the data type is object:
In [76]: arr = np.array([np.nan,np.nan,"3"],dtype=object)
In [77]: [x is np.nan for x in arr]
Out[77]: [True, True, False]
In [78]: arr = np.array([np.nan,np.nan,"3"])
In [79]: [x is np.nan for x in arr]
Out[79]: [False, False, False]
Depending on the dtype, different things happen:
In [90]: arr = np.array([np.nan,np.nan,"3"])
In [91]: arr.dtype
Out[91]: dtype('S32')
In [92]: arr
Out[92]:
array(['nan', 'nan', '3'],
dtype='|S32')
In [93]: [x == "nan" for x in arr]
Out[93]: [True, True, False]
In [94]: arr = np.array([np.nan,np.nan,"3"],dtype=object)
In [95]: arr.dtype
Out[95]: dtype('O')
In [96]: arr
Out[96]: array([nan, nan, '3'], dtype=object)
In [97]: [x == "nan" for x in arr]
Out[97]: [False, False, False]
The NaNs get coerced to numpy.string_ values when the array also contains strings and no dtype is given, so x == "nan" works in that case. When you pass dtype=object, the elements stay floats, so if you always use object dtype the behaviour should be consistent.
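The coercion described above can be checked directly. This is a minimal sketch, assuming Python 3, where the inferred dtype is a unicode string type (kind 'U') rather than the 'S32' shown in the Python 2 session above; the names values, as_str and as_obj are only for this example.

```python
import numpy as np

values = [np.nan, np.nan, "3"]

# Without an explicit dtype, NumPy picks a string dtype and stores the text 'nan'.
as_str = np.array(values)
# With dtype=object, the original Python objects are kept.
as_obj = np.array(values, dtype=object)

print(as_str.dtype.kind, [type(v).__name__ for v in as_str])
print(as_obj.dtype.kind, [type(v).__name__ for v in as_obj])
```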
Solution 3:
Define a couple of test arrays, small and bigger
In [21]: x=np.array([1,23.3, np.nan, 'str'],dtype=object)
In [22]: xb=np.tile(x,300)
Your function:
In [23]: isnan(x)
Out[23]: array([False, False, True, False], dtype=bool)
The straightforward list comprehension, returning an array:
In [24]: np.array([i is np.nan for i in x])
Out[24]: array([False, False, True, False], dtype=bool)
np.frompyfunc has similar vectorizing power to np.vectorize, but for some reason is underutilized (and in my experience faster):
In [25]: def myisnan(x):
   ....:     return x is np.nan
In [26]: visnan=np.frompyfunc(myisnan,1,1)
In [27]: visnan(x)
Out[27]: array([False, False, True, False], dtype=object)
Since it returns dtype object, we may want to cast its values:
In [28]: visnan(x).astype(bool)
Out[28]: array([False, False, True, False], dtype=bool)
It can handle multidim arrays nicely:
In [29]: visnan(x.reshape(2,2)).astype(bool)
Out[29]:
array([[False, False],
[ True, False]], dtype=bool)
Now for some timings:
In [30]: timeit isnan(xb)
1000 loops, best of 3: 1.03 ms per loop
In [31]: timeit np.array([i is np.nan for i in xb])
1000 loops, best of 3: 393 us per loop
In [32]: timeit visnan(xb).astype(bool)
1000 loops, best of 3: 382 us per loop
An important point with the i is np.nan test - it only applies to scalars. If the array is dtype object, then iteration produces scalars. But for an array of dtype float, we get values of type numpy.float64. For those, np.isnan(i) is the correct test.
In [61]: [(i is np.nan) for i in np.array([np.nan,np.nan,1.3])]
Out[61]: [False, False, False]
In [62]: [np.isnan(i) for i in np.array([np.nan,np.nan,1.3])]
Out[62]: [True, True, False]
In [63]: [(i is np.nan) for i in np.array([np.nan,np.nan,1.3], dtype=object)]
Out[63]: [True, True, False]
In [64]: [np.isnan(i) for i in np.array([np.nan,np.nan,1.3], dtype=object)]
...
TypeError: Not implemented for this type
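Putting the two cases together, one possible approach is to dispatch on the dtype. The sketch below is not from the answer itself: nan_mask is a hypothetical helper name, and it assumes that only float elements should ever count as NaN.

```python
import numpy as np

def nan_mask(arr):
    """Return a boolean NaN mask for float or object arrays (hypothetical helper)."""
    arr = np.asarray(arr)
    if arr.dtype.kind == "f":
        # Float arrays: np.isnan is vectorized and handles numpy.float64 values.
        return np.isnan(arr)
    # Other dtypes (including object): test each element individually.
    check = np.frompyfunc(lambda v: isinstance(v, float) and v != v, 1, 1)
    return check(arr).astype(bool)

print(nan_mask(np.array([np.nan, np.nan, 1.3])))
print(nan_mask(np.array([np.nan, "str", 1.3], dtype=object)))
```

The float branch keeps the fast vectorized path, while the fallback avoids the TypeError that np.isnan raises on object arrays.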
Solution 4:
I would use np.vectorize and a custom function that tests for nan elementwise.
So,
def _isnan(x):
    if isinstance(x, type(np.nan)):
        return np.isnan(x)
    else:
        return False
my_isnan = np.vectorize(_isnan)
Then
X = np.array([[1, 2, np.nan, "A"], [np.nan, True, [], ""]], dtype=object)
my_isnan(X)
returns
array([[False, False, True, False],
[ True, False, False, False]], dtype=bool)
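A possible refinement, not part of the answer above, is to pass otypes=[bool] to np.vectorize. This fixes the output dtype up front instead of letting it be inferred from the first element, and it lets the function accept empty input. The test array below is a simplified, made-up variant of the one above.

```python
import numpy as np

def _isnan(x):
    # Only floats can be NaN; anything else is reported as not-NaN.
    return isinstance(x, float) and bool(np.isnan(x))

# otypes=[bool] fixes the output dtype up front (an addition to the answer's code).
my_isnan = np.vectorize(_isnan, otypes=[bool])

X = np.array([[1, 2, np.nan, "A"], [np.nan, True, None, ""]], dtype=object)
result = my_isnan(X)
empty = my_isnan(np.array([], dtype=object))
print(result)
print(empty.shape)
```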
Solution 5:
A way to do this without converting to string or leaving the NumPy environment (also very important IMO) is to use the fact that NaN does not compare equal to itself:
In [1]: x = np.nan
In [2]: x == x
Out[2]: False
This is only the case when x is NaN. Therefore, for a NumPy array, the element-wise check of
x != x
returns True for each element where x is NaN.
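A short sketch of this self-inequality check on an object array follows. It assumes every element has an ordinary != comparison (nested arrays or lists would not behave this way), and the array-level result may depend on the NumPy version, since older releases applied an identity shortcut when comparing objects. The explicit per-element variant using np.frompyfunc is included as a version-independent alternative; all names here are illustrative.

```python
import numpy as np

arr = np.array([1, 2.5, np.nan, "A"], dtype=object)

# Array-level comparison; on object arrays this compares element by element.
array_mask = arr != arr

# Explicit per-element version of the same idea, cast to a boolean array.
not_self_equal = np.frompyfunc(lambda v: v != v, 1, 1)
elem_mask = not_self_equal(arr).astype(bool)

print(array_mask)
print(elem_mask)
```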