np.isnan on Arrays of dtype "object"

February 28, 2024

I'm working with numpy arrays of different data types. I would like to know, of any particular array, which elements are NaN. Normally, this is what np.isnan is for. However, np.is

Solution 1:

If you are willing to use the pandas library, a handy function that covers this case is pd.isnull:

pandas.isnull(obj)
Detect missing values (NaN in numeric arrays, None/NaN in object arrays)

Here is an example:

$ python
>>> import numpy
>>> import pandas
>>> array = numpy.asarray(['a', float('nan')], dtype=object)
>>> pandas.isnull(array)
array([False,  True])
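As a small sketch of the same idea on a 2-D object array: the quoted description says None is treated as missing too, so the example below mixes None and NaN. The array contents are made up for illustration.

```python
import numpy as np
import pandas as pd

# 2-D object array mixing numbers, strings, None and NaN
# (the values are invented for this example).
arr = np.array([[1, None], [np.nan, "a"]], dtype=object)

mask = pd.isnull(arr)   # boolean ndarray with the same shape as arr
print(mask)

# The mask can be used directly for boolean indexing.
print(arr[~mask])       # the non-missing elements, flattened
```

Because the result is a plain boolean array, it should drop into any code that expected the output of np.isnan.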

Solution 2:

You could use a list comprehension to get the indexes of any NaNs, which may be faster in this case:

obj_arr = np.array([1, 2, np.nan, "A"], dtype=object)

inds = [i for i,n in enumerate(obj_arr) if str(n) == "nan"]

Or if you want a boolean mask:

mask = [str(n) == "nan" for n in obj_arr]

Using is np.nan also seems to work without needing to cast to str:

In [29]: obj_arr = np.array([1, 2, np.nan, "A"], dtype=object)

In [30]: [x is np.nan for x in obj_arr]
Out[30]: [False, False, True, False]
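The two tests are not equivalent, which is worth seeing side by side. In this sketch (the sample values are my own) the string comparison also flags the literal text "nan", while the identity test misses a NaN that was created with float("nan") rather than taken from np.nan.

```python
import numpy as np

# Sample data chosen to expose the edge cases (not from the original post).
obj_arr = np.array([1, np.nan, float("nan"), "nan", "A"], dtype=object)

# String comparison: catches both NaN floats, but also the string "nan".
str_mask = [str(v) == "nan" for v in obj_arr]

# Identity comparison: only the np.nan object itself matches.
is_mask = [v is np.nan for v in obj_arr]

print(str_mask)
print(is_mask)
```

Which behaviour is acceptable depends on where the NaNs in your data come from.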

For flat and multidimensional arrays you could check the shape:

def masks(a):
    if len(a.shape) > 1:
        return [[x is np.nan for x in sub] for sub in a]
    return [x is np.nan for x in a]

If is np.nan can fail (for example, a NaN built with float('nan') is a different object), check the type and then use np.isnan:

def masks(a):
    if len(a.shape) > 1:
        return [[isinstance(x, float) and np.isnan(x) for x in sub] for sub in a]
    return [isinstance(x, float) and np.isnan(x) for x in a]
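The shape check above covers one and two dimensions. One possible way to handle any number of dimensions is to test the flattened elements and reshape the result. This is only a sketch; nan_mask is a hypothetical helper name, not something from NumPy.

```python
import numpy as np

def nan_mask(a):
    """Boolean NaN mask with the same shape as `a` (hypothetical helper)."""
    a = np.asarray(a, dtype=object)
    # ravel() yields the stored Python objects one by one, whatever the shape.
    flags = [isinstance(v, float) and np.isnan(v) for v in a.ravel()]
    return np.array(flags, dtype=bool).reshape(a.shape)

flat = np.array([1, np.nan, "A"], dtype=object)
cube = np.array([[[np.nan, "x"], [2, 3.5]],
                 [[None, np.nan], ["y", 0]]], dtype=object)

print(nan_mask(flat))
print(nan_mask(cube).shape)
```

Returning an ndarray instead of nested lists also means the mask can be used for indexing right away.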

Interestingly x is np.nan seems to work fine when the data type is object:

In [76]: arr = np.array([np.nan,np.nan,"3"],dtype=object)

In [77]: [x is np.nan  for x in arr]
Out[77]: [True, True, False]

In [78]: arr = np.array([np.nan,np.nan,"3"])

In [79]: [x is np.nan  for x in arr]
Out[79]: [False, False, False]

Depending on the dtype, different things happen:

In [90]: arr = np.array([np.nan,np.nan,"3"])

In [91]: arr.dtype
Out[91]: dtype('S32')

In [92]: arr
Out[92]: 
array(['nan', 'nan', '3'], 
      dtype='|S32')

In [93]: [x == "nan"  for x in arr]
Out[93]: [True, True, False]

In [94]: arr = np.array([np.nan,np.nan,"3"],dtype=object)

In [95]: arr.dtype
Out[95]: dtype('O')

In [96]: arr
Out[96]: array([nan, nan, '3'], dtype=object)

In [97]: [x == "nan"  for x in arr]
Out[97]: [False, False, False]

The NaNs are coerced to NumPy strings when the array also contains strings and no dtype is given, so x == "nan" works in that case. When you pass dtype=object, the elements stay Python floats, so if you always use object dtype the behaviour should be consistent.
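The session above looks like it was recorded on Python 2, where the inferred dtype is a byte string. A quick way to check what your own setup does is to inspect the dtype kind and element types directly. On Python 3 I would expect a unicode string dtype instead, which is what this sketch assumes.

```python
import numpy as np

mixed = [np.nan, np.nan, "3"]

as_str = np.array(mixed)                 # NumPy infers a string dtype
as_obj = np.array(mixed, dtype=object)   # the original Python objects are kept

# dtype.kind is "U" for unicode strings and "O" for object.
print(as_str.dtype.kind, type(as_str[0]))
print(as_obj.dtype.kind, type(as_obj[0]))
```

In the string array the NaN has already become the text 'nan', so only a string comparison can find it.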

Solution 3:

Define a couple of test arrays, one small and one bigger:

In [21]: x=np.array([1,23.3, np.nan, 'str'],dtype=object)
In [22]: xb=np.tile(x,300)

Your function:

In [23]: isnan(x)
Out[23]: array([False, False,  True, False], dtype=bool)

The straightforward list comprehension, returning an array:

In [24]: np.array([i is np.nan for i in x])
Out[24]: array([False, False,  True, False], dtype=bool)

np.frompyfunc has similar vectorizing power to np.vectorize, but for some reason it is underutilized (and, in my experience, faster):

In [25]: def myisnan(x):
   ....:     return x is np.nan
In [26]: visnan=np.frompyfunc(myisnan,1,1)

In [27]: visnan(x)
Out[27]: array([False, False, True, False], dtype=object)

Since it returns dtype object, we may want to cast its values:

In [28]: visnan(x).astype(bool)
Out[28]: array([False, False,  True, False], dtype=bool)

It can handle multidim arrays nicely:

In [29]: visnan(x.reshape(2,2)).astype(bool)
Out[29]: 
array([[False, False],
       [ True, False]], dtype=bool)

Now for some timings:

In [30]: timeit isnan(xb)
1000 loops, best of 3: 1.03 ms per loop

In [31]: timeit np.array([i is np.nan for i in xb])
1000 loops, best of 3: 393 us per loop

In [32]: timeit visnan(xb).astype(bool)
1000 loops, best of 3: 382 us per loop
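If you want to repeat the comparison outside IPython, a rough sketch with the standard timeit module could look like this. The function names are my own, the asker's isnan is left out because its definition is not shown here, and the numbers will differ between machines and NumPy versions.

```python
import timeit
import numpy as np

x = np.array([1, 23.3, np.nan, "str"], dtype=object)
xb = np.tile(x, 300)

visnan = np.frompyfunc(lambda v: v is np.nan, 1, 1)

def with_listcomp():
    # Plain Python loop, converted to a boolean array at the end.
    return np.array([v is np.nan for v in xb])

def with_frompyfunc():
    # Loop driven by NumPy; the object result is cast to bool.
    return visnan(xb).astype(bool)

runs = 200
for fn in (with_listcomp, with_frompyfunc):
    seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e6:.1f} us per call")
```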

An important point about the i is np.nan test: it relies on object identity. If the array has dtype object, iteration returns the stored Python objects, so np.nan itself comes back. For an array of dtype float, each element is returned as a new numpy.float64 value, which is never the np.nan object. For those, np.isnan(i) is the correct test.

In [61]: [(i is np.nan) for i in np.array([np.nan,np.nan,1.3])]
Out[61]: [False, False, False]

In [62]: [np.isnan(i) for i in np.array([np.nan,np.nan,1.3])]
Out[62]: [True, True, False]

In [63]: [(i is np.nan) for i in np.array([np.nan,np.nan,1.3], dtype=object)]
Out[63]: [True, True, False]

In [64]: [np.isnan(i) for i in np.array([np.nan,np.nan,1.3],  dtype=object)]
...
TypeError: Not implemented for this type
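To make the difference concrete, this sketch prints the element types for both dtypes and checks when np.isnan refuses its input. As far as I can tell, np.isnan accepts a single Python float without complaint, and the TypeError appears once it is handed something non-numeric such as a string.

```python
import numpy as np

float_arr = np.array([np.nan, 1.3])               # dtype float64
obj_arr = np.array([np.nan, 1.3], dtype=object)   # dtype object

print(type(float_arr[0]))   # a NumPy scalar, created on access
print(type(obj_arr[0]))     # the stored Python float

same_object = obj_arr[0] is np.nan    # identity survives in the object array
boxed_copy = float_arr[0] is np.nan   # a fresh numpy.float64, so not identical

# np.isnan rejects inputs it cannot treat as numbers.
try:
    np.isnan("str")
    raised = False
except TypeError:
    raised = True

print(same_object, boxed_copy, raised)
```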

Solution 4:

I would use np.vectorize and a custom function that tests for nan elementwise. So,

def _isnan(x):
    if isinstance(x, type(np.nan)):
        return np.isnan(x)
    else:
        return False

my_isnan = np.vectorize(_isnan)

Then

X = np.array([[1, 2, np.nan, "A"], [np.nan, True, [], ""]], dtype=object)
my_isnan(X)

returns

array([[False, False,  True, False],
        [ True, False, False, False]], dtype=bool)
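One detail that may be worth adding: without otypes, np.vectorize calls the function on the first element to guess the output type, and it raises on empty input. Passing otypes=[bool] avoids both. A sketch of that variant, with a slightly simplified _isnan and sample data of my own:

```python
import numpy as np

def _isnan(x):
    # Only real floats can be NaN; everything else is reported as not-NaN.
    return isinstance(x, float) and bool(np.isnan(x))

# otypes fixes the output dtype up front instead of inferring it.
my_isnan = np.vectorize(_isnan, otypes=[bool])

X = np.array([[1, 2, np.nan, "A"], [np.nan, True, None, ""]], dtype=object)
mask = my_isnan(X)
print(mask)

# With otypes set, an empty array is handled as well.
empty_mask = my_isnan(np.array([], dtype=object))
print(empty_mask.shape)
```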

Solution 5:

A way to do this without converting to string or leaving the NumPy environment (also very important IMO) is to use the fact that NaN never compares equal to itself:

In [1]: x = np.nan
In [2]: x == x
Out[2]: False

This happens only when x is NaN; every other ordinary value compares equal to itself. Therefore, for a NumPy array, the element-wise check of

x != x

returns True for each element that is NaN.
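A minimal sketch of the self-comparison trick on both a float array and an object array. For the object array I apply the comparison element by element through np.frompyfunc, which is my own choice here to keep the behaviour explicit; the helper name self_unequal is hypothetical.

```python
import numpy as np

# Float array: the comparison is vectorised by NumPy itself.
float_arr = np.array([1.0, np.nan, 3.0])
float_mask = float_arr != float_arr

# Object array: run the same self-comparison on each stored Python object.
self_unequal = np.frompyfunc(lambda v: v != v, 1, 1)
obj_arr = np.array([[1, np.nan], ["A", None]], dtype=object)
obj_mask = self_unequal(obj_arr).astype(bool)

print(float_mask)
print(obj_mask)
```

Unlike the is np.nan test, this also catches NaNs created with float("nan"), since it does not depend on object identity.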
