Create New Pandas Dataframe Column Containing Boolean Output From Searching For Substrings
I'd like to create a new column where if a substring is found in an existing column, it will return True and vice versa. So in this example, I'd like to search for the substring '
Solution 1:
This is how to do it:
df["b"] = df["a"].str.contains("abc")
Regarding your error:
It seems that you have np.nan values in column a. The method str.contains returns np.nan for those rows, so when you try to index with an array containing np.nan, pandas tells you that is not possible.
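One way to sidestep the missing-value problem described above is the na argument of str.contains. This is a minimal sketch, not the asker's actual data: the sample frame below is made up, and regex=False is an optional extra that treats the pattern as a literal substring.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name "a" comes from the answer.
df = pd.DataFrame({"a": ["zabc", np.nan, "abcy", "defg"]})

# na=False makes missing values count as "no match", so the result
# contains only True/False and can be used as a boolean mask.
# regex=False searches for the literal substring instead of a regex.
df["b"] = df["a"].str.contains("abc", na=False, regex=False)

# Boolean indexing now works because the mask has no missing entries.
matches = df[df["b"]]
print(df)
print(matches)
```

If missing rows should count as matches instead, na=True would be the corresponding choice.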
Solution 2:
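Not the best solution, but you can check for null values with pd.isnull() or convert null values to a string with str(). Note that str(None) is 'None', so the str() variant would also match a search for a substring of 'None'.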
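import pandas as pd
df = pd.DataFrame({'a':['zabc', None, 'abcy', 'defg']})
# str(x) turns None into the string 'None', so the membership test never raises
df['a'].map(lambda x: True if 'abc' in str(x) else False)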
or
df['a'].map(lambda x: False if pd.isnull(x) or 'abc' not in x else True)
Result:
0 True
1 False
2 True
3 False
Name: a, dtype: bool
Solution 3:
Your first code is OK. Here is what I ran on my sample:
s = pd.Series(['cat','hat','dog','fog','pet'])
d = pd.DataFrame(s, columns=['test'])
d['b'] = d['test'].map(lambda x: True if 'og' in x else False)
d
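For comparison, here is a small sketch that runs the map version from this answer next to the vectorized str.contains version on the same five words. The column name "c" is only for illustration, and the frame is built from a dict rather than from a Series. On data with no missing values the two approaches should agree.

```python
import pandas as pd

# Same sample words as in the answer above.
d = pd.DataFrame({"test": ["cat", "hat", "dog", "fog", "pet"]})

# Element-wise version from the answer: Python's "in" on each string.
d["b"] = d["test"].map(lambda x: "og" in x)

# Vectorized version; regex=False searches for the literal substring.
# "c" is a hypothetical column name used only for this comparison.
d["c"] = d["test"].str.contains("og", regex=False)

print(d)
```

The map version fails on missing values because "in" cannot be applied to None or NaN, which is where the null checks from the other answers come in.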