
Adding New Column With Condition

I need to manage a data frame by adding more columns. A sample of my data, with headers `Date` and `Sentence`:

Date    Sentence
28 Jan  who.c
30 Jan  house.a
02 Feb  eurolet.it

I woul

Solution 1:

You can convert your `original` and `country` lists of tuples into dicts:

original= [('a', 'apartment'), ('b', 'bungalow'), ('c', 'church')]
original = {x:y for x,y in original}
country = [('UK', 'United Kingdom'), ('IT', 'Italy'), ('DE', 'Germany'), ('H', 'Holland'), ..., ('F', 'France'), ('S', 'Spain')]
country = {x:y for x,y in country}

Now you can perform the same task as follows:

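# Use the text after the last full stop as the key; sen[-1] alone would turn 'it' into 't'.
suffix = df['Sentence'].str.rsplit('.', n=1).str[-1]
# Country codes are stored in upper case, so upper-case the suffix for that lookup.
df['Tp'] = suffix.apply(lambda s: original.get(s, country.get(s.upper(), 'unknown')))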
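For reference, here is a minimal end-to-end sketch of this lookup on the sample rows from the question. It assumes the key is whatever follows the last full stop, uses only the pairs actually listed above (the elided ones are left out), and the helper name `lookup` is made up for illustration.

```python
import pandas as pd

# Sample rows from the question.
df = pd.DataFrame({
    "Date": ["28 Jan", "30 Jan", "02 Feb"],
    "Sentence": ["who.c", "house.a", "eurolet.it"],
})

# Only a few of the pairs shown in the answer; elided entries are not guessed.
original = dict([("a", "apartment"), ("b", "bungalow"), ("c", "church")])
country = dict([("UK", "United Kingdom"), ("IT", "Italy"), ("DE", "Germany")])


def lookup(sentence: str) -> str:
    """Map the text after the last full stop to a label (hypothetical helper)."""
    suffix = sentence.rsplit(".", 1)[-1]
    # Try the building types first, then the country codes (stored upper-case).
    return original.get(suffix, country.get(suffix.upper(), "unknown"))


df["Tp"] = df["Sentence"].apply(lookup)
print(df)
```

Any suffix found in neither dict falls through to `'unknown'`, which is what the nested `.get` calls are for.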

In your code, `conditions` must contain the same number of elements as `choices` (and, by extension, as `original` and `country`).
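The question's code is cut off, so the following is a guess: it assumes `conditions` and `choices` were passed to `numpy.select`, which expects one choice per condition. A small sketch with three of each, built from the sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Sentence": ["who.c", "house.a", "eurolet.it"]})

# Text after the last full stop in each sentence.
suffix = df["Sentence"].str.rsplit(".", n=1).str[-1]

# One condition per choice: both lists have exactly three entries.
conditions = [suffix == "c", suffix == "a", suffix == "it"]
choices = ["church", "apartment", "Italy"]

df["Tp"] = np.select(conditions, choices, default="unknown")
print(df)

# Passing lists of different lengths is rejected by numpy.select.
try:
    np.select(conditions, choices[:2], default="unknown")
    mismatch_error = False
except ValueError:
    mismatch_error = True
```

With many suffixes, the condition list grows by one entry per key, which is why the dict lookup above tends to be easier to maintain.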


Solution 2:

First, let's make some dictionaries from your tuples and combine them:

country = {k.lower() : v for (k,v) in country}
og = {k : v for (k,v) in original}
country.update(og)

print(country)

{'uk': 'United Kingdom',
 'it': 'Italy',
 'de': 'Germany',
 'h': 'Holland',
 'f': 'France',
 's': 'Spain',
 'a': 'apartment',
 'b': 'bungalow',
 'c': 'church'}

Then let's split on the full stop and take the element at the maximum position, so that any earlier full stops in your text are ignored and only the final element is looked at. Finally, we use `.map` to associate your values.

df['value'] = df["Sentence"].str.split(".", expand=True).stack().reset_index(1).query(
    "level_1 == level_1.max()"
)[0].map(country)

print(df)

     Date    Sentence      value
0  28 Jan       who.c     church
1  30 Jan     house.a  apartment
2  02 Feb  eurolet.it      Italy
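As a possible alternative, the same result can be sketched with `str.rsplit` and `.map` directly. The `stack`/`query` version compares against the single largest split position in the whole column, so, as far as I can tell, rows with fewer full stops than the longest row would come out as NaN; splitting once from the right avoids that. The fourth row and the name `lookup` below are invented for illustration, and the dict holds only some of the pairs shown above.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["28 Jan", "30 Jan", "02 Feb", "03 Feb"],
    # The last row is a made-up example with several full stops.
    "Sentence": ["who.c", "house.a", "eurolet.it", "my.old.house.b"],
})

# Combined lookup with lower-case keys, as in the merged dict above.
lookup = {
    "uk": "United Kingdom",
    "it": "Italy",
    "de": "Germany",
    "a": "apartment",
    "b": "bungalow",
    "c": "church",
}

# Split once from the right, keep the last piece, then map it to its label.
suffix = df["Sentence"].str.rsplit(".", n=1).str[-1].str.lower()
df["value"] = suffix.map(lookup)
print(df)
```

Suffixes missing from the dict become NaN here; chaining `.fillna('unknown')` after `.map` would give them a default label instead.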
