
Automatically Multiprocessing A 'function Apply' On A Dataframe Column

I have a simple dataframe with two columns.

+---------+-------+
| subject | score |
+---------+-------+
| wow     | 0     |
+---------+-------+
| cool    | 0     |
+---------+----

Solution 1:

Instantiating language.Client on every call to find_score is likely a major bottleneck. The client does not need to be recreated for each call, so create it once outside the function, before you call it:

# Imports the Google Cloud client library
from google.cloud import language
import re

# Instantiates a client once, outside the function
language_client = language.Client()

def find_score(row):
    # Strip HTML tags, then replace each non-word character with a space
    pre_text = re.sub('<[^>]*>', '', row)
    text = re.sub(r'[^\w]', ' ', pre_text)

    document = language_client.document_from_text(text)

    # Detects the sentiment of the text
    sentiment = document.analyze_sentiment().sentiment

    print("Sentiment score - %f " % sentiment.score) 

    return sentiment.score

df['score'] = df['subject'].apply(find_score)
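
To see what the two substitutions do to a value before it is sent for analysis, here is a minimal sketch that isolates the cleaning step. The helper name clean_text and the sample string are made up for illustration; they are not part of the original code.

```python
import re


def clean_text(row):
    """Hypothetical helper that mirrors the two re.sub calls in find_score."""
    pre_text = re.sub('<[^>]*>', '', row)   # drop anything that looks like an HTML tag
    return re.sub(r'[^\w]', ' ', pre_text)  # replace each non-word character with a space


if __name__ == '__main__':
    # repr() makes the resulting spaces visible
    print(repr(clean_text('<b>wow</b>, cool!')))
```

Note that punctuation is replaced character by character, so runs of spaces are not collapsed and a trailing space can remain.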

If you insist, you can use multiprocessing like this:

from multiprocessing import Pool

# <Define functions and datasets here>

if __name__ == '__main__':
    # The context manager shuts the pool down once map() has returned
    with Pool(processes=8) as pool:  # or some number of your choice
        df['score'] = pool.map(find_score, df['subject'])
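
Since each call probably spends most of its time waiting on the network rather than computing, a thread-backed pool may be enough. The sketch below swaps in multiprocessing.pool.ThreadPool, which offers the same map interface as Pool. It is not the approach shown above, and it assumes the scoring function is safe to call from several threads. fake_find_score is a hypothetical stand-in that returns a dummy score so the example runs without Google Cloud credentials.

```python
from multiprocessing.pool import ThreadPool
import re
import time


def fake_find_score(row):
    """Hypothetical stand-in for find_score: cleans the text, waits briefly
    to imitate a network round trip, and returns a dummy score."""
    pre_text = re.sub('<[^>]*>', '', row)
    text = re.sub(r'[^\w]', ' ', pre_text)
    time.sleep(0.01)                  # pretend to wait on the API
    return float(len(text.split()))   # dummy "score": number of words


def score_all(subjects, workers=8):
    """Score every subject with a pool of threads; map() keeps input order."""
    with ThreadPool(processes=workers) as pool:
        return pool.map(fake_find_score, subjects)


if __name__ == '__main__':
    subjects = ['wow', '<b>very</b> cool', 'so, so good']
    print(score_all(subjects))
```

With a dataframe, the same pattern would be df['score'] = score_all(df['subject']), because map() accepts any iterable and returns results in input order.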
