
Python Performance Problems Using Loops With Big Tables

I am using Python and multiple libraries like pandas and scipy to prepare data so I can start deeper analysis. For the preparation purpose I am, for instance, creating new columns wit…

Solution 1:

Take away the loop and apply the function to the whole Series instead of row by row:

ZEIT_ANFANG = tableContent[6]['p_test_ZEIT_ANFANG'].apply(lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))
ZEIT_ENDE = tableContent[6]['p_test_ZEIT_ENDE'].apply(lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))
tableContent[6]['p_test_Duration'] = ZEIT_ENDE - ZEIT_ANFANG
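Since `tableContent` isn't shown in the question, here is a minimal, self-contained sketch of the same approach using a hypothetical stand-in DataFrame (the column names follow the question):

```python
import pandas as pd
from datetime import datetime

# Hypothetical stand-in for tableContent[6].
df = pd.DataFrame({
    'p_test_ZEIT_ANFANG': ['2023-01-01 10:00:00', '2023-01-01 11:30:00'],
    'p_test_ZEIT_ENDE':   ['2023-01-01 10:45:00', '2023-01-01 12:00:00'],
})

# apply() runs strptime once per row -- no explicit Python loop needed.
start = df['p_test_ZEIT_ANFANG'].apply(lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))
end = df['p_test_ZEIT_ENDE'].apply(lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))

# Subtracting the two datetime Series yields a timedelta column in one step.
df['p_test_Duration'] = end - start

print(df['p_test_Duration'].iloc[0])  # 0 days 00:45:00
```

Assigning the whole column at once also sidesteps the per-row `.iloc[x] = …` writes from the original loop.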

Solution 2:

You can vectorize the conversion of dates by using pd.to_datetime and avoid using apply unnecessarily.

tableContent[6]['p_test_Duration'] = (
    pd.to_datetime(tableContent[6]['p_test_ZEIT_ENDE']) -
    pd.to_datetime(tableContent[6]['p_test_ZEIT_ANFANG'])
)

Also, you were getting the SettingWithCopy warning because of the chained indexing assignment:

tableContent[6]['p_test_Duration'].iloc[x] = difference

You don't have to worry about that if you assign the whole column at once, as suggested above.
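To make the warning concrete, here is a small sketch (with a made-up one-column frame) contrasting the chained-indexing write with a single `.loc` call, which is the idiomatic fix when you really do need a per-row assignment:

```python
import pandas as pd

# Made-up frame with a timedelta column to write into.
df = pd.DataFrame({'p_test_Duration': pd.to_timedelta([0, 0], unit='s')})
difference = pd.Timedelta(minutes=45)

# Chained indexing does two separate lookups; the write may land on a
# temporary copy, and pandas emits SettingWithCopyWarning:
#   df['p_test_Duration'].iloc[0] = difference

# A single .loc call resolves row and column together, so the write
# always hits the original frame:
df.loc[0, 'p_test_Duration'] = difference

print(df.loc[0, 'p_test_Duration'])  # 0 days 00:45:00
```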

