Python Performance Problems Using Loops With Big Tables
I am using Python and several libraries such as pandas and SciPy to prepare data so I can start deeper analysis. For the preparation I am, for instance, creating new columns whose values are computed from existing ones, e.g. the duration between a start and an end timestamp, by looping over every row, and this becomes very slow on big tables.
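For context, a minimal sketch of the kind of per-row loop the question describes, reconstructed from the assignment quoted in Solution 2 below (the tiny DataFrame and the loop body are assumptions; only the column names and the chained .iloc write come from the question):

import pandas as pd
from datetime import datetime

# Made-up two-row stand-in for the big table (tableContent[6] in the question)
df = pd.DataFrame({
    'p_test_ZEIT_ANFANG': ['2017-01-01 08:00:00', '2017-01-01 09:30:00'],
    'p_test_ZEIT_ENDE':   ['2017-01-01 08:45:00', '2017-01-01 11:00:00'],
})
df['p_test_Duration'] = None

# One strptime call and one single-cell write per row: slow on big tables.
# The chained .iloc write also triggers SettingWithCopyWarning and may not
# write through at all under newer pandas with copy-on-write enabled.
for x in range(len(df)):
    start = datetime.strptime(df['p_test_ZEIT_ANFANG'].iloc[x], '%Y-%m-%d %H:%M:%S')
    end = datetime.strptime(df['p_test_ZEIT_ENDE'].iloc[x], '%Y-%m-%d %H:%M:%S')
    difference = end - start
    df['p_test_Duration'].iloc[x] = difference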
Solution 1:
Remove the loop and apply the functions to the whole Series instead:
from datetime import datetime

# Parse each timestamp column as a whole Series, then subtract element-wise
ZEIT_ANFANG = tableContent[6]['p_test_ZEIT_ANFANG'].apply(lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))
ZEIT_ENDE = tableContent[6]['p_test_ZEIT_ENDE'].apply(lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))
tableContent[6]['p_test_Duration'] = ZEIT_ENDE - ZEIT_ANFANG
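Note that .apply still invokes the Python-level strptime once per element, so this mainly removes the loop bookkeeping; the bigger speed-up comes from the vectorized conversion in Solution 2 below.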
Solution 2:
You can vectorize the conversion of dates by using pd.to_datetime and avoid using apply unnecessarily:
import pandas as pd

# pd.to_datetime parses the whole column at once; the subtraction is
# vectorized too and yields a timedelta64 column directly
tableContent[6]['p_test_Duration'] = (
    pd.to_datetime(tableContent[6]['p_test_ZEIT_ENDE']) -
    pd.to_datetime(tableContent[6]['p_test_ZEIT_ANFANG'])
)
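As a self-contained illustration, here is the vectorized version on a made-up two-row table (the DataFrame contents are assumptions; the column names and the format string come from the question and Solution 1). Passing an explicit format also skips per-value format inference:

import pandas as pd

# Tiny made-up stand-in for tableContent[6]
df = pd.DataFrame({
    'p_test_ZEIT_ANFANG': ['2017-01-01 08:00:00', '2017-01-01 09:30:00'],
    'p_test_ZEIT_ENDE':   ['2017-01-01 08:45:00', '2017-01-01 11:00:00'],
})

# One vectorized parse per column instead of one strptime call per cell
df['p_test_Duration'] = (
    pd.to_datetime(df['p_test_ZEIT_ENDE'], format='%Y-%m-%d %H:%M:%S') -
    pd.to_datetime(df['p_test_ZEIT_ANFANG'], format='%Y-%m-%d %H:%M:%S')
)

print(df['p_test_Duration'])
# 0   0 days 00:45:00
# 1   0 days 01:30:00
# Name: p_test_Duration, dtype: timedelta64[ns]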
Also, you were getting the SettingWithCopyWarning because of the chained-indexing assignment:
tableContent[6]['p_test_Duration'].iloc[x] = difference
You don't have to worry about that if you assign the whole column at once, as in the snippet above.
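If a per-cell write really is needed, the usual way to avoid the warning is a single .loc indexer that selects row and column in one step; a minimal sketch with made-up values:

import pandas as pd

df = pd.DataFrame({'p_test_Duration': [None, None]})

# Chained indexing does two separate lookups, so pandas cannot tell whether
# the intermediate Series is a view or a copy:
# df['p_test_Duration'].iloc[0] = value      # triggers SettingWithCopyWarning

# A single indexer resolves row and column in one unambiguous operation:
df.loc[0, 'p_test_Duration'] = pd.Timedelta(minutes=45)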