Assignment Of Pandas DataFrame With Float32 And Float64 Slow
Assignments to a pandas DataFrame that mix float32 and float64 data types are rather slow for some combinations, at least the way I do it. The code below sets up a DataFrame, makes a Nu
Solution 1:
Single-column assignments do not change the type of the assigned data, and iterating over the columns with a for-loop seems reasonably fast for assignments without type casting, for both float32 and float64. For assignments that involve type casting, the performance is usually twice as bad as the worst performance for multiple-column assignment.
import pandas as pd
import numpy as np
from scipy.signal import lfilter
from time import time

N = 1000  # number of rows
M = 1000  # number of numeric columns

def f(dtype1, dtype2):
    # Note: .ix was removed in pandas 1.0, so this code needs an older pandas version.
    coi = [str(m) for m in range(M)]  # columns of interest
    df = pd.DataFrame([[m for m in range(M)] + ['Hello', 'World'] for n in range(N)],
                      columns=coi + ['A', 'B'], dtype=dtype1)
    Y = lfilter([1], [0.5, 0.5], df.ix[:, coi])
    Y = Y.astype(dtype2)
    new = df.copy()
    print(new.iloc[0, 0].dtype)  # dtype before the assignment
    print(Y.dtype)               # dtype of the assigned data
    for n, column in enumerate(coi):  # New: loop over the columns one at a time
        new.ix[:, column] = Y[:, n]
    print(new.iloc[0, 0].dtype)  # dtype after the assignment

# Time every combination of DataFrame dtype and assigned-data dtype.
dtypes = [np.float32, np.float64]
for dtype1 in dtypes:
    for dtype2 in dtypes:
        print('-' * 10)
        start_time = time()
        f(dtype1, dtype2)
        print(time() - start_time)
The result is:
----------
float32
float32
float32
0.809890985489
----------
float32
float64
float64
21.4767119884
----------
float64
float32
float32
20.5611870289
----------
float64
float64
float64
0.765362977982
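The dtype behaviour visible in the output above (each column ends up with the dtype of the assigned data) can be sketched on a small scale as follows. This is only an illustration under assumptions, not the code that produced the timings: it uses plain `new[column] = ...` assignment instead of the `.ix` indexer from the answer, a simple stand-in array instead of the `lfilter` output, and the names `assign_per_column`, `df32` and `y64` are hypothetical.

```python
import numpy as np
import pandas as pd


def assign_per_column(df, values):
    """Return a copy of df whose columns are replaced one at a time
    by the corresponding columns of the 2-D array `values`."""
    new = df.copy()
    for n, column in enumerate(df.columns):
        # Plain column assignment replaces the whole column, so the
        # column takes on the dtype of the assigned array.
        new[column] = values[:, n]
    return new


# Small hypothetical sizes, chosen only to keep the sketch quick to run.
n_rows, n_cols = 5, 3
columns = [str(m) for m in range(n_cols)]
df32 = pd.DataFrame(np.ones((n_rows, n_cols), dtype=np.float32), columns=columns)

# Stand-in for the filtered data Y, stored as float64.
y64 = np.arange(n_rows * n_cols, dtype=np.float64).reshape(n_rows, n_cols)

result = assign_per_column(df32, y64)
print(df32.dtypes.unique())    # dtypes of the untouched original frame
print(result.dtypes.unique())  # dtypes after the per-column assignment
```

The sketch says nothing about speed. To compare timings, the loop would have to be run at the original sizes (N = M = 1000) on the pandas version of interest, since the results may differ between versions.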