Applying A Function Along An Axis Of A Dask Array
Solution 1:
I suspect that your experience will be smoother if your function returns an array with the same number of dimensions as the one it consumes. E.g. you might consider defining your function as follows:
def my_polyfit(data):
    return np.polyfit(data.squeeze(), ...)[:, None, None, None]
Then you can probably ignore the new_axis and drop_axis bits.
Performance-wise, you might also want to consider using a larger chunk size. At 6000 numbers per chunk you have over a million chunks, which means you'll probably spend more time in scheduling than in actual computation. Generally I shoot for chunks that are a few megabytes in size. Of course, increasing the chunk size means each call to your mapped function receives many 1-D series at once, so the function becomes more complex.
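To make the chunk-count arithmetic concrete, here is a small sketch in plain Python. The array shape (6000 values on a 1000 x 1100 grid) is a hypothetical stand-in chosen only to match "6000 numbers per chunk" and "over a million chunks"; it is not taken from the question, and `chunk_stats` is a hypothetical helper, not a Dask function.

```python
import math

def chunk_stats(shape, chunks, itemsize=8):
    """Return (number of chunks, bytes in one full chunk).

    Hypothetical helper. Edge chunks may be smaller than `chunks`;
    the byte count is for a full (non-edge) chunk. itemsize=8 assumes float64.
    """
    n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
    chunk_bytes = math.prod(chunks) * itemsize
    return n_chunks, chunk_bytes

# Hypothetical shape: 6000 values per series on a 1000 x 1100 grid.
shape = (6000, 1000, 1100)

# One series per chunk: 6000 float64 values = 48 KB per chunk.
print(chunk_stats(shape, (6000, 1, 1)))    # (1100000, 48000)

# 10 x 10 series per chunk: about 4.8 MB per chunk, 100x fewer tasks.
print(chunk_stats(shape, (6000, 10, 10)))  # (11000, 4800000)
```

On a real Dask array, `x.numblocks`, `x.npartitions` and `x.chunksize` report the same information, and `x.rechunk(...)` changes it.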
Example
In [1]: import dask.array as da
In [2]: import numpy as np
In [3]: def f(b):
   ...:     return np.polyfit(b.squeeze(), np.arange(5), 3)[:, None, None, None]
   ...:
In [4]: x = da.random.random((5, 3, 3, 3), chunks=(5, 1, 1, 1))
In [5]: x.map_blocks(f, chunks=(4, 1, 1, 1)).compute()
Out[5]:
array([[[[ -1.29058580e+02, 2.21410738e+02, 1.00721521e+01],
[ -2.22469851e+02, -9.14889627e+01, -2.86405832e+02],
[ 1.40415805e+02, 3.58726232e+02, 6.47166710e+02]],
...
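The chunk-size advice above has a cost: with bigger chunks, the mapped function must handle several 1-D series per block. A minimal sketch of that, assuming the same fit as `f` in the example (the series is used as the x values and `np.arange(n)` as the y values); `polyfit_block` is a hypothetical name, and the plain Python loop is just one simple way to process several series per chunk, not the only one.

```python
import numpy as np
import dask.array as da

def polyfit_block(block, deg=3):
    """Hypothetical block function: one cubic fit per series along axis 0.

    block has shape (n, ...); the result has shape (deg + 1, ...).
    """
    n = block.shape[0]
    flat = block.reshape(n, -1)             # (n, number_of_series)
    y = np.arange(n)
    out = np.empty((deg + 1, flat.shape[1]))
    for j in range(flat.shape[1]):          # fit each series separately
        out[:, j] = np.polyfit(flat[:, j], y, deg)
    return out.reshape((deg + 1,) + block.shape[1:])

# Three chunks of 9 series each, instead of 27 single-series chunks.
x = da.random.random((5, 3, 3, 3), chunks=(5, 3, 3, 1))
coeffs = x.map_blocks(polyfit_block, chunks=(4, 3, 3, 1), dtype=float)
result = coeffs.compute()
print(result.shape)  # (4, 3, 3, 3)
```

The loop keeps the per-series logic identical to the one-series version, so the two can be checked against each other; only the number of tasks the scheduler has to manage goes down.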
Solution 2:
Kind of late to the party, but I figured this could use an alternative answer based on newer features in Dask. In particular, we added apply_along_axis, which behaves basically like NumPy's apply_along_axis except that it operates on Dask Arrays. This results in somewhat simpler syntax. It also avoids the need to rechunk your data yourself before applying your custom function to each 1-D piece, and it makes no real requirements of your initial chunking, which it tries to preserve in the end result (except along the axis that is either reduced or replaced).
In [1]: import dask.array as da
In [2]: import numpy as np
In [3]: def f(b):
   ...:     return np.polyfit(b, np.arange(len(b)), 3)
   ...:
In [4]: x = da.random.random((5, 3, 3, 3), chunks=(5, 1, 1, 1))
In [5]: da.apply_along_axis(f, 0, x).compute()
Out[5]:
array([[[[ 2.13570599e+02, 2.28924503e+00, 6.16369231e+01],
[ 4.32000311e+00, 7.01462518e+01, -1.62215514e+02],
[ 2.89466687e+02, -1.35522215e+02, 2.86643721e+02]],
...
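To illustrate that apply_along_axis does not care how the input is chunked, this sketch deliberately splits axis 0 across two chunks and checks the result against NumPy's `np.apply_along_axis`. Passing `shape` and `dtype` explicitly is optional; I include them so Dask does not have to work out the output shape and dtype of `f` on its own.

```python
import numpy as np
import dask.array as da

def f(b):
    # Same cubic fit as above, applied to one 1-D slice along axis 0.
    return np.polyfit(b, np.arange(len(b)), 3)

# Axis 0 is split into chunks of 3 and 2; the other axes are split too.
x = da.random.random((5, 3, 3, 3), chunks=(3, 2, 2, 2))
coeffs = da.apply_along_axis(f, 0, x, shape=(4,), dtype=float)

xn = x.compute()                                # same random values as x
expected = np.apply_along_axis(f, 0, xn)        # NumPy reference result
print(coeffs.shape)                             # (4, 3, 3, 3)
print(np.allclose(coeffs.compute(), expected))  # True
```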