Plotting 2D Data Using Xarray Takes A Surprisingly Long Time?
Solution 1:
When you open your dataset and provide the chunks argument, xarray returns a Dataset that is composed of dask arrays. These arrays are evaluated "lazily" (see the xarray/dask documentation). The computation is not triggered until you plot your data. To illustrate this, you can explicitly load your data after you take the mean:
import xarray as xr
flnm = xr.open_mfdataset('./WRF_3D_2007_*.nc', chunks={'Times': 100})
# .load() triggers the computation and keeps the result in memory
QVAPOR_mean = flnm.QVAPOR.mean(dim=('Times', 'lev')).load()
Now your QVAPOR_mean variable is backed by a numpy array instead of a dask array. Plotting this array will likely be much faster.
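The difference between the lazy and the loaded result can be checked on a small synthetic array. This is only a sketch: the array shape, the random values, and the lat/lon dimension names are made up for illustration and are not taken from the WRF files in the question.

```python
import dask.array as da
import numpy as np
import xarray as xr

# Synthetic stand-in for QVAPOR (shape, values, and lat/lon names are assumptions).
qvapor = xr.DataArray(
    np.random.rand(20, 5, 8, 8),
    dims=("Times", "lev", "lat", "lon"),
).chunk({"Times": 10})  # chunking makes the array dask-backed

# Taking the mean only builds a task graph; nothing is computed yet.
lazy_mean = qvapor.mean(dim=("Times", "lev"))
was_lazy = isinstance(lazy_mean.data, da.Array)

# load() runs the computation in place and returns the same object.
loaded_mean = lazy_mean.load()
is_numpy = isinstance(loaded_mean.data, np.ndarray)

print(was_lazy, is_numpy)
```

Checking the type of `.data` like this is a quick way to tell whether a later plot call will have to run the whole computation first.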
However, the computation of your mean is likely to still take quite a long time. There are ways to improve the throughput here as well.
Try using a larger chunk size. I often find that chunk sizes in the 10-100 MB range perform best.
Try a different scheduler. By default, you are using dask's threaded scheduler. Because of limitations with netCDF/HDF, this does not allow for parallel reads from disk. We have been finding that the distributed scheduler works well for these applications.
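Both suggestions can be sketched together. This is a rough illustration rather than a tested recipe for the files in the question: chunk_size_mb and mean_with_distributed are hypothetical helper names, the float32 dtype and the grid shape in the example are assumptions, and the variable and dimension names are the ones used above.

```python
import math

import numpy as np


def chunk_size_mb(chunk_shape, dtype="float32"):
    """Approximate in-memory size of one chunk, in megabytes (10**6 bytes)."""
    return math.prod(chunk_shape) * np.dtype(dtype).itemsize / 1e6


def mean_with_distributed(pattern, chunks):
    """Compute the QVAPOR mean using a local dask.distributed cluster."""
    # Imported here so the size helper above works without these packages.
    import xarray as xr
    from dask.distributed import Client

    # Creating a Client starts a local cluster and makes it the default scheduler.
    with Client():
        ds = xr.open_mfdataset(pattern, chunks=chunks)
        return ds.QVAPOR.mean(dim=("Times", "lev")).load()


# Assumed chunk of 100 times x 30 levels x 100 x 100 grid points of float32:
print(chunk_size_mb((100, 30, 100, 100)))  # 120.0
```

A call such as mean_with_distributed('./WRF_3D_2007_*.nc', {'Times': 100}) should be run from a script protected by an if __name__ == "__main__": guard, because the local cluster starts worker processes.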