
Plotting 2D Data Using Xarray Takes A Surprisingly Long Time?

I am reading NetCDF files using xarray. Each variable has 4 dimensions (Times, lev, y, x). After reading the variable, I calculate the mean of the variable QVAPOR along (Times,lev

Solution 1:

When you open your dataset with the chunks argument, xarray returns a Dataset backed by dask arrays. These arrays are evaluated "lazily" (see the xarray and dask documentation): no computation is triggered until you plot your data. To illustrate this, you can explicitly load your data after you take the mean:

import xarray as xr
flnm = xr.open_mfdataset('./WRF_3D_2007_*.nc', chunks={'Times': 100})
# .load() triggers the computation and keeps the result in memory
QVAPOR_mean = flnm.QVAPOR.mean(dim=('Times', 'lev')).load()

Now your QVAPOR_mean variable is backed by a numpy array instead of a dask array. Plotting this array will likely be much faster.
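If you want to see the lazy behaviour without the original WRF files, here is a minimal sketch on synthetic data. The array shape and the name qvapor are made up for illustration, and it assumes dask is installed alongside xarray. It uses .compute() rather than .load() only so that the lazy object stays around for comparison.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for QVAPOR; the dimension sizes are hypothetical.
raw = np.random.default_rng(0).random((8, 4, 6, 5))
qvapor = xr.DataArray(raw, dims=("Times", "lev", "y", "x"), name="QVAPOR")

# Chunking converts the data to a dask array, as the chunks argument does on open.
qvapor = qvapor.chunk({"Times": 4})

# This only builds a task graph; nothing is computed yet.
lazy_mean = qvapor.mean(dim=("Times", "lev"))

# .compute() runs the graph and returns a new, numpy-backed DataArray.
computed_mean = lazy_mean.compute()

print(type(lazy_mean.data))      # a dask array type
print(type(computed_mean.data))  # a numpy array type
```

Plotting lazy_mean would run the whole computation at plot time, while plotting computed_mean only has to draw values that already exist.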

However, the computation of your mean is likely to still take quite a long time. There are ways to improve the throughput here as well.

  • Try using a larger chunk size. I often find that chunk sizes in the 10-100 MB range perform best.

  • Try a different scheduler. You are by default using dask's threaded scheduler. Because of limitations with netCDF/HDF, this does not allow for parallel reads from disk. We have been finding that the distributed scheduler works well for these applications.
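To judge whether a chunk falls in that size range, you can estimate its footprint from the chunk shape and the item size. The sketch below is only an illustration: the helper chunk_megabytes is hypothetical, the grid sizes are invented because the question does not state them, and 4-byte floats are an assumption about how the variable is stored.

```python
from math import prod


def chunk_megabytes(chunk_shape, itemsize=4):
    """Return the in-memory size of one chunk in MB (10**6 bytes).

    itemsize=4 assumes float32 data; use 8 for float64.
    """
    return prod(chunk_shape) * itemsize / 1e6


# Hypothetical grid sizes; replace them with the real lev, y, x lengths.
lev, y, x = 30, 300, 400

for times_per_chunk in (1, 10, 100):
    size = chunk_megabytes((times_per_chunk, lev, y, x))
    print(f"Times={times_per_chunk:>3}: {size:8.1f} MB per chunk")
```

You can read the real dimension lengths from the opened dataset (for example flnm.sizes) and then pick the Times chunk length that puts the result where you want it.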

