Plotting 2D Data Using Xarray Takes A Surprisingly Long Time?

February 01, 2023

I am reading NetCDF files using xarray. Each variable has 4 dimensions (Times, lev, y, x). After reading the variable, I calculate the mean of the variable QVAPOR along (Times,lev

Solution 1:

When you open your dataset and provide the chunks argument, xarray returns a Dataset backed by dask arrays. These arrays are evaluated "lazily" (xarray/dask documentation): taking the mean only records the operations to perform, and the actual computation is not triggered until you plot your data. To illustrate this, you can explicitly load your data after you take the mean:

import xarray as xr
flnm = xr.open_mfdataset('./WRF_3D_2007_*.nc', chunks={'Times': 100})
# .load() runs the lazy computation now and stores the result in memory
QVAPOR_mean = flnm.QVAPOR.mean(dim=('Times', 'lev')).load()

Now your QVAPOR_mean variable is backed by a numpy array instead of a dask array. Plotting this array will likely be much faster.
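The lazy behaviour described above can be checked on a small synthetic array. This is only a sketch: the random data, its shape, and the names data, lazy_mean, was_lazy and loaded_mean are made up for illustration and stand in for the real WRF files. It assumes dask is installed alongside xarray.

```python
import dask.array as da
import numpy as np
import xarray as xr

# Synthetic stand-in for QVAPOR, using the dimension names from the question.
data = xr.DataArray(
    np.random.rand(8, 5, 20, 30),
    dims=("Times", "lev", "y", "x"),
    name="QVAPOR",
)

# Chunking converts the variable into a dask-backed (lazy) array.
lazy_mean = data.chunk({"Times": 4}).mean(dim=("Times", "lev"))

# At this point only a task graph exists; nothing has been computed.
was_lazy = isinstance(lazy_mean.data, da.Array)
print("dask-backed before load:", was_lazy)

# .load() executes the graph and replaces the dask array with a numpy array.
# Note that it loads in place and returns the same object.
loaded_mean = lazy_mean.load()
print("numpy-backed after load:", isinstance(loaded_mean.data, np.ndarray))
```

Because the loaded result holds plain numpy data, a later call such as loaded_mean.plot() only has to draw the array rather than recompute the mean.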

However, the computation of your mean is likely to still take quite a long time. There are ways to improve the throughput here as well.

Try using a larger chunk size. I often find that chunk sizes in the 10-100 MB range perform best.
Try a different scheduler. You are by default using dask's threaded scheduler. Because of limitations with netCDF/HDF, this does not allow for parallel reads from disk. We have been finding that the distributed scheduler works well for these applications.
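Both suggestions can be sketched in a few lines. The helper names chunk_size_mb and start_local_cluster, the grid dimensions (30 levels, 200 x 250 points), the float32 item size, and the worker count are all assumptions chosen for illustration; substitute the values from your own files. The second function assumes the dask.distributed package is installed and is defined here without being called.

```python
import math


def chunk_size_mb(chunk_shape, itemsize=4):
    """Estimate the in-memory size of one chunk in MB (10**6 bytes).

    itemsize is the number of bytes per element (4 for float32, 8 for float64).
    """
    return math.prod(chunk_shape) * itemsize / 1e6


# Hypothetical grid: (Times, lev, y, x) with 30 levels and 200 x 250 points.
for n_times in (1, 10, 100):
    size = chunk_size_mb((n_times, 30, 200, 250))
    print(f"chunks={{'Times': {n_times}}} -> about {size:.0f} MB per chunk")


def start_local_cluster(n_workers=4):
    """Start a local distributed scheduler (not called in this sketch).

    Once a Client exists, later dask computations use it by default
    instead of the threaded scheduler.
    """
    from dask.distributed import Client  # imported lazily; needs dask.distributed

    return Client(n_workers=n_workers, threads_per_worker=1)
```

In a script, create the client inside an `if __name__ == "__main__":` block before calling open_mfdataset, since the workers run as separate processes.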
