How To Partial Load An Array Saved With Numpy Save In Python
Solution 1:
You'd have to intentionally save the array for partial loading; you can't do it generically.
You could, for example, split the array (along one of its dimensions) and save the subarrays with savez. np.load of such a file archive is 'lazy', reading only the subfiles you ask for.
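A minimal sketch of that approach (the file name and chunk count are illustrative):

```python
import numpy as np

# Split an array along axis 0 and save the pieces in one .npz archive.
a = np.arange(24).reshape(6, 4)
chunks = np.array_split(a, 3, axis=0)  # three (2, 4) subarrays
np.savez('chunks.npz', **{f'chunk{i}': c for i, c in enumerate(chunks)})

# np.load on an .npz is lazy: each subfile is read only when accessed.
with np.load('chunks.npz') as archive:
    middle = archive['chunk1']  # reads just this subfile from the zip
```

Each key in the archive maps to one saved subarray, so you pay the read cost only for the pieces you index.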
h5py is an add-on package that saves and loads data from HDF5 files, which allows for partial reads.
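A short sketch with h5py (assuming it is installed; the file and dataset names are made up):

```python
import numpy as np
import h5py

# Write an array as an HDF5 dataset.
a = np.arange(40).reshape(8, 5)
with h5py.File('data.h5', 'w') as f:
    f.create_dataset('a', data=a)

# Slicing a dataset reads only the requested region from disk.
with h5py.File('data.h5', 'r') as f:
    part = f['a'][2:4, :]  # partial read; returns a plain ndarray
```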
numpy.memmap is another option, treating a file as memory that stores an array.
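For example (the file name is illustrative; note that a raw .dat file stores no dtype or shape, so you must supply both when mapping it back):

```python
import numpy as np

# Write an array to a raw binary file.
data = np.arange(100, dtype=np.float64).reshape(10, 10)
data.tofile('big.dat')

# Map the file read-only; slicing touches only the bytes you ask for.
mm = np.memmap('big.dat', dtype=np.float64, mode='r', shape=(10, 10))
row = np.array(mm[3])  # materialize a single row in memory
```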
Look up the docs for these, as well as previous SO questions.
How can I efficiently read and write files that are too large to fit in memory?
Fastest save and load options for a numpy array
Writing a large hdf5 dataset using h5py
To elaborate on the hold: there are minor points that aren't clear. What exactly do you mean by 'load some dimension'? The simplest interpretation is that you want A[0,...] or A[3:10,...]. The other issue is the implication of 'simple way'. Does that mean you already have a complex way and want a simpler one? Or just that you don't want to rewrite the numpy.load function to do the task?
Otherwise I think the question is reasonably clear - and the simple answer is - no there isn't a simple way.
I'm tempted to reopen the question so other experienced numpy posters can weigh in.
I should have reviewed the load docs (the OP should have as well!). As ali_m commented, there is a memory-map mode. The docs say:
mmap_mode : {None, 'r+', 'r', 'w+', 'c'}, optional
    If not None, then memory-map the file, using the given mode (see `numpy.memmap` for a detailed description of the modes). A memory-mapped array is kept on disk. However, it can be accessed and sliced like any ndarray. Memory mapping is especially useful for accessing small fragments of large files without reading the entire file into memory.
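So for a plain .npy file the partial load is built in (file name illustrative):

```python
import numpy as np

# Save normally, then open as a memory map instead of reading it all.
arr = np.arange(1000).reshape(100, 10)
np.save('arr.npy', arr)

lazy = np.load('arr.npy', mmap_mode='r')  # no full read happens here
first_rows = np.asarray(lazy[:5])         # only these rows are read
```

This works because .npy is a flat on-disk layout; the header supplies the dtype and shape that numpy.memmap needs.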
How does numpy handle mmaps over npz files? (I dug into this months ago, but forgot the option.)