- Subject: [netCDF #UAD-972803]: netcdf4 HDF chunksize/cache ... vs. 'classic' format.
- Date: Wed, 09 Mar 2016 11:02:03 -0700
Hi Tim,
While I don't have a concrete set of parameters you can use for optimal
performance, I think I can provide a little bit of insight to help guide your
tests towards *increased* performance. One caveat to start is that I'm not
proficient in Fortran, so if I'm reading something in your code incorrectly, my
apologies. I'll also say that I don't think changing the cache size is going
to achieve much, so we can ignore that for now and leave it at the default,
which is appropriate for the data size you're working with.
My first thought is that there is always going to be increased I/O overhead
when comparing netCDF3 I/O to netCDF4 I/O. This overhead comes from several
places: chunking, caching, compression, fill values, and the complexity of the
HDF5 library. If we wanted to establish the best-case scenario, *in terms of
I/O speed*, I would suggest running a benchmark *without* chunking or
compression, and with fill values turned off. The results of this benchmark
will establish a reasonable baseline for performance; nothing we do is likely
to beat it. A more realistic benchmark would be to
then run without chunking or compression, but with fill values. This will be
slower, but also safer.
> Fill values take up a lot of I/O time, but they also help guard against data
> corruption. Fill values let a scientist distinguish between garbage data
> and 'empty' data. With no fill values, data corruption can become impossible
> to detect. I only recommend not using fill values if you are absolutely sure
> that's what you want to do.
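To make that baseline concrete, here's a minimal Fortran sketch of the
best-case setup. Given my Fortran caveat above, treat it as a rough, untested
sketch with error checking omitted. The dimension names and sizes are copied
from your ncdump below; the fixed-size time dimension is only for this test,
because a variable that uses an unlimited dimension cannot be stored
contiguously in netCDF-4.

    program baseline_test
      use netcdf
      implicit none
      integer :: io, ncid, varid, oldmode
      integer :: i_dim, j_dim, k_dim, c_dim, t_dim

      io = nf90_create('baseline.nc', nf90_netcdf4, ncid)

      ! Best case: turn off pre-filling for the whole file.
      ! Skip this call for the "with fill values" benchmark.
      io = nf90_set_fill(ncid, nf90_nofill, oldmode)

      ! Fixed-size time dimension so the variable can be truly contiguous.
      io = nf90_def_dim(ncid, 'west_east_d01',   414, i_dim)
      io = nf90_def_dim(ncid, 'south_north_d01', 324, j_dim)
      io = nf90_def_dim(ncid, 'bottom_top_d01',   39, k_dim)
      io = nf90_def_dim(ncid, 'copy',             54, c_dim)
      io = nf90_def_dim(ncid, 'time',              1, t_dim)

      ! No chunksizes, no deflate_level: contiguous, uncompressed storage.
      io = nf90_def_var(ncid, 'QVAPOR_d01', nf90_real, &
                        (/ i_dim, j_dim, k_dim, c_dim, t_dim /), varid, &
                        contiguous=.true.)
      io = nf90_enddef(ncid)

      ! ... nf90_put_var the 3D fields here, timing the writes ...
      io = nf90_close(ncid)
    end program baseline_test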
Once we have this baseline, we can throw chunking and/or compression back into
the mix. Chunking will be the dominant factor, I believe: compression is
applied chunk by chunk, so its efficiency depends on the data within each
chunk, which is in turn determined by the chunk shape.
Russ Rew wrote an excellent blog series regarding chunking, why it matters, and
how to go about selecting chunk sizes:
* http://www.unidata.ucar.edu/blogs/developer/en/entry/chunking_data_why_it_matters
* http://www.unidata.ucar.edu/blogs/developer/en/entry/chunking_data_choosing_shapes
Here's a blog post he also wrote on compression:
* http://www.unidata.ucar.edu/blogs/developer/en/entry/netcdf_compression
Armed with this information, and knowing (roughly) the best-case scenario, you
should be able to select chunking/compression parameters which improve the
write speed of your data.
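For reference, all of those knobs hang off nf90_def_var in the Fortran API.
Here's another untested sketch, where dimids holds the five dimension ids in
the same order as the earlier sketch; the chunk shape simply mirrors one full
3D field, and whether the shuffle filter helps will depend on your data:

    integer :: chunksizes(5)

    ! One chunk per 3D field, matching the ~20MB write pattern in your code.
    chunksizes = (/ 414, 324, 39, 1, 1 /)

    ! deflate_level=1 is the cheapest deflate setting; the shuffle filter
    ! reorders bytes before compression and often helps floating-point data.
    io = nf90_def_var(ncid, 'QVAPOR_d01', nf90_real, dimids, varid, &
                      chunksizes=chunksizes, deflate_level=1, shuffle=.true.)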
There is an alternative, although I don't know if it's of any interest to you.
You could always write the data uncompressed, and then use post-processing (in
the form of nccopy or the NCO tools) to generate compressed files from the
uncompressed files. This solution *only* tackles the issue of initial disk I/O
speed, but perhaps that's the dominant concern.
Finally, you may be able to speed up your experimentation: instead of running a
test program to generate the data, you could use `nccopy` to copy an
uncompressed data file into a compressed, chunked data file. This should go
much faster, and the nccopy timings may help guide the larger investigation.
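For example, something like this (untested; the chunk spec just mirrors one
full 3D field, and the dimension names come from your ncdump below):

    nccopy -d 1 -s \
           -c time/1,copy/1,bottom_top_d01/39,south_north_d01/324,west_east_d01/414 \
           uncompressed.nc compressed.nc

Here -d sets the deflate level, -s turns on the shuffle filter, and -c gives
per-dimension chunk lengths.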
I feel like I've rambled a bit, but I hope this is helpful. If you have any
thoughts, or if you feel that I've missed something, please let me know!
-Ward
> I've been exploring the compression/deflation options for our netCDF files
> produced by DART.
> We are typically concerned with write performance. The variables are
> typically 5D, with one unlimited dimension and one dimension that is per
> 'copy/model instance'. The other dimensions are spatial. The variables being
> calculated are 3D - one for each ensemble member at each time step. So we're
> repeatedly stuffing (~20MB) 3D objects into 5D containers. For example:
>
> west_east_d01 = 414 ;
> south_north_d01 = 324 ;
> bottom_top_d01 = 39 ;
> copy = 54 ;
> time = UNLIMITED ; // (1 currently)
>
> float QVAPOR_d01(time, copy, bottom_top_d01, south_north_d01,
> west_east_d01) ;
> QVAPOR_d01:units = "kg kg-1" ;
> QVAPOR_d01:description = "Water vapor mixing ratio" ;
> QVAPOR_d01:long_name = "Water vapor mixing ratio" ;
> QVAPOR_d01:coordinates = "XLONG_d01 XLAT_d01" ;
>
> Presently (make sure you're sitting down), we are using the classic format
> with large file support.
> I've been trying to move to netCDF4/HDF5 with compression.
>
> On yellowstone, I cannot even get close to the wall-clock achieved with the
> classic format.
>
> I have a (really trivial) job that runs the same test 10x.
> With the classic format, it takes less than 3 minutes end-to-end for each
> of the 10 tests.
>
> With the netCDF4/HDF5 format and the default settings, the exact same test
> took more than 40 minutes for each of the tests. OK - clearly the defaults
> (listed below) are not appropriate.
>
> QVAPOR_d01: deflate_level 0
> QVAPOR_d01: contiguous F
> QVAPOR_d01: shuffle F
> QVAPOR_d01: fletcher32 F
> QVAPOR_d01: chunksizes 83 65 8 11 1
>
> So I tried specifying both the deflate level and the chunksizes:
>
> chunksizes(1:4) = (/ wrf%dom(id)%var_size(1,ind), &
> wrf%dom(id)%var_size(2,ind), &
> wrf%dom(id)%var_size(3,ind), &
> 1 /)
> deflate_level = 1
> io = nf90_def_var(ncid=ncFileID, name=varname, &
> xtype=nf90_real,dimids=dimids_3D, varid=var_id, &
> chunksizes=chunksizes(1:4), deflate_level=deflate_level)
>
> QVAPOR_d01: deflate_level 1
> QVAPOR_d01: contiguous F
> QVAPOR_d01: shuffle F
> QVAPOR_d01: fletcher32 F
> QVAPOR_d01: chunksizes 414 324 39 1 1
> QVAPOR_d01: cache_size 64
> QVAPOR_d01: cache_nelems 1009
> QVAPOR_d01: cache_preemption 75
>
> which knocked it down to 11 or 12 minutes per execution - still 4X slower
> than the classic format.
>
> So - I thought ... 'change the cache size' ... but as soon as I try to
> specify the cache_size argument in the nf90_def_var call, I get a run-time
> error "NetCDF: Invalid argument"
> Besides, the cache size is already 64MB, my objects are about 20MB.
>
> Am I going about this the wrong way? Can you provide any insight or
> suggestions?
> In general, I believe I will need an unlimited dimension, as it is not
> technically possible to know exactly how many timesteps will be in the file;
> that depends on the availability of observations, which are not always
> present at every regular timestep.
>
> I'd love to sit down with someone to fully explain my write pattern and
> learn ways to improve on it.
>
> Cheers -- Tim
>
> P.S. Currently Loaded Modules:
> 1) ncarenv/1.0 3) intel/12.1.5 5) netcdf/4.3.0
> 2) ncarbinlibs/1.1 4) ncarcompilers/1.0
>
> Tim Hoar
> Data Assimilation Research Section
> Institute for Mathematics Applied to Geosciences
> National Center for Atmospheric Research
> address@hidden
> 303.497.1708
>
>
Ticket Details
===================
Ticket ID: UAD-972803
Department: Support netCDF
Priority: Normal
Status: Closed