- Subject: [netCDF #UAD-972803]: netcdf4 HDF chunksize/cache ... vs. 'classic' format.
- Date: Wed, 09 Mar 2016 11:02:03 -0700
Hi Tim,
While I don't have a concrete set of parameters you can use for optimal
performance, I think I can provide a little bit of insight to help guide your
tests towards *increased* performance. One caveat to start is that I'm not
proficient in Fortran, so if I'm reading something in your code incorrectly, my
apologies. I'll also say that I don't think changing the cache size is going
to achieve much, so we can ignore that for now and leave it at the default,
which is appropriate for the data size you're working with.
My first thought is that there is always going to be increased I/O overhead
when comparing netCDF3 I/O to netCDF4 I/O. This overhead comes from several
places: chunking, caching, compression, fill values, and the complexity of the
HDF5 library. If we wanted to establish the best-case scenario, *in terms of
I/O speed*, I would suggest running a benchmark *without* chunking or
compression, and with fill values turned off. The results of this benchmark
will establish a reasonable baseline for performance; nothing we do is likely
to beat it. A more realistic benchmark would be to
then run without chunking or compression, but with fill values. This will be
slower, but also safer.
> Fill values take up a lot of I/O time, but they also help guard against data
> corruption. Fill values let a scientist distinguish between garbage data
> and 'empty' data. With no fill values, data corruption can become impossible
> to detect. I only recommend not using fill values if you are absolutely sure
> that's what you want to do.
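To make that baseline concrete, here's a minimal Fortran sketch of the
best-case setup. Given my Fortran caveat above, treat it as a rough, untested
sketch with error checking omitted. The dimension names and sizes are copied
from your ncdump below; the fixed-size time dimension is only for this test,
because a variable that uses an unlimited dimension cannot be stored
contiguously in netCDF-4.

    program baseline_test
      use netcdf
      implicit none
      integer :: io, ncid, varid, oldmode
      integer :: i_dim, j_dim, k_dim, c_dim, t_dim

      io = nf90_create('baseline.nc', nf90_netcdf4, ncid)

      ! Best case: turn off pre-filling for the whole file.
      ! Skip this call for the "with fill values" benchmark.
      io = nf90_set_fill(ncid, nf90_nofill, oldmode)

      ! Fixed-size time dimension so the variable can be truly contiguous.
      io = nf90_def_dim(ncid, 'west_east_d01',   414, i_dim)
      io = nf90_def_dim(ncid, 'south_north_d01', 324, j_dim)
      io = nf90_def_dim(ncid, 'bottom_top_d01',   39, k_dim)
      io = nf90_def_dim(ncid, 'copy',             54, c_dim)
      io = nf90_def_dim(ncid, 'time',              1, t_dim)

      ! No chunksizes, no deflate_level: contiguous, uncompressed storage.
      io = nf90_def_var(ncid, 'QVAPOR_d01', nf90_real, &
                        (/ i_dim, j_dim, k_dim, c_dim, t_dim /), varid, &
                        contiguous=.true.)
      io = nf90_enddef(ncid)

      ! ... nf90_put_var the 3D fields here, timing the writes ...
      io = nf90_close(ncid)
    end program baseline_test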
Once we have this baseline, we can throw chunking and/or compression back into
the mix. Chunking will be the dominant factor, I believe: compression is
applied chunk by chunk, so its efficiency depends on the data within each
chunk, which is in turn determined by the chunk shape.
Russ Rew wrote an excellent blog series regarding chunking, why it matters, and
how to go about selecting chunk sizes:
* http://www.unidata.ucar.edu/blogs/developer/en/entry/chunking_data_why_it_matters
* http://www.unidata.ucar.edu/blogs/developer/en/entry/chunking_data_choosing_shapes
Here's a blog post he also wrote on compression:
* http://www.unidata.ucar.edu/blogs/developer/en/entry/netcdf_compression
Armed with this information, and knowing (roughly) the best-case scenario, you
should be able to select chunking/compression parameters which improve the
write speed of your data.
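For reference, all of those knobs hang off nf90_def_var in the Fortran API.
Here's another untested sketch, where dimids holds the five dimension ids in
the same order as the earlier sketch; the chunk shape simply mirrors one full
3D field, and whether the shuffle filter helps will depend on your data:

    integer :: chunksizes(5)

    ! One chunk per 3D field, matching the ~20MB write pattern in your code.
    chunksizes = (/ 414, 324, 39, 1, 1 /)

    ! deflate_level=1 is the cheapest deflate setting; the shuffle filter
    ! reorders bytes before compression and often helps floating-point data.
    io = nf90_def_var(ncid, 'QVAPOR_d01', nf90_real, dimids, varid, &
                      chunksizes=chunksizes, deflate_level=1, shuffle=.true.)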
There is an alternative, although I don't know if it's of any interest to you.
You could always write the data uncompressed, and then use post-processing (in
the form of nccopy or the NCO tools) to generate compressed files from the
uncompressed files. This solution *only* tackles the issue of initial disk I/O
speed, but perhaps that's the dominant concern.
Finally, you may be able to speed up your experimentation: instead of running a
test program to generate the data, you could use `nccopy` to copy an
uncompressed data file into a compressed, chunked data file. This should go
much faster, and the nccopy timings may help guide the larger investigation.
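For example, something like this (untested; the chunk spec just mirrors one
full 3D field, and the dimension names come from your ncdump below):

    nccopy -d 1 -s \
           -c time/1,copy/1,bottom_top_d01/39,south_north_d01/324,west_east_d01/414 \
           uncompressed.nc compressed.nc

Here -d sets the deflate level, -s turns on the shuffle filter, and -c gives
per-dimension chunk lengths.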
I feel like I've rambled a bit, but I hope this is helpful. If you have any
thoughts, or if you feel that I've missed something, please let me know!
-Ward
> I've been exploring the compression/deflation options for our netCDF files
> produced by DART.
> We are typically concerned with write performance. The variables are
> typically 5D, with one unlimited dimension and one dimension that is per
> 'copy/model instance'. The other dimensions are spatial. The variables being
> calculated are 3D - one for each ensemble member at each time step. So we're
> repeatedly stuffing (~20MB) 3D objects into 5D containers. For example:
>
> west_east_d01 = 414 ;
> south_north_d01 = 324 ;
> bottom_top_d01 = 39 ;
> copy = 54 ;
> time = UNLIMITED ; // (1 currently)
>
> float QVAPOR_d01(time, copy, bottom_top_d01, south_north_d01,
> west_east_d01) ;
> QVAPOR_d01:units = "kg kg-1" ;
> QVAPOR_d01:description = "Water vapor mixing ratio" ;
> QVAPOR_d01:long_name = "Water vapor mixing ratio" ;
> QVAPOR_d01:coordinates = "XLONG_d01 XLAT_d01" ;
>
> Presently (make sure you're sitting down), we are using the classic format
> with large file support.
> I've been trying to move to netCDF4/HDF5 with compression.
>
> On yellowstone, I cannot even get close to the wall-clock achieved with the
> classic format.
>
> I have a (really trivial) job that runs the same test 10x.
> With the classic format, it takes less than 3 minutes end-to-end for each
> of the 10 tests.
>
> With the netCDF4/HDF5 format and the default settings, the exact same test
> took more than 40 minutes for each of the tests. OK - clearly the defaults
> (listed below) are not appropriate.
>
> QVAPOR_d01: deflate_level 0
> QVAPOR_d01: contiguous F
> QVAPOR_d01: shuffle F
> QVAPOR_d01: fletcher32 F
> QVAPOR_d01: chunksizes 83 65 8 11 1
>
> So I tried specifying both the deflate level and the chunksizes:
>
> chunksizes(1:4) = (/ wrf%dom(id)%var_size(1,ind), &
> wrf%dom(id)%var_size(2,ind), &
> wrf%dom(id)%var_size(3,ind), &
> 1 /)
> deflate_level = 1
> io = nf90_def_var(ncid=ncFileID, name=varname, &
> xtype=nf90_real,dimids=dimids_3D, varid=var_id, &
> chunksizes=chunksizes(1:4), deflate_level=deflate_level)
>
> QVAPOR_d01: deflate_level 1
> QVAPOR_d01: contiguous F
> QVAPOR_d01: shuffle F
> QVAPOR_d01: fletcher32 F
> QVAPOR_d01: chunksizes 414 324 39 1 1
> QVAPOR_d01: cache_size 64
> QVAPOR_d01: cache_nelems 1009
> QVAPOR_d01: cache_preemption 75
>
> which knocked it down to 11 or 12 minutes per execution - still 4X slower
> than the classic format.
>
> So - I thought ... 'change the cache size' ... but as soon as I try to
> specify the cache_size argument in the nf90_def_var call, I get a run-time
> error "NetCDF: Invalid argument"
> Besides, the cache size is already 64MB, my objects are about 20MB.
>
> Am I going about this the wrong way? Can you provide any insight or
> suggestions?
> In general, I believe I will need an unlimited dimension, as it is not
> technically possible to know exactly how many timesteps will be in the file;
> that depends on the availability of observations, which are not always
> present at every regular timestep.
>
> I'd love to sit down with someone to fully explain my write pattern and
> learn ways to improve on it.
>
> Cheers -- Tim
>
> P.S. Currently Loaded Modules:
> 1) ncarenv/1.0 3) intel/12.1.5 5) netcdf/4.3.0
> 2) ncarbinlibs/1.1 4) ncarcompilers/1.0
>
> Tim Hoar
> Data Assimilation Research Section
> Institute for Mathematics Applied to Geosciences
> National Center for Atmospheric Research
> address@hidden
> 303.497.1708
>
>
Ticket Details
===================
Ticket ID: UAD-972803
Department: Support netCDF
Priority: Normal
Status: Closed