[netCDF #AIQ-275071]: [netcdf-hdf] Unexpected overall file size jump
- Subject: [netCDF #AIQ-275071]: [netcdf-hdf] Unexpected overall file size jump
- Date: Tue, 04 Jan 2011 06:39:58 -0700
James,
Sorry it's taken so long to respond to your question about netCDF-4 file size.
The problem is revealed by running "ncdump -s -h" on the netCDF-4 files, which shows that the variables that use the unlimited dimension nsets get "chunked" into 3-D tiles, and that the netCDF-4 library chooses default chunk sizes that cause the file expansion you see.
One simple solution would be to write the netCDF-4 data with the unlimited dimension nsets converted to a fixed-size dimension, an operation supported by the "-u" option of the nccopy utility. Then the variable data would all be stored contiguously instead of chunked, as is required when a variable uses the unlimited dimension.
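For example, a command like the following (the file names here are just placeholders) rewrites a file with its unlimited dimensions made fixed:

    nccopy -u input.nc output.nc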
Another possibility would be to explicitly set the chunk sizes for the output to better values than those determined by the library's current algorithm for selecting default chunk sizes. We're discussing whether we could fix the default chunk size algorithm to avoid the kind of extreme file size expansion you have demonstrated in this case.
For example, the library currently sets the default chunk sizes for the measurements variable as this output from "ncdump -h -s" shows:
    float measurements(nsets, n_variables, npoints) ;
        measurements:_Storage = "chunked" ;
        measurements:_ChunkSizes = 1, 9, 120669 ;
resulting in 20 chunks, each of size 1*9*120669*4 = 4344084 bytes, for a total of 86881680 bytes, about 87 Mbytes.
Better choices of chunk sizes would be (1, 11, 152750) with 5 chunks, (1, 1, 152750) with 55 chunks, or (1, 1, 76375) with 110 chunks, for example; none of these wastes any space in the chunks, and each would result in total storage of 33605000 bytes, about 34 Mbytes. (That total implies the variable's actual shape is 5 x 11 x 152750 floats, which chunks of 9 along n_variables and 120669 along npoints don't tile evenly, so the partial chunks at the edges are mostly padding.)
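If it helps to see where those numbers come from, here is a small C sketch of the arithmetic, assuming the inferred 5 x 11 x 152750 shape: each dimension contributes ceil(dim/chunk) chunks, and every chunk is allocated at full size, including the partial ones at the edges.

    #include <stdio.h>

    int main(void)
    {
        size_t dims[3]   = {5, 11, 152750}; /* variable shape (inferred above) */
        size_t chunks[3] = {1, 9, 120669};  /* current default chunk sizes */
        size_t nchunks = 1, chunkbytes = 4; /* 4 bytes per float element */

        for (int i = 0; i < 3; i++) {
            nchunks *= (dims[i] + chunks[i] - 1) / chunks[i]; /* ceil(dim/chunk) */
            chunkbytes *= chunks[i];
        }
        /* Prints "20 chunks of 4344084 bytes = 86881680 bytes allocated". */
        printf("%zu chunks of %zu bytes = %zu bytes allocated\n",
               nchunks, chunkbytes, nchunks * chunkbytes);
        return 0;
    }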
It looks like the current default chunking can result in a large amount
of wasted space in cases like this.
Thanks for pointing out this problem. In summary, to work around it currently you either have to avoid using the unlimited dimension for these netCDF-4 files, or you have to explicitly set the chunk sizes with the appropriate API call so as not to waste as much space as the current default chunk sizes do.
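In case it's useful, here is a minimal sketch of that call, nc_def_var_chunking() in the C library. The file, dimension, and variable names below just mirror the example above; they are not taken from your actual program.

    #include <stdio.h>
    #include <stdlib.h>
    #include <netcdf.h>

    /* Bail out with a message on any netCDF error. */
    #define CHECK(e) do { int s_ = (e); if (s_ != NC_NOERR) { \
        fprintf(stderr, "%s\n", nc_strerror(s_)); exit(1); } } while (0)

    int main(void)
    {
        int ncid, dimids[3], varid;
        size_t chunks[3] = {1, 11, 152750}; /* one of the no-waste shapes above */

        CHECK(nc_create("example.nc", NC_NETCDF4, &ncid));
        CHECK(nc_def_dim(ncid, "nsets", NC_UNLIMITED, &dimids[0]));
        CHECK(nc_def_dim(ncid, "n_variables", 11, &dimids[1]));
        CHECK(nc_def_dim(ncid, "npoints", 152750, &dimids[2]));
        CHECK(nc_def_var(ncid, "measurements", NC_FLOAT, 3, dimids, &varid));

        /* Override the default chunk sizes; this must happen in define
           mode, before any data is written to the variable. */
        CHECK(nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunks));

        CHECK(nc_close(ncid));
        return 0;
    }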
I'm currently working on making it easy to specify chunk sizes in the output of nccopy, but I don't know whether that will make the upcoming 4.1.2 release. If not, it will be available separately in subsequent snapshot releases, and it should help deal with problems like this if we don't find a better algorithm for selecting default chunk sizes.
--Russ
Russ Rew UCAR Unidata Program
address@hidden http://www.unidata.ucar.edu
Ticket Details
===================
Ticket ID: AIQ-275071
Department: Support netCDF
Priority: Normal
Status: Closed