This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
James,

Sorry it's taken so long to respond to your question about netCDF-4 file size. The problem is revealed by running "ncdump -s -h" on the netCDF-4 files, which shows that the variables that use the unlimited dimension nsets get "chunked" into 3-D tiles, and the netCDF-4 library chooses default chunk sizes that cause the file expansion you see.

One simple solution would be to write the netCDF-4 data with the unlimited dimension nsets changed instead to a fixed-size dimension, an operation supported by the "-u" option of the nccopy utility. Then the variable data would all be stored contiguously instead of chunked, as is required when a variable uses the unlimited dimension.

Another possibility would be to explicitly set the chunksizes for the output to better values than those determined by the current library algorithm for selecting default chunk sizes.

We're discussing whether we could fix the default chunk size algorithm to avoid extreme file size expansion, such as you have demonstrated in this case. For example, the library currently sets the default chunksizes for the measurements variable as this output from "ncdump -h -s" shows:

    float measurements(nsets, n_variables, npoints) ;
            measurements:_Storage = "chunked" ;
            measurements:_ChunkSizes = 1, 9, 120669 ;

resulting in 20 chunks, each of size 1*9*120669*4 = 4344084 bytes, for a total of 86881680 bytes, about 87 Mbytes. Better choices of chunksizes would be (1, 11, 152750) with 5 chunks, (1, 1, 152750) with 55 chunks, or (1, 11, 76375) with 10 chunks, for example; none of these would waste any space in the chunks, and all would result in total storage of 33605000 bytes, about 34 Mbytes. It looks like the current default chunking can result in a large amount of wasted space in cases like this. Thanks for pointing out this problem.

In summary, to work around it currently you either have to avoid using the unlimited dimension for these netCDF-4 files, or you have to explicitly set the chunk sizes using the appropriate API call so as not to waste as much space as the current choice of default chunk sizes does.

I'm currently working on making it easy to specify chunksizes in the output of nccopy, but I don't know whether that will make the upcoming 4.1.2 release. If not, it will be available separately in subsequent snapshot releases and should help deal with problems like this, if we don't find a better algorithm for selecting default chunksizes.

--Russ

Russ Rew
UCAR Unidata Program
address@hidden
http://www.unidata.ucar.edu

Ticket Details
===================
Ticket ID: AIQ-275071
Department: Support netCDF
Priority: Normal
Status: Closed
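
For illustration, here is a minimal sketch of the "explicitly set the chunk sizes" workaround through the netCDF-4 C API (nc_def_var_chunking), using chunk sizes of (1, 11, 152750) as suggested in the answer. The dimension lengths (nsets unlimited with 5 records, n_variables = 11, npoints = 152750) are assumptions inferred from the chunk counts quoted above, and the file name is hypothetical; this is not the questioner's actual program.

    #include <stdio.h>
    #include <stdlib.h>
    #include <netcdf.h>

    /* Report any netCDF error and stop. */
    static void check(int status)
    {
        if (status != NC_NOERR) {
            fprintf(stderr, "netCDF error: %s\n", nc_strerror(status));
            exit(1);
        }
    }

    int main(void)
    {
        int ncid, dimids[3], varid;
        /* One record of the unlimited dimension per chunk, full extent of
         * the two fixed dimensions: no partial chunks, no wasted space. */
        size_t chunks[3] = {1, 11, 152750};

        check(nc_create("measurements.nc", NC_NETCDF4, &ncid));

        check(nc_def_dim(ncid, "nsets", NC_UNLIMITED, &dimids[0]));
        check(nc_def_dim(ncid, "n_variables", 11, &dimids[1]));
        check(nc_def_dim(ncid, "npoints", 152750, &dimids[2]));

        check(nc_def_var(ncid, "measurements", NC_FLOAT, 3, dimids, &varid));

        /* Ask for chunked storage with explicit chunk sizes instead of
         * accepting the library's default choice. */
        check(nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunks));

        check(nc_close(ncid));
        return 0;
    }

The other workaround described in the answer, converting the unlimited dimension to a fixed size with nccopy, would be invoked as, for example, "nccopy -u input.nc output.nc" (file names hypothetical); the resulting layout can then be inspected with "ncdump -s -h" as above.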