This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Hi Greg,

Chunking is a property associated with variables rather than files. Any compressed variable is chunked by default, with each chunk compressed and uncompressed independently. The chunking for a variable is determined when it is created (as is the compression level). Chunking and compression are properties of a variable that cannot be changed after the variable is defined (which for the C++ interface means after any data has been written to the file). If chunking parameters are not specified when a variable is defined, default chunking is used, which may not be optimal. Expected access patterns for a variable can help determine good chunking parameters.

All this is documented in the netCDF-4 C Users Guide, but not in the C++ Users Guide, which still covers only netCDF-3. A better introduction to chunking might be the 10 "slides" (really short web pages) on chunking and compression from the 2008 netCDF training workshop at http://www.unidata.ucar.edu/netcdf/workshops/2008/nc4chunking/

> Regarding chunking, I open the file with code that resembles:
>
> size_t ncChunkSize_bytes = yBins * xBins;
> size_t* chunkSizePtr = &ncChunkSize_bytes;
>
> NcFile* ncFile = new NcFile( tempName.c_str(), NcFile::Replace,
>                              chunkSizePtr, 0, ncFileFormat );

The unfortunately named chunkSizePtr in this NcFile constructor has nothing to do with the per-variable chunk sizes (one component for each dimension of the variable); it is an optional I/O buffer-size tuning parameter for the file as a whole.

> Can you identify anything particularly offensive about this method
> of opening or writing a NetCDF file?

Since you don't have any unlimited dimensions, the default chunking is the full dimension size for each fixed dimension, which in your case means a single chunk. So reading a single value out of this variable would require uncompressing all of the data. Furthermore, the default chunk cache size is smaller than this chunk, so the cache does no good at all. (In release 4.0.1 we're changing the default chunk cache size to always hold at least one chunk.)
> For instance, if I set my chunk size to the dimension of one grid,
> should I also call 'put' once for each forecast grid? Perhaps that
> way, a reader would not be forced to read the entire block at once.

Right, the idea is to make each chunk big enough that reading by chunks is efficient, but small enough that uncompressing a single chunk is not a bottleneck. You can also choose chunk shapes so that data can be read along any dimension axis without favoring one order of reading over another.

> If there are any published guides that describe using the C++ API
> with chunking, please pass them along too.

There's a good explanation of how important chunking can be to performance, with an instructive example, in this paper:

http://www.hdfgroup.org/pubs/papers/2008-06_netcdf4_perf_report.pdf

--Russ

Russ Rew                                         UCAR Unidata Program
address@hidden                     http://www.unidata.ucar.edu

Ticket Details
===================
Ticket ID: VCQ-846449
Department: Support netCDF
Priority: Normal
Status: Closed