John,

> Or perhaps netcdf should stay lean and clean, and these complexities be
> implemented in a larger system like HDF, which seems to have a lot of
> funding? I don't know what your vision of netcdf is, and its relation to
> other systems. With hdf having a netcdf interface, one could argue that
> large datasets should move to hdf, and netcdf remain a small,
> understandable system.

I definitely don't want to add too many "bells and whistles" to netCDF in an
attempt to satisfy specialized needs. I want to keep the surface area of the
interface small and the User's Guide short enough that it is not
intimidating. I also want to make sure any new features don't impose a cost
on those who don't use them or need them.

I think packing is currently the single feature we're getting the most
pressure to add, but almost all of that pressure is coming from NCAR users
rather than from the majority of users in various earth science and other
disciplines. Nevertheless, I think the addition of (mostly) transparent
packing is important and would make netCDF more useful.

As far as vision goes, one thing I would like to be working on is
improvements to the C++ interface, perhaps even netCDF iterators that fit in
with the new Standard Template Library. This would make available to netCDF
programmers some powerful and easy-to-use tools for looping through,
selecting, searching, and sorting netCDF objects.

> > I'm still hoping we can work out the details of a packed floating-point
> > representation such as you have suggested, because I think it's superior
> > to my idea of using arrays of scales and offsets. Please let me know if
> > you have any other thoughts on this.
>
> Perhaps you could give me a thumbnail sketch of your "array of scales and
> offsets" design, so I can think about it concretely. I remain undecided as
> to the advantages of scale and offset vs. small floating point.

OK, although I haven't worked out the precise additions to the C and Fortran
interfaces that would be required. Some of these details would apply to the
use of packed floating point as well as to scale and offset arrays.

My idea was that the packing parameters for a variable would be set up at
variable definition time. Readers would not have to be aware of whether a
variable had been set up as packed, but could find out the packing
parameters by calling a suitable inquire function for the variable. A writer
would get an error returned if it tried to write a value inconsistent with
the packing parameters for a variable, but otherwise wouldn't have to know
that the variable was packed.

All netCDF types would permit packed representations, so three-bit ints and
booleans could be stored efficiently even if they are declared to be of type
NC_BYTE. The number of bits (Nbits) would be a scalar packing parameter for
a variable, so you couldn't use 10 bits for one cross-section and 6 bits for
a different cross-section of the same array.

The other two packing parameters, Scale and Offset, could be
multi-dimensional arrays using some subset (including the empty subset, for
scalar packing constants) of the dimensions of a variable. For example,
float T(time, level, lat, lon) could have Scale(level) and Offset(level) to
exploit the fact that temperatures at a given atmospheric level may have a
smaller range (and hence pack better) than global temperatures at all
levels. If you also used the lat dimension as a packing dimension,
Scale(level, lat) and Offset(level, lat) would be 2-d packing arrays set up
for the variable T.
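To make the per-level packing concrete, here is a minimal, self-contained C
sketch of how Scale(level) and Offset(level) might be derived from per-level
data ranges. It is not a real or proposed netCDF call; the function name,
array sizes, and Nbits value are illustrative assumptions only.

    #include <math.h>

    #define NLEV  10     /* number of atmospheric levels (illustrative) */
    #define NBITS 16     /* scalar packing parameter for the variable   */

    /* Derive per-level packing parameters for a variable such as
     * T(time, level, lat, lon):
     *   Offset(level) = per-level minimum, so packed data are non-negative
     *   Scale(level)  = (Max - Min) / (2^Nbits - 2), leaving one packed
     *                   code free for the _FillValue.                     */
    void derive_packing(const float min[NLEV], const float max[NLEV],
                        float offset[NLEV], double scale[NLEV])
    {
        const double ncodes = pow(2.0, NBITS) - 2.0;  /* distinct data codes */
        for (int k = 0; k < NLEV; k++) {
            offset[k] = min[k];
            scale[k]  = (max[k] - min[k]) / ncodes;
        }
    }

With Offset set to the per-level minimum and Scale to the per-level range
divided by the number of available codes, each level uses the full packed
range, which is the advantage over a single global scale and offset.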
The Nbits, Scale, and Offset parameters must be defined for a variable
before any values (including _FillValues) have been written for that
variable, and must not be redefined with different values after any values
(including _FillValues) have been written.

0 <= Nbits <= 32. If Nbits is 0, no data needs to be stored, and the
variable is only a handle for attributes; in that case the variable's value
on a read is the _FillValue. It is not possible to store more than 32 bits
of precision, even for double values [because of the restrictions of XDR?].
Providing a value of Nbits greater than 16 for an NC_SHORT variable, or
greater than 8 for an NC_CHAR or NC_BYTE variable, is not useful. One value
of the packed range will be used for the representation of the packed
_FillValue, so the packed values will represent 2^Nbits - 1 distinct data
values.

The Offset parameter should be of the same type as the variable. A useful
value for a scalar Offset is the minimum valid data value, so that all
packed data will be non-negative.

The Scale parameter should be of type double [or float?]. A useful value of
Scale, in the case that data values map to the integers 0, 1, ...,
2^Nbits - 2 and the missing value maps to 2^Nbits - 1, is:

    (Max - Min) / (2^Nbits - 2)

assuming the packing formulas are:

    packed = truncate_to_Nbits((value - Offset) / Scale)
    value  = packed * Scale + Offset

The _FillValue will be mapped, in the packed range, to 2^Nbits - 1. The
_FillValue (and valid_range, valid_min, or valid_max) parameters should
always be specified in terms of the unpacked values of a variable.

--Russ
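For reference, a minimal C sketch of the packing and unpacking formulas
quoted above, with the top code of the packed range reserved for the
_FillValue. The helper names pack_value and unpack_value, and the choice of
Nbits = 16, are illustrative assumptions, not part of any netCDF interface.

    #include <stdio.h>

    #define NBITS     16                    /* 0 <= Nbits <= 32 per the proposal */
    #define FILL_CODE ((1u << NBITS) - 1u)  /* 2^Nbits - 1, reserved for _FillValue */

    /* packed = truncate_to_Nbits((value - Offset) / Scale);
     * the unpacked _FillValue maps to the reserved top code. */
    unsigned pack_value(double value, double fill_value, double scale, double offset)
    {
        if (value == fill_value)
            return FILL_CODE;
        return (unsigned)((value - offset) / scale);   /* truncation */
    }

    /* value = packed * Scale + Offset;
     * the reserved code unpacks to the (unpacked) _FillValue. */
    double unpack_value(unsigned packed, double fill_value, double scale, double offset)
    {
        if (packed == FILL_CODE)
            return fill_value;
        return packed * scale + offset;
    }

    int main(void)
    {
        double min = 180.0, max = 330.0, fill = -9999.0;  /* Kelvin, illustrative */
        double scale = (max - min) / (FILL_CODE - 1);     /* (Max - Min)/(2^Nbits - 2) */
        unsigned p = pack_value(273.15, fill, scale, min);
        printf("packed = %u, unpacked = %f\n", p, unpack_value(p, fill, scale, min));
        return 0;
    }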