This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
> Organization: Oklahoma Mesonet
> Keywords: 199403092201.AA25413

Hi Sridhar,

> I am working with the Oklahoma Mesonet as a programmer.
> I am currently working on the archival of Mesonet data using NetCDF.
> Kindly suggest a compression routine that is best suited for NetCDF files.
> The routine should be a platform independent one as the compressed files will
> be shared with other users. We are currently using a VMS operating system.
>
> Is gzip from GNU a good routine? I would like your opinion.

gzip is a good general-purpose compression program, if nothing is known about the data. gzip does well for text files or images, but it only seems to do well for binary files containing numeric values if there are a lot of close or identical values. I just tried gzip on a few large netCDF files of floating-point values from model outputs, and it compressed the files pretty well, eliminating from 35% to 68% of the bytes needed in the data. However, if you try gzip on a file of random floating-point data, you may get no compression at all.

Another approach to compression is to pack low-precision floating-point values into small integers, using 8 or 16 bits for what would otherwise require 32 bits as floating point.

I don't know of any better general-purpose compression programs than gzip. In case you want to know more about the packing approach, I've appended a reply I sent earlier to the netcdfgroup mailing list on this subject.

__________________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden                                   P.O. Box 3000
(303)497-8645                                    Boulder, Colorado 80307-3000

> I am trying to port our weather model's output to netCDF format.
> The user guide mentions that though netCDF is not a good archiving format
> it's possible to pack data while using netCDF. Could you please elaborate
> on that ?
> A small run typically generates about 25M of data, and so we
> are looking into machine independent packing, using byte for a few arrays
> and int for most of them.

One way to do this is to pack floating-point numbers into ncbyte or ncshort values and use the conventional netCDF attributes `scale_factor' and `add_offset' to store the packing parameters, as described in the User's Guide:

    `scale_factor'
        If present for a variable, the data are to be multiplied by this factor after the data are read by the application that accesses the data.

    `add_offset'
        If present for a variable, this number is to be added to the data after it is read by the application that accesses the data. If both `scale_factor' and `add_offset' attributes are present, the data are first scaled before the offset is added.

        The attributes `scale_factor' and `add_offset' can be used together to provide simple data compression to store low-resolution floating-point data as small integers in a netCDF file. When scaled data are written, the application should first subtract the offset and then divide by the scale factor.

        When `scale_factor' and `add_offset' are used for packing, the associated variable (containing the packed data) is typically of type byte or short, whereas the unpacked values are intended to be of type float or double. The attributes `scale_factor' and `add_offset' should both be of the type intended for the unpacked data, e.g. float or double.

The netCDF library doesn't treat these attributes in any special way, so you have to use their values for packing before you write values and for unpacking after you read values.

As an example, if you want to pack floating-point values between 950 and 1050 into 8-bit bytes for a program variable named `x' that is to be stored into a netCDF variable named `x_packed', the structure of the netCDF file might include a data specification like the following:

    variables:
        ...
        byte x_packed(n);
            x_packed:scale_factor = 0.3937;
            x_packed:add_offset = 950;
            x_packed:_FillValue = 255;
        ...

where we just use the minimum value, 950, for the offset to keep all packed values non-negative, and we compute the scale factor by using

    scale_factor = (Max - Min)/(2^Nbits - 2) = (1050 - 950) / (256-2) = 0.39370079

Now before you store the value x, you pack it with the formula:

    x_packed = (x - add_offset) / scale_factor

and you store the byte value x_packed (which will be between 0 and 254) instead. You can use the byte value 255 for a missing value. Similarly, when you read the data back in, you can unpack it using the formula:

    x = x_packed*scale_factor + add_offset

If you need more than 8 bits of precision but you still want to store each value as one netCDF value, you will have to use 16-bit shorts, and then the formula above will use Nbits = 16 instead of Nbits = 8.

If you are using C, you may have to declare x_packed to be an `unsigned char' to get these formulas to work out, or change the formulas to assume signed values. In Fortran there are no unsigned integers, so change the formulas to use signed integers instead.

There are other techniques for accessing packed netCDF data (using the units attribute to encode packing information, packing values into a bland array of bytes with some other packing technique and storing the technique name as a variable attribute, etc.) but the one I've outlined above is probably the simplest.

----------------------------------------------------------------------------
Russell K. Rew                                   UCAR Unidata Program
address@hidden                                   P.O. Box 3000
                                                 Boulder, CO 80307-3000
----------------------------------------------------------------------------
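
For readers who want to try the packing arithmetic described in the reply, here is a minimal sketch in plain Python. It does not call the netCDF library. The function names (`packing_params`, `pack`, `unpack`) are hypothetical, and two choices are assumptions the reply does not specify: packed values are rounded to the nearest integer, and a missing value is represented as `None`. Inputs are assumed to lie between the chosen minimum and maximum.

```python
FILL_BYTE = 255  # packed value reserved for missing data, as in the 8-bit example


def packing_params(vmin, vmax, nbits=8):
    """Return (scale_factor, add_offset) for unsigned packing into nbits bits.

    One packed value (2**nbits - 1) is reserved for missing data, so the
    usable packed range is 0 .. 2**nbits - 2.
    """
    scale_factor = (vmax - vmin) / (2 ** nbits - 2)
    add_offset = vmin
    return scale_factor, add_offset


def pack(x, scale_factor, add_offset, fill=FILL_BYTE):
    """Pack one value: subtract the offset, then divide by the scale factor."""
    if x is None:
        return fill
    # Assumption: round to nearest integer (the reply does not say how to round).
    return int(round((x - add_offset) / scale_factor))


def unpack(x_packed, scale_factor, add_offset, fill=FILL_BYTE):
    """Unpack one value: multiply by the scale factor, then add the offset."""
    if x_packed == fill:
        return None
    return x_packed * scale_factor + add_offset


# The 950..1050 example from the reply, packed into 8 bits.
scale_factor, add_offset = packing_params(950.0, 1050.0, nbits=8)
values = [950.0, 1000.0, 1013.25, 1050.0, None]
packed = [pack(v, scale_factor, add_offset) for v in values]
restored = [unpack(p, scale_factor, add_offset) for p in packed]
```

With nearest-integer rounding, each restored value differs from the original by at most half the scale factor (about 0.2 in this example), which is the precision given up in exchange for storing one byte instead of four.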