This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Dan,

> On the subject of compression:
> The compression has finished for three different deflation levels (-d 0/5/9),
> and here are the results:
>
> 15 269 652 694  Aug 12 00:34  narr-TMP-850mb_221_yyyymmdd_hh00_000.grb
>  8 522 977 797  Sep 30 15:31  narr-TMP-850mb_221_yyyymmdd_hh00_000.grb.grb2
> 10 143 354 510  Sep 30 15:50  narr-TMP-850mb_221_yyyymmdd_hh00_000.grb.grb2.nc4  (x/y -d4)
> 38 970 737 399  Oct  9 11:07  narr-TMP-850mb_221_yyyymmdd_hh00_000.grb.grb2.nc4.ts.d0
> 18 693 041 241  Oct  9 14:51  narr-TMP-850mb_221_yyyymmdd_hh00_000.grb.grb2.nc4.ts.d5
> 18 548 053 490  Oct  9 15:06  narr-TMP-850mb_221_yyyymmdd_hh00_000.grb.grb2.nc4.ts.d9
>
> Curious that chunking to time series creates more variability in the data,
> probably messing with the zlib algorithm and resulting in a larger file size
> overall for this chunked data.

Right, the horizontal values at a particular time are probably fairly uniform
(e.g. within one season), so they compress better than a time series spanning
all seasons, which ranges from the coldest minimum to the hottest maximum.

Also, the chunking you specified inserts some extra missing values in the chunks
along the edges, due to "overhang".  The 6x8 horizontal chunk dimensions don't
fit evenly into the 277x349 horizontal slabs, but all output chunks must be the
same size, so the edge chunks are padded with missing values that "pollute" the
compression.  This overhang results in more than 1 GB of extra data that needs
to be compressed: with 47 x 44 = 2068 chunks of 98128 x 6 x 8 four-byte values
(18840576 bytes each), versus 98128 time slabs of 277 x 349 four-byte values
(386692 bytes each), the padding amounts to

    2068 * 18840576 - 98128 * 386692 = 1016998592 bytes

> At the 2011 Unidata workshop on netCDF, the old GRIB vs. netCDF discussion
> occurred, and I recall hearing (but not from whom) that the same JPEG2000
> (Jasper) compression used in GRIB2 was to be implemented with netCDF ~ and
> also some wavelet compression that was supposedly superior to JPEG in some
> cases ... any news on this?

Yes.  Sengcom, the RAL project to implement such a compression algorithm, could
not use the patented JPEG2000 algorithm due to licensing issues. ...  JPEG2000
used patented technology for its 2-dimensional wavelets (EZW or SPIHT), which
RAL could not use without implementing the entire JPEG2000 spec.  Also, to
reduce implementation complexity, Sengcom tried only 1-dimensional wavelets and
provided strict control over the maximum absolute error, rather than mean error
or other less stringent error measures that might allow better compression.
Tests on real data against a supposedly superior wavelet algorithm and 25 other
wavelet types, all 1-dimensional, showed that JPEG2000 was superior in all but
a small percentage of cases, and often provided 2 or 3 times better compression.

If you're interested, the final report on that project should be available
later this week.

--Russ

Russ Rew                                         UCAR Unidata Program
address@hidden                                   http://www.unidata.ucar.edu

Ticket Details
===================
Ticket ID: TSI-527912
Department: Support netCDF
Priority: Normal
Status: Closed
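
For reference, the "overhang" arithmetic discussed above can be reproduced with
a short Python sketch.  The grid size (277 x 349), horizontal chunk footprint
(6 x 8), number of times (98128), and four-byte values are taken from the
message; the function name and the assumption that one chunk spans the full
time dimension are illustrative, not part of the original exchange.

    # Sketch of the chunk "overhang" arithmetic: edge chunks are padded to full
    # chunk size when the chunk dimensions do not divide the variable's
    # dimensions evenly, and the padding must also be compressed.
    import math

    def padded_vs_exact_bytes(dim_lens, chunk_lens, value_size=4):
        """Return (padded_bytes, exact_bytes) for one chunked variable."""
        n_chunks = 1        # total number of chunks, counting partial edge chunks
        n_values = 1        # actual number of data values in the variable
        for dim_len, chunk_len in zip(dim_lens, chunk_lens):
            n_chunks *= math.ceil(dim_len / chunk_len)
            n_values *= dim_len
        chunk_values = math.prod(chunk_lens)
        return n_chunks * chunk_values * value_size, n_values * value_size

    # Values from the message: 98128 times, 277 x 349 grid, chunks spanning all
    # times with a 6 x 8 horizontal footprint, 4 bytes per value (assumed).
    padded, exact = padded_vs_exact_bytes((98128, 277, 349), (98128, 6, 8))
    print(padded - exact)   # 1016998592 bytes, matching the figure quoted above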