This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Hi Dan,

> I could use some advice.
>
> I am trying to rechunk 30 or so 8-30 GB netCDF-4 files for the North
> American Regional Reanalysis physical aggregations, created from a
> wgrib2 conversion process -- for eventual use on our THREDDS server.
>
> I am using source-compiled binaries from netCDF 4.2.1.1.
>
> The inputs are chunked as:
>
>   chunkspec (t y x)
>   1, 277, 349
>
> and I want a new file chunked to optimize read access to time series:
>
>   98128, 6, 8
>
> These files are 1 parameter for 1 z-level, so z is excluded here.
>
> I am using the command:
>
>   $ /san5102/netcdf4/nccopy -m 4000000000 -h 1000000000 \
>       -c time/98128,x/8,y/6 \
>       /san5102/nexus/narr-physaggs/narr-TMP-850mb_221_yyyymmdd_hh00_000.grb.grb2.nc4 \
>       /raid/nomads/testing/data/narraggs/narr-TMP-850mb_221_yyyymmdd_hh00_000.grb.grb2.nc4.ts
>
> The issue is that this is unreasonably slow. At the beginning I get a
> burst of about 350-500 KB/sec output (which is reasonable for the server
> hardware), then after a few minutes it falls to < 10 KB/sec. For a
> 10+ GB file, that means more than 10 days just to rechunk one file.
> Adjusting the -m and -h options gives only a minor improvement: the
> initial write burst lasts longer, but throughput still eventually floors
> at < 10 KB/sec.
>
> Do you think this is the best way to optimize for time-series read
> access? And what do you suggest to make the process finish in a
> reasonable time? Are files of this size just too much? The output format
> of the file doesn't matter to me, as long as it's netCDF-4 and maximum
> compression can be applied later.

How much memory do you have that you can dedicate to nccopy when it is
rechunking the data? If you have enough memory, use of the -w option may
speed things up significantly.

Since available memory can make a big difference in how long rechunking
takes, a possible solution is just doing the rechunking on a different
system with lots of memory, e.g. 64 GB. Memory is pretty cheap compared to
programmer time these days, so I'm wondering if that's a possibility ...

Another approach that might work is making more than one pass over the
data, writing an intermediate file whose chunking is partway between the
current input and the desired output.

This problem is very interesting to me, and I'd like to be able to test
approaches to optimizing access for time series using a real data file
rather than some artificial test data. Could you either make available one
of those input files (but not as an email attachment! :-) or tell me how to
get one? Especially when dealing with questions that may ultimately involve
compression as well as chunking, it's important to work with real-world
data. If that's not practical, I'd like to get the CDL from "ncdump -h"
(or "ncdump -c") for the input netCDF file, as well as CDL for the desired
output, so I know exactly what you're trying to do.

--Russ

Russ Rew                    UCAR Unidata Program
address@hidden              http://www.unidata.ucar.edu

Ticket Details
===================
Ticket ID: TSI-527912
Department: Support netCDF
Priority: Normal
Status: Closed
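For reference, a minimal sketch of the two approaches suggested in the reply
(the -w in-memory copy and the two-pass rechunking), assuming the dimension
names from the chunkspec above (time, y, x); the file names and the
intermediate chunk shape are illustrative, not taken from the original
exchange:

  # Single pass, keeping the output file in memory until close (-w),
  # with the time-series chunking requested above. This needs enough
  # free memory to hold the whole uncompressed output file.
  $ nccopy -w -c time/98128,y/6,x/8 input.nc4 output_ts.nc4

  # Two-pass alternative when memory is limited: first rechunk to an
  # intermediate shape partway between the input (1,277,349) and the
  # target (98128,6,8), then rechunk again to the final shape.
  $ nccopy -c time/1024,y/50,x/50 input.nc4 intermediate.nc4
  $ nccopy -c time/98128,y/6,x/8  intermediate.nc4 output_ts.nc4

The idea behind the intermediate shape is that each pass only has to
reorganize the data part of the way, so the chunk cache is used more
effectively. Since the questioner notes that compression can be applied
later, deflation (nccopy -d) could be added to the final pass only.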