[netCDF #TSI-527912]: nccopy advice - rechunking very large files
- Subject: [netCDF #TSI-527912]: nccopy advice - rechunking very large files
- Date: Tue, 09 Oct 2012 20:03:18 -0600
> On the subject of compression:
> The compression has finished for 3 different rates -d 0/5/9,
> and here are the results:
You may already be aware of this, but just to make sure: the
compression level corresponding to -d0 is *no* compression. So
it might be useful to compare -d1, the lowest and supposedly
fastest level of compression, with -d5 and -d9. In my experience,
for a lot of large floating-point data -d1 is a little faster
than the higher levels, and the higher levels compress only a
little better. So I usually just use -d1, as the time it saves
is usually worth the small amount of extra data volume.
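Just as a sketch (tmp.nc4 and the tmp-d*.nc4 output names here
are placeholders for whatever files you're testing), you could
time each level and compare the resulting sizes directly:
$ for lev in 1 5 9; do time nccopy -d$lev tmp.nc4 tmp-d$lev.nc4; done
$ ls -l tmp-d?.nc4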
I used -d0 in the example I ran to specify explicitly that the
output was to be uncompressed. I thought that would be somewhat
faster than compressing the output chunks as they were written
to disk, and it was significantly faster:
Writing uncompressed output took 35:24.38 (minutes:seconds) of
elapsed time:
$ nccopy -ctime/98128,x/8,y/6 -e 102000 -m 40M -h 40G -d0 tmp.nc4 tmp-rechunked.nc4
$ ls -l tmp-rechunked.nc4
-rw-rw-r-- 1 russ ustaff 38970737448 Oct 7 12:36 tmp-rechunked.nc4
whereas compressing the output at level 1 (by default, nccopy
compresses the output at the same level as the input) took
52:29.25 (minutes:seconds) of elapsed time:
$ nccopy -w -ctime/98128,x/8,y/6 -e 102000 -m 40M -h 40G tmp.nc4 tmp-rechunked.nc4
$ ls -l tmp-rechunked.nc4
-rw-rw-r-- 1 russ ustaff 10951640022 Oct 7 18:55 tmp-rechunked.nc4
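In round numbers that's 3149 seconds versus 2124 seconds, so
level-1 compression cost about 48% more elapsed time than
writing the output uncompressed.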
So in this case it looks like -d1 did pretty well, because the
original compressed file (which used level-1 compression) was
only slightly smaller:
$ ls -l tmp.nc4
-rw-rw-r-- 1 russ ustaff 10143354510 Oct 4 16:45 tmp.nc4
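Doing the arithmetic, the -d1 output is 10951640022/38970737448,
or about 28% of the uncompressed size (roughly 3.6 times
smaller), and only about 8% larger than the original
10143354510-byte file.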
So I'm puzzled why the -d5 and -d9 results were so much larger
than the -d1 result. If anything, I'd expect them to be a little
smaller. But maybe your -d5 and -d9 runs used output chunks 1/4
the size, with only 98128/4 = 24532 values along the time
dimension?
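One way to check (just a sketch; substitute the actual names of
your -d5 and -d9 output files) is to look at the per-variable
_ChunkSizes attribute that ncdump reports with the -s option:
$ ncdump -hs tmp-d5.nc4 | grep _ChunkSizes
If that shows 24532 rather than 98128 along the time dimension,
that would explain the size difference.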
--Russ
Russ Rew UCAR Unidata Program
address@hidden http://www.unidata.ucar.edu
Ticket Details
===================
Ticket ID: TSI-527912
Department: Support netCDF
Priority: Normal
Status: Closed