[netCDF #TSI-527912]: nccopy advice - rechunking very large files
- Subject: [netCDF #TSI-527912]: nccopy advice - rechunking very large files
- Date: Tue, 09 Oct 2012 20:03:18 -0600
> On the subject of compression:
> The compression has finished for 3 different rates -d 0/5/9,
> and here are the results:
You may already be aware of this, but just to make sure: the
compression level corresponding to -d0 is *no* compression. So
it might be useful to compare -d1, the lowest and supposedly
fastest level of compression, with -d5 and -d9. In my experience,
for a lot of large floating-point data -d1 is a little faster
than the higher levels, and the higher levels compress only a
little better. So I usually just use -d1, as the time it saves
is usually worth the small amount of extra data volume.
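Just as a sketch (tmp.nc4 and the tmp-d*.nc4 output names here
are placeholders for whatever files you're testing), you could
time each level and compare the resulting sizes directly:
$ for lev in 1 5 9; do time nccopy -d$lev tmp.nc4 tmp-d$lev.nc4; done
$ ls -l tmp-d?.nc4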
I used -d0 in the example I ran to specify explicitly that the
output was to be uncompressed. I thought that would be somewhat
faster than compressing the output chunks as they were written
to disk, and it was significantly faster:
Writing uncompressed output took 35:24.38 (minutes:seconds) of
elapsed time:
$ nccopy -ctime/98128,x/8,y/6 -e 102000 -m 40M -h 40G -d0 tmp.nc4 tmp-rechunked.nc4
$ ls -l tmp-rechunked.nc4
-rw-rw-r-- 1 russ ustaff 38970737448 Oct 7 12:36 tmp-rechunked.nc4
whereas compressing the output at level 1 (by default, nccopy
compresses the output at the same level as the input) took
52:29.25 (minutes:seconds) of elapsed time:
$ nccopy -w -ctime/98128,x/8,y/6 -e 102000 -m 40M -h 40G tmp.nc4 tmp-rechunked.nc4
$ ls -l tmp-rechunked.nc4
-rw-rw-r-- 1 russ ustaff 10951640022 Oct 7 18:55 tmp-rechunked.nc4
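In round numbers that's 3149 seconds versus 2124 seconds, so
level-1 compression cost about 48% more elapsed time than
writing the output uncompressed.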
So in this case it looks like -d1 did pretty well, because the
original compressed file (which used level-1 compression) was
only slightly smaller:
$ ls -l tmp.nc4
-rw-rw-r-- 1 russ ustaff 10143354510 Oct 4 16:45 tmp.nc4
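Doing the arithmetic, the -d1 output is 10951640022/38970737448,
or about 28% of the uncompressed size (roughly 3.6 times
smaller), and only about 8% larger than the original
10143354510-byte file.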
So I'm puzzled why the -d5 and -d9 results were so much larger
than the -d1 result. If anything, I'd expect them to be a little
smaller. But maybe your -d5 and -d9 runs used output chunks 1/4
the size, with only 98128/4 = 24532 values along the time
dimension?
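One way to check (just a sketch; substitute the actual names of
your -d5 and -d9 output files) is to look at the per-variable
_ChunkSizes attribute that ncdump reports with the -s option:
$ ncdump -hs tmp-d5.nc4 | grep _ChunkSizes
If that shows 24532 rather than 98128 along the time dimension,
that would explain the size difference.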
--Russ
Russ Rew UCAR Unidata Program
address@hidden http://www.unidata.ucar.edu
Ticket Details
===================
Ticket ID: TSI-527912
Department: Support netCDF
Priority: Normal
Status: Closed