Subject: [netCDF #ACI-624328]: Experiences from rechunking with nccopy
Date: Tue, 08 Apr 2014 07:51:56 -0600
Hi Joe,
> Thank you for that detailed answer!
No problem. I'm at a conference all this week, so can't provide much detail
until next week. However, I'm very interested in chunking issues, and would
like to get back to finding a good solution for your use case.
> So the chunk cache is only used to cache output chunks? Then it makes sense
> that it doesn't need much.
Right, nccopy reads the input one chunk at a time (if it's chunked), and writes
to the output, spreading the values among as many output chunks as needed. So
if there's insufficient cache to hold all the output chunks that get values from
one input chunk, it will have to write the same output chunk multiple times,
once each time that chunk gets evicted from the chunk cache to make room for
other chunks.
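
A rough way to pick -h, as a back-of-the-envelope sketch rather than anything
nccopy computes for you: multiply the output chunk dimensions by the size of
the variable's type in bytes, and make sure the cache can hold at least that
many bytes (more if several output chunks need to stay resident at once). For
the 60x60x60 chunks of the double variable in the file discussed below, that
works out to the 1.728m figure used in the timings:

$ echo $((60*60*60*8))
1728000
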
> So in the example file you looked at, rechunked from 1x360x720 to 60x60x60,
> nccopy will read in the first 60 time slices in order to create the first
> output chunk, and then restart and read them again (hopefully from OS disk
> cache next time) to create the next output chunk?
No, I think it works as described above, only reading each input chunk once.
It might work faster in some cases to read input values multiple times in order
to write each output chunk fewer times, but nccopy isn't smart enough to know
how to do that yet.
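
By the way, if you want to check what the input chunking actually is before
choosing cache sizes, ncdump's -s option prints the per-variable _ChunkSizes
that the library stores (this assumes a netCDF-4 input file; "yourfile.nc" is
just a placeholder):

$ ncdump -h -s yourfile.nc | grep _ChunkSizes
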
> My real data files are 100 times larger than the one I sent you (~7GB
> compressed, ~70GB decompressed). I can easily rechunk them to fairly square
> chunks, takes 20 minutes or so, but with very large chunk sizes on the time
> dimension (say 4000x1x1) it takes hours. Currently my strategy has been to do
> it in several steps, which seems to help. Is that a reasonable approach for
> large files or am I doing something wrong if it's that slow?
I'm very interested in knowing about an example that works faster by rechunking
in several nccopy steps. I've suspected that might sometimes be a better
strategy, but haven't figured out a good example that demonstrates it.
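
For what it's worth, here's the kind of two-pass pipeline I have in mind, just
as a sketch: the file names, the intermediate time/365,lat/90,lon/90 chunk
shape, and the 1g buffer and cache sizes are all placeholders and guesses, not
tested recommendations, and the dimension names are assumed to match your
sample file:

$ nccopy -c 'time/365,lat/90,lon/90' -m 1g -h 1g big.nc intermediate.nc
$ nccopy -c 'time/4000,lat/1,lon/1' -m 1g -h 1g intermediate.nc final.nc

The hope is that the first pass pays the cost of gathering values across time
into moderately tall chunks, so the second pass only has to split chunks that
already span many times, instead of revisiting thin time slices for every tiny
output chunk.
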
--Russ
> I've been rechunking compressed files, and it seems to be CPU bound; if
> there's no input chunk cache (except the OS disk cache), that makes sense,
> since it would need to decompress the same chunks over and over.
>
> Thanks and have a nice weekend!
>
> - Joe
> ________________________________________
> From: Unidata netCDF Support [address@hidden]
> Sent: 03 April 2014 19:25
> To: Joe Siltberg
> Cc: address@hidden
> Subject: [netCDF #ACI-624328]: Experiences from rechunking with nccopy
>
> Hi Joe,
>
> > I've been doing some rechunking of large files with nccopy, and have
> > encountered two potential issues.
> >
> > The first is that the -h switch seems to work only up to around 1.7
> > GB. Specifying -h 2G or 5G doesn't seem to affect memory usage of
> > the process, or its performance. This was tested with 4.3.1.1 on 64
> > bit CentOS, and also with your prebuilt 64 bit Windows binaries (also
> > 4.3.1.1).
>
> What you're seeing is that the performance isn't limited by the chunk
> cache size, it's limited by the actual I/O. Although you requested 5
> GBytes of chunk cache, the library doesn't malloc that much, but only
> mallocs what it needs up to that amount. So it doesn't need more than
> 1.7 GBytes for the output file chunk cache in your case.
>
> Here's a demonstration you can see from the example file you sent,
> where you want to rechunk the variable Tair:
>
> dimensions:
>     lon = 720 ;
>     lat = 360 ;
>     time = UNLIMITED ; // (365 currently)
> ...
>     double Tair(time, lat, lon) ;
> from 365 x 360 x 720 to 60 x 60 x 60. I renamed your input file
> "htest.nc" and timed each of the following commands after purging the
> OS disk buffers, so the data is actually read from the disk rather
> than from the OS cache. Running (and timing) the first command gives:
>
> $ ./purge; time nccopy -c 'time/60,lat/60,lon/60' -m 5g -h 5g htest.nc tmp.nc
> real 0m28.33s
>
> But I get essentially the same time if I only reserve enough chunk
> cache for one output chunk, 60*60*60*8 bytes:
>
> $ ./purge; time nccopy -c 'time/60,lat/60,lon/60' -m 5g -h 1.728m htest.nc tmp.nc
> real 0m27.89s
>
> And I don't actually need to reserve any memory at all for the input
> buffer, because all it needs is enough for one chunk of the input, and
> it allocates that much by default:
>
> $ ./purge; time nccopy -c 'time/60,lat/60,lon/60' -h 1.728m htest.nc tmp.nc
> real 0m24.86s
>
> But if I don't have enough chunk cache to hold at least one chunk of
> the output file, it takes *much* longer:
>
> $ ./purge; time nccopy -c 'time/60,lat/60,lon/60' -h 1.727m htest.nc tmp.nc
> real 16m20.36s
>
> Your results will vary with more variables and more times. For
> example, with 2 variables of the size of Tair, you need to hold two
> chunks of output (about 3.5m), but in that case it turns out to be
> good to specify a larger input file buffer:
>
> $ ./purge; time nccopy -c 'time/60,lat/60,lon/60' -m 100m -h 3.5m htest2.nc tmp.nc
> real 2m49.04s
>
> Incidentally, I get better performance on the small example by using
> the "-w" option to use a diskless write and keep the output in memory
> until the output is closed, not using -h at all:
>
> $ ./purge; time nccopy -c 'time/60,lat/60,lon/60' -m 100m -w htest2.nc tmp.nc
> real 2m13.14s
>
> --Russ
>
> Russ Rew UCAR Unidata Program
> address@hidden http://www.unidata.ucar.edu
>
>
>
> Ticket Details
> ===================
> Ticket ID: ACI-624328
> Department: Support netCDF
> Priority: Normal
> Status: Closed
>
>
Russ Rew UCAR Unidata Program
address@hidden http://www.unidata.ucar.edu
Ticket Details
===================
Ticket ID: ACI-624328
Department: Support netCDF
Priority: Normal
Status: Closed