
[netCDF #ACI-624328]: Experiences from rechunking with nccopy



Hi Joe,

> Thank you for that detailed answer!

No problem.  I'm at a conference all this week, so can't provide much detail
until next week.  However, I'm very interested in chunking issues, and would
like to get back to finding a good solution for your use case.

> So the chunk cache is only used to cache output chunks? Then it makes sense 
> that it doesn't need much.

Right, nccopy reads the input one chunk at a time (if it's chunked) and writes
to the output, spreading the values among as many output chunks as needed.  So
if there's insufficient cache to hold all the output chunks that receive values
from one input chunk, it will have to write the same output chunk multiple
times, once each time that chunk gets evicted from the chunk cache to make
room for other chunks.
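
As a rough sketch of sizing that cache (the figure below is just one 60x60x60
chunk of doubles, and the file names are placeholders, not your actual files),
you can compute the size of a single output chunk and pass at least that much
to -h:

$ echo $((60*60*60*8))    # 1728000 bytes per output chunk, i.e. about 1.728m
$ nccopy -c 'time/60,lat/60,lon/60' -h 1.728m in.nc out.nc

If one input chunk spreads its values across several output chunks, multiply
that per-chunk size by the number of output chunks touched.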

> So in the example file you looked at, rechunked from 1x360x720 to 60x60x60, 
> nccopy will read in the first 60 time slices in order to create the first 
> output chunk, and then restart and read them again (hopefully from OS disk 
> cache next time) to create the next output chunk?

No, I think it works as described above, only reading each input chunk once.
It might work faster in some cases to read input values multiple times in
order to write output chunks fewer times, but nccopy isn't smart enough to
know how to do that yet.
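
By the way, if you want to check how the input is actually chunked and
compressed before picking an output shape, ncdump's -s option shows the
hidden per-variable storage attributes (the file name below is just a
placeholder):

$ ncdump -hs yourfile.nc | grep -E '_Storage|_ChunkSizes|_DeflateLevel'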

> My real data files are 100 times larger than the one I sent you (~7GB 
> compressed, ~70GB decompressed). I can easily rechunk them to fairly square 
> chunks, takes 20 minutes or so, but with very large chunk sizes on the time 
> dimension (say 4000x1x1) it takes hours. Currently my strategy has been to do 
> it in several steps, which seems to help. Is that a reasonable approach for 
> large files or am I doing something wrong if it's that slow?

I'm very interested in knowing about an example that works faster by
rechunking in several nccopy steps.  I've suspected that might sometimes be a
better strategy, but haven't figured out a good example that demonstrates it.
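
If I understand your multi-step approach, it's something like the following
(the intermediate chunk shape and file names here are only illustrative, and
you'd still want to tune -m and -h as discussed below):

$ nccopy -c 'time/400,lat/60,lon/60' big.nc intermediate.nc
$ nccopy -c 'time/4000,lat/1,lon/1' intermediate.nc final.nc
$ rm intermediate.nc

If that really does beat a single nccopy straight to the final chunking, I'd
like to hear the shapes and timings you used.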

--Russ

> I've been rechunking compressed files, and it seems to be CPU bound.  If
> there's no input chunk cache (except the OS disk cache), that makes sense,
> since it would need to decompress the same chunks over and over.
> 
> Thanks and have a nice weekend!
> 
> - Joe
> ________________________________________
> From: Unidata netCDF Support [address@hidden]
> Sent: 03 April 2014 19:25
> To: Joe Siltberg
> Cc: address@hidden
> Subject: [netCDF #ACI-624328]: Experiences from rechunking with nccopy
> 
> Hi Joe,
> 
> > I've been doing some rechunking of large files with nccopy, and have
> > encountered two potential issues.
> >
> > The first is that the -h switch seems to work only up to around 1.7
> > GB. Specifying -h 2G or 5G doesn't seem to affect memory usage of
> > the process, or its performance. This was tested with 4.3.1.1 on 64
> > bit CentOS, and also with your prebuilt 64 bit Windows binaries (also
> > 4.3.1.1).
> 
> What you're seeing is that the performance isn't limited by the chunk
> cache size; it's limited by the actual I/O.  Although you requested 5
> GBytes of chunk cache, the library doesn't malloc that much, but only
> mallocs what it needs up to that amount.  So it doesn't need more than
> 1.7 GBytes for the output file chunk cache in your case.
> 
> Here's a demonstration you can see from the example file you sent,
> where you want to rechunk the variable Tair:
> 
> dimensions:
>     lon = 720 ;
>     lat = 360 ;
>     time = UNLIMITED ; // (365 currently)
> ...
>     double Tair(time, lat, lon) ;
> from 365 x 360 x 720 to 60 x 60 x 60.  I renamed your input file
> "htest.nc" and timed each of the following commands after purging the
> OS disk buffers, so the data is actually read from the disk rather
> than from OS cache.  If I run (and time) this command:
> 
> $ ./purge; time nccopy -c 'time/60,lat/60,lon/60' -m 5g -h 5g htest.nc tmp.nc
> real  0m28.33s
> 
> But I get essentially the same time if I only reserve enough chunk
> cache for one output chunk, 60*60*60*8 bytes:
> 
> $ ./purge; time nccopy -c 'time/60,lat/60,lon/60' -m 5g -h 1.728m htest.nc tmp.nc
> real  0m27.89s
> 
> And I don't actually need to reserve any memory at all for the input
> buffer, because all it needs is enough for one chunk of the input, and
> it allocates that much by default:
> 
> $ ./purge; time nccopy -c 'time/60,lat/60,lon/60' -h 1.728m htest.nc tmp.nc
> real  0m24.86s
> 
> But if I don't have enough chunk cache to hold at least one chunk of
> the output file, it takes *much* longer:
> 
> $ ./purge; time nccopy -c 'time/60,lat/60,lon/60' -h 1.727m htest.nc tmp.nc
> real  16m20.36s
> 
> Your results will vary with more variables and more times.  For
> example, with 2 variables of the size of Tair, you need to hold two
> chunks of output (about 3.5m), but in that case it turns out to be
> good to specify a larger input file buffer:
> 
> $ ./purge; time nccopy -c 'time/60,lat/60,lon/60' -m 100m -h 3.5m htest2.nc tmp.nc
> real  2m49.04s
> 
> Incidentally, I get better performance on the small example by using
> the "-w" option to do a diskless write and keep the output in memory
> until the output is closed, not using -h at all:
> 
> $ ./purge; time nccopy -c 'time/60,lat/60,lon/60' -m 100m -w htest2.nc tmp.nc
> real  2m13.14s
> 
> --Russ
> 
> Russ Rew                                         UCAR Unidata Program
> address@hidden                      http://www.unidata.ucar.edu
> 

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: ACI-624328
Department: Support netCDF
Priority: Normal
Status: Closed