[netCDF #TSI-527912]: nccopy advice - rechunking very large files
- Subject: [netCDF #TSI-527912]: nccopy advice - rechunking very large files
- Date: Tue, 09 Oct 2012 09:04:13 -0600
Hi Dan,
This is just a short followup on using nccopy to rechunk files.
I'm assuming the goal is to allow fast access to all the data for a point
or small region for all 98128 times (each originally stored in a separate
chunk) without having to access 98128 distinct disk blocks. This goal can
certainly be achieved by rechunking with data for all times in each chunk,
but that can require a lot of memory, because all the output chunks must be
kept in memory throughout the rechunking.
If you can accept making a few disk accesses instead of just one to get
data for all the times for a point or small region, then the rechunking can
be done faster and with a lot less memory. For example, if you measure and
conclude that 4 disk accesses instead of 98128 suffice for the use case you
have in mind, then rechunking to chunks of length 98128/4 = 24532 along the
time axis means you only need enough memory for 1/4 of the output file, and
the rechunking can still be done in about 30 minutes on a desktop machine.
For example, here's what it took on my Linux desktop, reserving only 10 GB
of memory for the chunk cache:
$ /usr/bin/time nccopy -ctime/24532,x/16,y/12 -e 102000 -m 40M -h 10G -d0 tmp.nc4 tmp-rechunked.nc4
1264.99user 175.39system 31:34.06elapsed 76%CPU (0avgtext+0avgdata 12299388maxresident)k
18554864inputs+77738408outputs (22856major+12001463minor)pagefaults 0swaps
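As a rough illustration of the arithmetic behind that chunk specification, here is a small shell sketch. It derives the time chunk length from the number of disk accesses you are willing to accept, then prints the corresponding nccopy command without running it. The variable names are made up for this example, and the 4-byte value size is an assumption, since the data type is not stated in this thread.

```shell
#!/bin/sh
# Sketch only: derive the time chunk length from an acceptable number of
# disk accesses, then print (not run) the matching nccopy command.
ntimes=98128        # number of time steps, from this thread
accesses=4          # acceptable disk accesses per point query
xchunk=16           # chunk lengths along x and y, as in the command above
ychunk=12
bytes_per_value=4   # ASSUMPTION: 4-byte values; the thread does not state the type

# Round up so that $accesses chunks always cover the whole time axis.
tchunk=$(( (ntimes + accesses - 1) / accesses ))
chunk_values=$(( tchunk * xchunk * ychunk ))
chunk_bytes=$(( chunk_values * bytes_per_value ))

echo "time chunk length: $tchunk"
echo "values per chunk:  $chunk_values"
echo "bytes per chunk:   $chunk_bytes (assuming ${bytes_per_value}-byte values)"
echo "nccopy -ctime/$tchunk,x/$xchunk,y/$ychunk -e 102000 -m 40M -h 10G -d0 tmp.nc4 tmp-rechunked.nc4"
```

The cache options (-e, -m, -h) are simply copied from the command above. Suitable values depend on your file and on how much memory you can spare.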
Interactive access with 4 disk reads per query would probably seem just as
fast as with one disk access per query. Similarly, accepting a number larger
than 4 might be a good compromise between access time and the processing time
needed to rechunk the data ...
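To explore that compromise, a sketch like the following tabulates the time chunk length for a few candidate access counts. The helper name time_chunk_len and the list of candidates are hypothetical, chosen only for illustration. The "1/N of the output" column just restates the memory estimate given earlier in this message and is not a measurement.

```shell
#!/bin/sh
# Sketch only: tabulate time chunk lengths for several acceptable access counts.

# time_chunk_len NTIMES ACCESSES
# Prints the smallest chunk length for which ACCESSES chunks cover NTIMES steps.
time_chunk_len() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

ntimes=98128   # number of time steps, from this thread

# The candidate counts below are illustrative, not recommendations.
for accesses in 1 4 8 16 32; do
    len=$(time_chunk_len "$ntimes" "$accesses")
    echo "accesses=$accesses  time chunk length=$len  memory: about 1/$accesses of the output"
done
```

Timing a real query against a file rechunked at each candidate length is still the only reliable way to pick a value.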
--Russ
Russ Rew UCAR Unidata Program
address@hidden http://www.unidata.ucar.edu
Ticket Details
===================
Ticket ID: TSI-527912
Department: Support netCDF
Priority: Normal
Status: Closed