[netCDF #AWT-862217]: nccopy chunking argument
- Subject: [netCDF #AWT-862217]: nccopy chunking argument
- Date: Thu, 19 Jan 2012 11:29:22 -0700
Mark,
> I've been looking closer at the cause of the second problem, and have a
> hypothesis. When you look at how nccopy iterates through a variable when
> making the copy (i.e. in up_start_by_chunks() in nciter.c), it goes in reverse
> order of the dimensions, e.g. for CHL1_mean[date,lon,lat] it scans
> through lat first, then lon, then date. However, this can be very memory
> inefficient in the situation where you are trying to make the rearrangement
> along the date dimension - you essentially have to load the entire file to
> get enough data to write an entire date chunk....
>
> I could see two solutions.
>
> 1. automagically work out which dimension to scan in (hard to implement
> robustly)
> 2. infer the scan direction from the -c argument i.e. if you only specify
> date/5186 (and nothing else), and you have a variable with
> date/1,lat/30,lon/30, then the most efficient way to rechunk it would be to
> read along the date dimension first, then along lon and lat.....
>
> Hmmm. I'm not sure that makes any sense - it's kind of hard to explain. Can
> you follow my logic?
Yes, but I see some complications that make my head hurt.
If you want to rechunk a variable, it's not clear which is better:
reading the input one input chunk at a time and writing the output in an
inefficient order, or reading the input in an inefficient order so that
you can write the output one output chunk at a time.
Currently the nc_next_iter() function in nciter.c does the former, but it
sounds like you think it would be better if it did the latter. I think you
can construct examples where either strategy is efficient or horribly
inefficient, depending on the shapes of chunks in the input and output
files.
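To make the trade-off concrete, here is a rough cost sketch in Python. It is not code from nccopy. It assumes there is no chunk cache, so each time a pass touches a chunk it pays for one full chunk read or write, and the function names and example shapes are made up for illustration.

```python
from math import ceil, prod


def count_chunks(shape, chunks):
    """Number of chunks needed to tile an array of the given shape."""
    return prod(ceil(n / c) for n, c in zip(shape, chunks))


def count_overlaps(shape, in_chunks, out_chunks):
    """Number of (input chunk, output chunk) pairs that share any data."""
    total = 1
    for n, a, b in zip(shape, in_chunks, out_chunks):
        # Chunk boundaries from either layout cut the axis into segments;
        # each segment corresponds to exactly one overlapping pair.
        cuts = set(range(a, n, a)) | set(range(b, n, b))
        total *= len(cuts) + 1
    return total


def strategy_costs(shape, in_chunks, out_chunks):
    """Chunk-sized I/O operations for each strategy (assumes no cache)."""
    n_in = count_chunks(shape, in_chunks)
    n_out = count_chunks(shape, out_chunks)
    pairs = count_overlaps(shape, in_chunks, out_chunks)
    return {
        # Read each input chunk once, touch every overlapping output chunk.
        "by_input_chunk": {"reads": n_in, "writes": pairs},
        # Write each output chunk once, touch every overlapping input chunk.
        "by_output_chunk": {"reads": pairs, "writes": n_out},
    }


if __name__ == "__main__":
    shape = (12, 60, 60)  # hypothetical [date, lon, lat] sizes
    print(strategy_costs(shape, (1, 30, 30), (12, 30, 30)))
    print(strategy_costs(shape, (12, 30, 30), (1, 30, 30)))
```

Under these assumptions, going from date/1 chunks to date/12 chunks costs 48 reads either way, but 48 writes by input chunk versus 4 by output chunk. Swapping the two layouts reverses the picture: 4 reads by input chunk versus 48 by output chunk. A real chunk cache changes the numbers, but not the fact that the better strategy depends on both shapes.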
I think the right thing to do would be to determine, from the chunk shapes
of input and output, which strategy to use, or even whether to use
a hybrid strategy involving multiple passes and an intermediate file or
in-memory structure. I tried to find out whether research on this has
already been done, but couldn't find a paper that provided a clear solution.
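As one illustration of the in-memory variant, the Python sketch below sizes a buffer "slab" whose edge along each dimension is the least common multiple of the input and output chunk lengths, clipped to the dimension length. Slab edges then line up with chunk edges in both layouts, so if one slab fits in memory, each input chunk is read once and each output chunk is written once. This is only a sketch with hypothetical names, not something nccopy does. The 5186 x 360 x 180 shape, the 30 x 30 output chunking in lon and lat, the 4-byte element size, and the memory budget are all assumptions.

```python
from math import gcd, prod


def slab_shape(shape, in_chunks, out_chunks):
    """Per-dimension lcm of the two chunk lengths, clipped to the dimension."""
    return tuple(
        min(n, a * b // gcd(a, b))
        for n, a, b in zip(shape, in_chunks, out_chunks)
    )


def plan(shape, in_chunks, out_chunks, elem_size, mem_budget):
    """Decide whether a single pass over in-memory slabs is feasible."""
    slab = slab_shape(shape, in_chunks, out_chunks)
    slab_bytes = prod(slab) * elem_size
    if slab_bytes <= mem_budget:
        # Every chunk in either layout lies wholly inside one slab.
        return ("single_pass_slabs", slab, slab_bytes)
    # Too big to buffer: some multi-pass or intermediate-file scheme is needed.
    return ("needs_multi_pass", slab, slab_bytes)


if __name__ == "__main__":
    shape = (5186, 360, 180)  # hypothetical [date, lon, lat] sizes
    budget = 64 * 2**20       # hypothetical 64 MiB buffer
    print(plan(shape, (1, 30, 30), (5186, 30, 30), 4, budget))
```

This does not settle the general case. When the chunk lengths share few factors, the slab can grow to the whole variable, and then the choice among the other strategies is still open.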
Maybe it's easier than I'm making it out to be, and there's a clear and simple
solution. If so, I'd like to implement it!
--Russ
Russ Rew UCAR Unidata Program
address@hidden http://www.unidata.ucar.edu
Ticket Details
===================
Ticket ID: AWT-862217
Department: Support netCDF
Priority: Normal
Status: Closed