This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Mark,

> I've been looking closer at the cause of the second problem, and have a
> hypothesis. When you look at how nccopy iterates through a variable when
> making the copy (i.e. in up_start_by_chunks() in nciter.c), it goes in reverse
> order of the dimensions, e.g. for CHL1_mean[date,lon,lat] it scans
> through lat first, then lon, then date. However, this can be very memory
> inefficient in the situation where you are trying to make the rearrangement
> along the date dimension - you essentially have to load the entire file to
> get enough data to write an entire date chunk....
>
> I could see two solutions.
>
> 1. automagically work out which dimension to scan in (hard to implement
>    robustly)
> 2. infer the scan direction from the -c argument, i.e. if you only specify
>    date/5186 (and nothing else), and you have a variable with
>    date/1,lat/30,lon/30, then the most efficient way to rechunk it would be to
>    read along the date dimension first, then the lon and lats.....
>
> Hmmm. I'm not sure that makes any sense - it's kind of hard to explain. Can
> you follow my logic?

Yes, but I see some complications that make my head hurt.

If you want to rechunk a variable, it's not clear whether it's better to access the input one input chunk at a time and write the output in an inefficient order, or to access the input in an inefficient order so that you can write the output one output chunk at a time. Currently the nc_next_iter() function in nciter.c does the former, but it sounds like you think it would be better if it did the latter.

I think you can construct examples where either strategy is efficient or horribly inefficient, depending on the shapes of chunks in the input and output files. I think the right thing to do would be to determine, from the chunk shapes of input and output, which strategy to use, or even whether to use a hybrid strategy involving multiple passes and an intermediate file or in-memory structure.
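To make the trade-off between the two access orders concrete, here is a minimal Python sketch. It is not part of nccopy or nciter.c; the function names are hypothetical, and it assumes a deliberately crude cost model: no chunk cache, and every touch of a chunk (even a partial one) costs one whole-chunk read or write. The lon and lat lengths in the example are made-up values, and the date length of 5186 is only inferred from the `date/5186` chunk spec in the quoted message.

```python
from math import prod


def chunks_along(n, c):
    """Number of chunks of length c needed to cover a dimension of length n."""
    return -(-n // c)  # ceiling division


def overlapping_pairs_1d(n, c_in, c_out):
    """Count (input chunk, output chunk) pairs that intersect along one dimension."""
    pairs = 0
    for i in range(chunks_along(n, c_in)):
        lo = i * c_in
        hi = min((i + 1) * c_in, n)      # input chunk covers [lo, hi)
        first = lo // c_out              # first output chunk it touches
        last = (hi - 1) // c_out         # last output chunk it touches
        pairs += last - first + 1
    return pairs


def rechunk_cost(shape, in_chunks, out_chunks):
    """Whole-chunk reads and writes for each strategy, assuming no cache.

    Chunks are axis-aligned boxes, so two chunks intersect only if they
    intersect in every dimension; the total number of intersecting pairs
    is therefore the product of the per-dimension counts.
    """
    n_in = prod(chunks_along(n, c) for n, c in zip(shape, in_chunks))
    n_out = prod(chunks_along(n, c) for n, c in zip(shape, out_chunks))
    pairs = prod(
        overlapping_pairs_1d(n, ci, co)
        for n, ci, co in zip(shape, in_chunks, out_chunks)
    )
    return {
        # Read each input chunk once; touch every output chunk it overlaps.
        "input_driven": {"chunk_reads": n_in, "chunk_writes": pairs},
        # Write each output chunk once; touch every input chunk it overlaps.
        "output_driven": {"chunk_reads": pairs, "chunk_writes": n_out},
    }


# Hypothetical variable [date, lat, lon]; only the chunk sizes come from the message.
shape = (5186, 60, 60)
print(rechunk_cost(shape, (1, 30, 30), (5186, 30, 30)))  # date/1 -> date/5186
print(rechunk_cost(shape, (5186, 30, 30), (1, 30, 30)))  # the reverse rechunking
```

Under these assumptions, the first case needs 20744 chunk writes when driven by input chunks but only 4 when driven by output chunks, while in the reverse case the input-driven order is the cheaper one. A real implementation would also have to account for the chunk cache and available memory, which this sketch ignores.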
I tried to determine whether this research has already been done, but couldn't find a paper that provided a clear solution. Maybe it's easier than I'm making it out to be, and there's a clear and simple solution. If so, I'd like to implement it!

--Russ

Russ Rew
UCAR Unidata Program
address@hidden
http://www.unidata.ucar.edu

Ticket Details
===================
Ticket ID: AWT-862217
Department: Support netCDF
Priority: Normal
Status: Closed