This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Jeff,

> Thanks for the additional info. I will be using release 4.3.21 (or
> later) regardless of what file format we ultimately end up using. You
> mentioned that 4.3.2 should improve the default chunking, but the
> results I sent were already using a newer release than that, so it
> sounds like I shouldn't expect any improvements in the NC4 file size
> at this point, correct?

The improvement that netCDF C version 4.3.2 made was to change the
default chunk size for 1-dimensional record variables to
DEFAULT_CHUNK_SIZE bytes, where DEFAULT_CHUNK_SIZE is a configure-time
constant with default value 4194304. I'm surprised that using different
chunk sizes made no difference in the file size, so I may try to
duplicate your results to understand how that happened.

--Russ
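(For readers of this archived thread: the per-variable chunking that the
replies below experiment with is set through nc_def_var_chunking() in the
netCDF C API. The sketch below is illustrative only; the file, dimension,
and variable names are invented, and the chunk length of 2000 records
simply mirrors the experiment quoted later in the thread.)

    #include <stdio.h>
    #include <netcdf.h>

    int main(void) {
        int ncid, dimid, varid, status;
        size_t chunksize[1] = {2000};  /* 2000 records per chunk */

        /* NC_NETCDF4 selects the HDF5-based format, where chunking applies. */
        status = nc_create("obs.nc", NC_CLOBBER | NC_NETCDF4, &ncid);
        if (status != NC_NOERR) {
            fprintf(stderr, "%s\n", nc_strerror(status));
            return 1;
        }
        nc_def_dim(ncid, "time", NC_UNLIMITED, &dimid);
        nc_def_var(ncid, "value", NC_DOUBLE, 1, &dimid, &varid);

        /* Override the library's default chunk size for this variable. */
        nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunksize);

        nc_enddef(ncid);
        /* ... write records with nc_put_vara_double() ... */
        return nc_close(ncid);
    }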
> address@hidden> wrote:
>
> > Hi Jeff,
> >
> > > From those articles the purpose of chunking is to improve performance
> > > for large multi-dimensional data sets. It seems like it won't really
> > > provide any benefit in our situation since we only have one dimension.
> > > I know that NetCDF4 added chunking, but are all NetCDF4 files chunked,
> > > i.e., is there such a thing as a non-chunked NetCDF4 file? Or is that
> > > a contradiction in terms somehow?
> >
> > No, not all netCDF-4 files are chunked. The simpler alternative,
> > contiguous layout, is better if you don't need compression, unlimited
> > dimensions, or support for the multiple patterns of access that
> > chunking makes possible in netCDF-4 files.
> >
> > A netCDF-4 variable can use contiguous layout if it doesn't use an
> > unlimited dimension or any sort of filter such as compression or
> > checksums.
> >
> > > Given that NetCDF4 readers are backwards-compatible with NetCDF3
> > > files, is there any reason not to use a NetCDF3 file from your
> > > perspective? My suspicion is that our requirement is just being
> > > driven by "use the latest version" rather than any technical reasons.
> >
> > I think I agree with you. With only one unlimited dimension, and if
> > you don't need the transparent compression that netCDF-4 makes
> > possible, there's no reason not to just use the default contiguous
> > layout that a netCDF-3 format file provides. However, you should still
> > use the netCDF-4 library, just don't specify the netCDF-4 format when
> > you create the file. That's because the netCDF-4 software includes bug
> > fixes, performance enhancements, portability improvements, and remote
> > access capabilities not available in the old netCDF-3.6.3 version
> > software.
> >
> > The reason you were seeing a 7-fold increase in size is exactly as
> > Ethan pointed out: it's due to the way the HDF5 storage layer
> > implements unlimited dimensions, using chunking implemented with
> > B-tree data structures and indices, rather than the simpler contiguous
> > storage used in the classic netCDF format. The recent netcdf-4.3.2
> > version improves the default chunking for 1-dimensional variables with
> > an unlimited dimension, as in your case, so it may be sufficient to
> > provide both smaller files and the benefits of netCDF-4 chunking, but
> > without testing I can't predict how close it comes to the simpler
> > netCDF classic format in this case. Maybe I can get time later today
> > to try it ...
> >
> > > I couldn't find anything on the NetCDF website regarding "choosing
> > > the right format for you". I was hoping there'd be something along
> > > those lines in the FAQ, but no luck.
> >
> > The FAQ section on "Formats, Data Models, and Software Releases"
> >
> >   http://www.unidata.ucar.edu/netcdf/docs/faq.html
> >
> > is intended to clarify the somewhat complex situation with multiple
> > versions of netCDF data models, software, and formats, but evidently
> > doesn't help much in your case of choosing whether to use the default
> > classic netCDF format, the netCDF-4 classic model format, or the
> > netCDF-4 format.
> >
> > Thanks for pointing out the need to improve this section, and in
> > particular the answer to the FAQ "Should I get netCDF-3 or netCDF-4?",
> > which should really address the question "When should I use the netCDF
> > classic format?".
> >
> > --Russ
> >
> > > address@hidden> wrote:
> > >
> > > > Hi Jeff,
> > > >
> > > > How chunking and compression affect file size and read/write
> > > > performance is a complex issue. I'm going to pass this along to
> > > > our chunking expert (Russ Rew) who, I believe, is back in the
> > > > office on Monday and should be able to provide you with some
> > > > better advice than I can give.
> > > >
> > > > In the meantime, here's an email he wrote in response to a
> > > > conversation on the effect of chunking on performance that might
> > > > be useful:
> > > >
> > > >   http://www.unidata.ucar.edu/mailing_lists/archives/netcdfgroup/2013/msg00498.html
> > > >
> > > > Sorry I don't have a better answer for you.
> > > >
> > > > Ethan
> > > >
> > > > Jeff Johnson wrote:
> > > > > Ethan-
> > > > >
> > > > > I made the changes you suggested with the following result:
> > > > >
> > > > > 10000 records, 8 bytes / record = 80000 bytes raw data
> > > > >
> > > > > original program (NetCDF4, no chunking): 537880 bytes (6.7x)
> > > > > file size with chunk size of 2000 = 457852 bytes (5.7x)
> > > > >
> > > > > So a little better, but still not good. I then tried different
> > > > > chunk sizes of 10000, 5000, 200, and even 1, which I would've
> > > > > thought would give me the original size, but all gave the same
> > > > > resulting file size of 457852.
> > > > >
> > > > > Finally, I tried writing more records to see if it's just a
> > > > > symptom of a small data set. With 1M records:
> > > > >
> > > > > 8MB raw data, chunk size = 2000
> > > > > 45.4MB file (5.7x)
> > > > >
> > > > > This is starting to seem like a lost cause given our small data
> > > > > records. I'm wondering if you have information I could use to go
> > > > > back to the archive group and try to convince them to use
> > > > > NetCDF3 instead.
> > > > >
> > > > > jeff
>
> --
> Jeff Johnson
> DSCOVR Ground System Development
> Space Weather Prediction Center
> address@hidden
> 303-497-6260

Russ Rew                                    UCAR Unidata Program
address@hidden                              http://www.unidata.ucar.edu

Ticket Details
===================
Ticket ID: BNA-191717
Department: Support netCDF
Priority: Normal
Status: Closed
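(A closing note for readers of the archive: the approach Russ recommends
above, using the netCDF-4 library while creating a classic-format file,
comes down to the mode flags passed to nc_create(). Below is a minimal
sketch with invented file and variable names; omitting NC_NETCDF4 is the
only change from the chunked example earlier.)

    #include <stdio.h>
    #include <netcdf.h>

    int main(void) {
        int ncid, dimid, varid, status;

        /* No NC_NETCDF4 flag: the netCDF-4 library writes a classic-format
         * (netCDF-3) file, so the unlimited dimension uses the simpler
         * record storage rather than HDF5 chunking. */
        status = nc_create("obs_classic.nc", NC_CLOBBER, &ncid);
        if (status != NC_NOERR) {
            fprintf(stderr, "%s\n", nc_strerror(status));
            return 1;
        }
        nc_def_dim(ncid, "time", NC_UNLIMITED, &dimid);
        nc_def_var(ncid, "value", NC_DOUBLE, 1, &dimid, &varid);
        nc_enddef(ncid);
        /* ... write records with nc_put_vara_double() ... */
        return nc_close(ncid);
    }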