This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
>To: address@hidden
>From: "Greg Sjaardema" <address@hidden>
>Subject: File Offset questions related to 2GB dataset sizes
>Organization: Sandia National Laboratories, Dept 9134
>Keywords: 200309151946.h8FJkSLd028548

Hi Greg,

> At Sandia Labs, we have been using netcdf for several years as the basis
> for our ExodusII finite element database. We have been well served by
> this over the years, but we are now close to hitting a brick wall with
> regard to the use of the 32-bit signed file offsets.
>
> As compute platforms have increased in capacity, our users have started
> developing larger and larger finite element models. We are now starting
> to see models in the range of ~40 million elements, with the desire to go
> larger in the next few weeks/months.
>
> We are now hitting the capacity of the netcdf database, since a mesh with
> ~44 million elements will create datasets such that the offset from the
> beginning of the file to the beginning of the "unlimited dimension"
> records will exceed 2^31 bytes. As such, we need to make some decisions
> quickly (actually, we should have made some decisions previously...).
> As I see it, there are a few options:
>
> 1. Change the file offset to unsigned, which would give us a range of 2^32
> bytes. This would buy a little time, and might give some backward
> compatibility for files storing smaller models...

I looked into this, and I think it's not that easy. You still have to make
system calls like lseek() that require an off_t, which is signed because
negative offsets have a well-defined meaning. If I recall correctly, trying
to use an unsigned offset would require lots of changes in the netCDF
library.
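As an illustration of why off_t must stay signed, here is a minimal C
sketch (not part of the original exchange; the file name is made up).
Negative offsets are routinely passed to lseek() to seek backward, and
lseek() itself signals errors by returning (off_t)-1:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("example.nc", O_RDONLY);   /* hypothetical file */
        if (fd < 0) return 1;

        /* A signed off_t lets callers seek backward from the end or the
         * current position; lseek() also returns (off_t)-1 on error.
         * An unsigned offset type would break both conventions. */
        off_t size = lseek(fd, 0, SEEK_END);         /* file length */
        off_t back = lseek(fd, (off_t)-4, SEEK_END); /* 4 bytes before EOF */
        if (back == (off_t)-1)
            perror("lseek");

        printf("size=%lld, offset=%lld\n",
               (long long)size, (long long)back);
        close(fd);
        return 0;
    }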
> 2. Change the offset to a 64-bit variable. This buys a lot more time,
> but it results in files which are incompatible with our older files and
> also with netcdf files "in the world". We don't rely much on outside
> tools for accessing our databases, so we are not too concerned about the
> "world-wide" compatibility. The internal compatibility issues are
> important, but no matter how we solve the problem, we will have
> compatibility issues.

This may be your best bet, if you need something soon. You may also be able
to promulgate your forked version of the library and file format as
"netCDF-64" and offer it to others to use as well (such as the pnetcdf
group). If there are enough other users, it may become an ad hoc standard,
much like netCDF did. But it's a steep hill to climb, because to create an
ad hoc standard you need to convince other developers of applications, data
providers, and the people who contributed netCDF interfaces for Java, C++,
Fortran90, Perl, Python, Ruby, etc.

> 3. Investigate the netcdf-h5 work by Nancy Yeager. I've got this
> working on some of our large test cases, but additional investigation is
> needed to see how well this will work.

There were some significant limitations in this prototype; for example, you
could only open one file at a time. We have a more advanced and complete
implementation of netCDF-3 on HDF5 in the works; see option 8 below for the
caveats.

> 4. Totally new format. We are in the process of doing this, namely
> developing the SAF database, which is based on HDF5, but it will take
> some time to get all of the codes migrated over to this new format and
> it will take some time for it to be production-ready.

No comment.

> 5. See if the parallel-netcdf interface recently announced has addressed
> the 2GB issue. As far as I can tell, they haven't changed the file
> offset variable, so they have the same size limitation as basic netcdf.

Currently they do have the same size limitation. They will soon have to
decide whether to do something like your option 2 or work with our
netCDF-4/HDF5 software (see option 8 again).

> My main reason for writing is to see if you have any other options that
> we can investigate, and also if you have any guidance on the level of
> effort needed to implement options 1 or 2 above. Has anyone done any
> work in this area that we can use?
>
> Thanks for any and all replies or input.

7. Make use of the unlimited dimension, if you aren't already, to structure
your files so that you can still use only 31-bit offsets within each
record, but have a large number of records. You may have already looked at
this possibility, but if not, check out the CDL example of how to structure
a 2.4 Tbyte netCDF file here:

  http://www.unidata.ucar.edu/packages/netcdf/f90/Documentation/f90-html-docs/guide9.html#2236524

(A short C sketch of this record-structured layout appears at the end of
this message.)

8. Wait a little while for our NASA-funded netCDF-4/HDF5 work to result in
something you can use. The interim netCDF-3/HDF5 prototype we are now
building goes well beyond the Nancy Yeager prototype to deliver a
backward-compatible netCDF-3 interface on HDF5, so the 2 Gbyte limit should
just disappear with a recompile and relink to this library. We are making
good progress on this and will be reporting on it at an HDF Workshop next
week, but there are some caveats. This netCDF-3 on HDF5 is just an interim
prototype on the way to our real goal, an enhanced netCDF-4 on HDF5 that
will make better use of the HDF5 format to support an extended netCDF data
model. So the mapping to HDF5 that works to support netCDF-3 will almost
surely be different from the later mapping we intend to support with
netCDF-4. If you use this, you would help us determine what's wrong with
it, but you might end up with files in an intermediate HDF5 format that use
prototype software we can't support.

See the Project Abstract:

  http://www.unidata.ucar.edu/proposals/NASA-AIST-2002/abstract.html

and the Project Description:

  http://www.unidata.ucar.edu/proposals/NASA-AIST-2002/Description.pdf

I'm CC:ing a couple of other developers here (Ed Hartnett, working on
netCDF-3/HDF5, and John Caron, developer of Java netCDF), in case they have
additional input.

--Russ
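To make the layout suggested in option 7 concrete, here is a minimal sketch
using the netCDF-3 C API (not part of the original exchange; the file name,
dimension and variable names, and sizes are illustrative assumptions). Each
record stays far below the 2^31-byte limit, while the file as a whole can
grow much larger along the unlimited dimension:

    #include <stdio.h>
    #include <stdlib.h>
    #include <netcdf.h>

    #define NNODES 44000000  /* ~44 million values per record (illustrative) */

    int main(void) {
        int ncid, time_dim, node_dim, dimids[2], varid;

        /* File and variable names are made up for this sketch. */
        if (nc_create("results.nc", NC_CLOBBER, &ncid)) return 1;

        /* One record variable along the unlimited dimension.  Each
         * record is NNODES * 4 bytes (~176 MB), so the stored 31-bit
         * offset to the start of the variable's data stays small,
         * while the file grows by one record per time step. */
        nc_def_dim(ncid, "time", NC_UNLIMITED, &time_dim);
        nc_def_dim(ncid, "num_nodes", NNODES, &node_dim);
        dimids[0] = time_dim;
        dimids[1] = node_dim;
        nc_def_var(ncid, "displacement", NC_FLOAT, 2, dimids, &varid);
        nc_enddef(ncid);

        float *rec = calloc(NNODES, sizeof(float));
        if (rec == NULL) return 1;

        /* Write one record per "time step"; start[0] selects the
         * record.  Three records here for brevity; a real run would
         * append many more. */
        size_t start[2] = {0, 0}, count[2] = {1, NNODES};
        for (size_t t = 0; t < 3; t++) {
            start[0] = t;
            nc_put_vara_float(ncid, varid, start, count, rec);
        }

        free(rec);
        nc_close(ncid);
        return 0;
    }

As I understand the classic format, only the offset to the start of each
variable's data must fit in 31 bits; later records are located by adding a
multiple of the record size at run time, which is how the 2.4 Tbyte example
at the URL above stays within the format's limits.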