This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Ken,

> ... Why netCDF-4?

Well, one (biased) answer was recently provided by one of our users in a support question:

> On the other hand, every time I am forced to use HDF or, even worse,
> HDF-EOS, I gnash my teeth and rend my garments at the experience of
> using a data storage API designed by a combination of database wonks
> and [aerospace contractor] engineers. And I mean that in the worst
> sense possible. Coming back to the land of milk, honey, and netCDF
> after those diasporas is a huge relief.

I think the user overstated the differences, and HDF5 is very well designed in some ways, especially for HPC users who have had to deal with many APIs and programming-interface philosophies. But comments like the above point to a goal we strive to attain: make the interface as simple as practical. Keep the "surface area" of the interface minimal, so that the whole data model is comprehensible without too much study. Some of this requires carefully written documentation and a better set of examples than we currently make available, but it also means leaving some complexities out of the data model.

As an example, in HDF5 no data object has a unique or distinguished name. If you ask for the name of a Group or Dataset, the answer can be "here's a list of aliases (links), any one of which refers to the object", but no member of the list is primary. As a result, developers sometimes have to consider whether two names refer to the same data object.

As another example, you have to close every HDF5 object when you are done with it, whereas in netCDF-4 a single close of the file takes care of freeing resources and flushing the buffers to disk.

It's an open question whether we have preserved enough simplicity in the netCDF-4 interface and data model to make it as attractive to developers and data providers as netCDF-3.
Adding Groups to netCDF-4 by adding only a handful of interfaces is one example where we have succeeded in providing a lot more power with only a small increment in complexity. Our decision not to support the complexities that HDF5 References introduce may prove to have been wrong, but it's too early to tell whether the power they add is worth the added complexity.

As we wrote in our AMS 2006 paper:

... Ultimately, data and applications will only be adapted to netCDF-4 if a critical mass of data and useful applications exist. Some data providers may decide that netCDF-4 is not enough simpler than HDF5 to justify using netCDF-4 rather than HDF5. Similarly, application developers may decide that if they have to modify their applications to fully support netCDF-4, they might as well expend the extra effort required to provide full support for HDF5 too. The whole idea, which entailed considerable risk, was to see whether we could preserve the desirable common characteristics of netCDF and HDF5 while taking advantage of their separate strengths: the widespread use and simplicity of netCDF, and the generality and performance of HDF5. Whether we achieved this objective will ultimately be decided by users, developers, and data providers. Even if HDF5 becomes more popular than netCDF, the effort made in developing netCDF-4 will still have improved both netCDF and HDF.

Jim Gray, who won the 1998 Turing Award for his work on relational databases and transaction processing, recently published an article:

http://research.microsoft.com/research/pubs/view.aspx?tr_id=860

where he wrote:

While the commercial world has standardized on the relational data model and SQL, no single standard or tool has critical mass in the scientific community. There are many parallel and competing efforts to build these tool suites - at least one per discipline. Data interchange outside each group is problematic.
In the next decade, as data interchange among scientific disciplines becomes increasingly important, a common HDF-like format and package for all the sciences will likely emerge.

We think netCDF-4, together with the follow-on developments we are planning, is a candidate not just for the "HDF-like format", but also for the data model that may fill this niche.

--Russ