This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
> Organization: Geophysical Institute, University of Alaska, Fairbanks
> Keywords: 199405262136.AA11923

Hi Mark,

> Hi, my name is Mark Conde and I have a (probably nasty) question
> regarding several netCDF files of mine that have been corrupted in some
> manner. My problem is that something has happened to these files such
> that any netCDF utilities that I use to attempt to access them
> immediately return a message saying "not a netCDF file". I believe the
> data are mostly intact but have been modified in some places. My hope
> is to be able to recover at least some of the data from these files.
> Unfortunately, I do not have any idea of the physical format in which
> netCDF stores its data, so I have no means of writing code which can
> "scrape" the intact data from these files. Is there such a description
> circulated? Better yet, are there any utilities to help recover damaged
> netCDF files? It would be a major disadvantage if netCDF files become
> unusable after small corruptions - by comparison intact data is easily
> recovered from text files.
>
> Hoping for some help...

The test for whether a file is a netCDF file is just a comparison of the first four bytes of the file with the netCDF "magic number", which is the bytes 'C', 'D', 'F', SOH (Start Of Heading, or ASCII control-A, used for the version number of the netCDF format, which is still version 1). If you use the Unix "od" command to look at the beginning of a netCDF file as characters with "od -c", you should always see the following as the first four bytes:

  % od -c foo.nc
  0000000   C   D   F 001 ...

If you see some other characters first, followed by these four, perhaps some extra bytes were added to the front of your files that you can easily "scrape off" to restore the files. If you see that the bytes have been swapped and appear as D C 001 F instead, for example, you can use utilities such as dd to swap each pair of bytes to restore the file.
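
The checks described above can be sketched in a few lines of Python. This is only an illustration, not a Unidata utility: the function names and the 4096-byte search window are assumptions made for the example, and it handles just the two cases mentioned (extra leading bytes, or every pair of bytes swapped, which is what "dd conv=swab" undoes).

```python
MAGIC = b"CDF\x01"      # 'C', 'D', 'F', SOH: the netCDF magic number
SWAPPED = b"DC\x01F"    # the same four bytes with each pair swapped


def swap_byte_pairs(data: bytes) -> bytes:
    """Swap every adjacent pair of bytes; a trailing odd byte is left alone."""
    out = bytearray(data)
    even = len(out) - (len(out) % 2)
    out[0:even:2], out[1:even:2] = data[1:even:2], data[0:even:2]
    return bytes(out)


def diagnose(data: bytes, search_limit: int = 4096):
    """Return (kind, offset) for the magic number near the front of data.

    search_limit is an arbitrary window chosen for this sketch.
    """
    head = data[:search_limit]
    pos = head.find(MAGIC)
    if pos == 0:
        return ("ok", 0)
    if pos > 0:
        return ("extra-leading-bytes", pos)
    pos = head.find(SWAPPED)
    if pos >= 0:
        return ("byte-swapped", pos)
    return ("not-found", -1)


def repair(data: bytes) -> bytes:
    """Undo the two simple corruptions described above, if one is detected."""
    kind, pos = diagnose(data)
    if kind == "ok":
        return data
    if kind == "extra-leading-bytes":
        return data[pos:]                     # scrape off the extra bytes
    if kind == "byte-swapped":
        return swap_byte_pairs(data[pos:])    # swap each pair back
    raise ValueError("no netCDF magic number found near the front of the data")
```

To try it on a real file, read the file in binary mode (open(path, "rb").read()), pass the bytes to repair(), and write the result to a new file so that the damaged original stays untouched.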

If you see nothing resembling these four bytes near the front of the file, there may be no practical way to recover the data, because you won't be able to locate the start of the netCDF "header", which contains the names and sizes of dimensions and the names, types, shapes, and offsets within the file of the netCDF variables.

I know of no utilities available to help with netCDF data recovery, and yours is the first request we've seen for this capability. There is a chapter in the netCDF User's Guide that describes the structure of a netCDF file. In addition, the netcdf/libsrc/local_nc.h file in the netCDF sources specifies the structure of netCDF arrays, which are encoded by the XDR functions into the bytes that appear in the netCDF file.

If you have other netCDF files with exactly the same structure as your corrupted netCDF files, or if you have a CDL file that exactly matches the files, it is possible to independently compute the byte offset from the beginning of the file for each variable within each netCDF record. The data could then be read by positioning to the computed byte offset and using the appropriate XDR read for the data type (byte, short, long, float, or double) to decode the data array. If you happen to be using floating-point data and have a computer that also supports IEEE floating point, then the floating-point arrays in the file don't even need to be "decoded", since the XDR representation for floating-point numbers is then the same as the native representation.

As for the existence of a document describing the exact file structure, here's an excerpt from a recent answer to another user about this:

We don't have such a document for several reasons. First, there is a chapter in the netCDF User's Guide on "NetCDF File Structure and Performance" that explains the physical structure of netCDF data at a high enough level to make clear the performance implications of different data organizations.

Second, we don't want netCDF users to write programs that depend on the physical representation of netCDF data. If they did that, we would not be free in the future to change the physical representation. If users only go through the documented interfaces to access the data, any changes we make in the future physical representation will be transparent to current users.

Finally, the file structure is completely specified by the source code, and by the description that it is the XDR-encoding of the NC structures defined in netcdf/libsrc/local_nc.h. Since XDR is specified elsewhere in a separate document, we didn't want to copy that specification but instead just refer to it. That specification is available via a WWW browser such as Mosaic or via gopher at

  gopher://ds.internic.net/00/rfc/rfc1014.txt

....

Anyway, I hope this helps explain why I can't point at a single specific document.

__________________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden                                   P.O. Box 3000
(303)497-8645                                    Boulder, Colorado 80307-3000
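
As a footnote to the offset-based recovery described earlier in this answer, here is a minimal Python sketch of reading an array once its byte offset is known. It relies only on the fact that XDR (RFC 1014) stores integers and IEEE floating-point values in big-endian byte order. The function name and the table of type codes are made up for this example, the offset and element count must come from your own calculation against a matching file or CDL, and byte and short arrays are deliberately left out of the sketch.

```python
import struct

# struct codes for three of the netCDF types named above; the leading ">"
# used below selects big-endian byte order with standard sizes, as XDR requires.
XDR_CODES = {"long": "i", "float": "f", "double": "d"}


def read_xdr_array(data: bytes, offset: int, nc_type: str, count: int):
    """Decode count values of nc_type starting at byte offset in data.

    offset is hypothetical input: it has to be computed independently,
    for example from an intact file with exactly the same structure.
    """
    fmt = ">" + str(count) + XDR_CODES[nc_type]
    size = struct.calcsize(fmt)
    if offset < 0 or offset + size > len(data):
        raise ValueError("requested array extends past the end of the data")
    return list(struct.unpack_from(fmt, data, offset))


# Synthetic stand-in for a file: 32 filler bytes, three floats, two longs.
blob = b"\x00" * 32 + struct.pack(">3f", 1.0, 2.5, -4.0) + struct.pack(">2i", 7, -1)
floats = read_xdr_array(blob, 32, "float", 3)
longs = read_xdr_array(blob, 44, "long", 2)
```

The bounds check matters for damaged files in particular: a truncated file will otherwise fail in the middle of a read with a less helpful error.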