This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Hi John and Wendy,

> We have a couple of questions regarding netCDF, assuming you
> are still the one to contact with such questions.

It's better to send such questions to "address@hidden". That way you'll get an answer even if I happen to be on vacation or out of town.

> We have been using netCDF for 3 years now, previously creating
> the binary files via calls to the netCDF C library. We have recently
> tried to use ncgen to create the binary files, only to discover that
> variable names that were acceptable to the C library (e.g. "Foo.bar"
> or "Foo bar") are now flagged as syntax errors by ncgen (it wants
> something more like "Foo_bar").
>
> One question is what might be the consequences of modifying
> ncgen's lexical analyzer (ncgen.l) to allow characters such as
> periods and spaces in variable names (as has been done
> via the netCDF C library)?

This issue has come up several times on the netcdfgroup mailing list, so I'll first include relevant excerpts of a posting I wrote there that attempted to explain why the netCDF library is more lenient about acceptable netCDF names than the ncgen utility is about CDL names:

  ... it is possible to create netCDF files with netCDF library calls that ncdump and ncgen cannot handle correctly ...

  First, here is what the netCDF User's Guide says about CDL names:

    CDL names for variables, attributes, and dimensions may be any combination of alphabetic or numeric characters as well as `_' and `-' characters, but names beginning with `_' are reserved for use by the library. Case is significant in CDL names. The netCDF library does not enforce any restrictions on netCDF names, so it is possible (though unwise) to define variables with names that are not valid CDL names.

  Since the netCDF library puts no restrictions on names (except that they must be shorter than MAX_NC_NAME characters), you can even create netCDF files that use names containing punctuation, control characters, and non-ASCII bytes.
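To make the CDL naming rule concrete, here is a minimal C sketch of a check equivalent to the name pattern [A-Za-z_][A-Za-z_0-9-]* that ncgen's lexer (ncgen.l) uses, assuming that pattern is the whole rule. The function is_valid_cdl_name and its helpers are hypothetical; they are not part of the netCDF library or of ncgen, and the sketch deliberately ignores the reservation of leading underscores and the MAX_NC_NAME length limit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers (not part of netCDF or ncgen) that mirror the
 * CDL name pattern [A-Za-z_][A-Za-z_0-9-]* character by character.
 * Explicit ASCII ranges are used instead of isalpha()/isdigit() so the
 * result does not depend on the current locale. */

/* Characters allowed as the first character of a CDL name. */
static bool is_cdl_start_char(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
}

/* Characters allowed after the first one: also digits and '-'. */
static bool is_cdl_rest_char(char c)
{
    return is_cdl_start_char(c) || (c >= '0' && c <= '9') || c == '-';
}

/* Returns true if the whole string matches [A-Za-z_][A-Za-z_0-9-]*.
 * Does not check the reserved leading '_' or any length limit. */
bool is_valid_cdl_name(const char *name)
{
    if (name == NULL || !is_cdl_start_char(name[0]))
        return false; /* empty, or starts with a digit, '-', '.', ... */
    for (const char *p = name + 1; *p != '\0'; p++) {
        if (!is_cdl_rest_char(*p))
            return false; /* e.g. '.', ' ', '(' */
    }
    return true;
}
```

Under this sketch, "Foo_bar" and "t-2" are accepted, while "Foo.bar", "Foo bar", "p(time)", and "2m_temp" are all rejected, which matches the ncgen behavior described in the question even though the netCDF library itself would accept every one of these names.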
  The CDL data description language, however, requires more restrictive names to make it possible to parse CDL statements. As an example of the potential parsing difficulties, if you named a variable `p(time)', then it would be ambiguous whether the following was a CDL declaration of the scalar variable `p(time)' or of a 1-dimensional variable `p' that used the `time' dimension:

      float p(time) ;

  Similarly, names that begin with digits are parsed in CDL as numeric constants. A perverse programmer could use newlines and semicolons in netCDF variable names to create a netCDF file that, when dumped with ncdump, would look like CDL statements that had nothing to do with the contents of the file.

  To get around such possibilities, we could add a check to the library that each name, when defined, conforms to the same regular expression used for names in CDL parsing (in ncgen/ncgen.l):

      [A-Za-z_][A-Za-z_0-9-]*

  But someone may want to write a new data description language for netCDF someday that permits a larger subset of names, or there may be users who don't use ncdump or ncgen who are already using more general names, e.g. with `.' in them. Thus adding a new restriction on names at the library level might break existing applications.

> Another is, looking towards the future, might the use of spaces
> and periods within variable names someday be rejected by the
> C library calls? How is your crystal ball?

No. As indicated above, we have no intention of changing the library in a way that might break existing applications, so we will continue to permit any characters to be used in netCDF variable, dimension, and attribute names. The only problem with using names that contain punctuation is that you cannot run the ncgen utility on the output of ncdump for such files, so if you don't need to use ncgen, there is no reason to change your existing netCDF files.

At one point, I tried to change the grammar of CDL to permit the use of the "." character in CDL names because another user asked about this, but at the time I was unable to create a grammar acceptable to yacc that permitted this. I'm not completely convinced it's impossible, either with yacc or a different parser, but I haven't looked at the problem again recently. I don't recall the details, but I believe the changes to ncgen.l were straightforward; it was ncgen.y that I couldn't modify to make things work.

> Another possibility (and probably the cleanest one) would be
> to use variable attributes to store our "Foo.bar" strings. This
> would, however, require us to rework a substantial amount
> of existing code.

Yes, but that's not necessary if you don't need to use the ncgen utility.

The next release of netCDF (release 2.4) will include two additional utilities developed by Harvey Davies of CSIRO, nc2text and text2nc, that will provide an alternative to ncdump and ncgen for displaying and manipulating netCDF data from the command line. I'm not sure what restrictions these utilities put on netCDF names, but it's possible they are less restrictive than ncgen. I'll try to check on this next week.

--Russ

______________________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden                                   http://www.unidata.ucar.edu