[netCDF #ZYO-239963]: Characters allowed in netcdf variable and attribute names
- Date: Tue, 27 Dec 2011 21:06:57 -0700
Hi James!
> We are using the following for netCDF identifiers:
> string allowed =
> "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-+_.@" ;
> // string of allowed first characters in netcdf naming
> // convention
> string first =
> "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_" ;
>
> Are those character sets close to being correct? Anything missing? How about
> '%' characters?
You may be thinking of names allowed in CF-compliant netCDF files,
which are fairly restricted:
2.3. Naming Conventions
Variable, dimension and attribute names should begin with a letter
and be composed of letters, digits, and underscores. Note that this
is in conformance with the COARDS conventions, but is more
restrictive than the netCDF interface which allows use of the hyphen
character. The netCDF interface also allows leading underscores in
names, but the NUG states that this is reserved for system use.
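For illustration, the CF recommendation quoted above can be sketched as a small Python check. The names CF_NAME and is_cf_name are hypothetical, and the sketch assumes "letters" means ASCII letters only:

```python
import re

# CF section 2.3 as quoted above: begin with a letter, then letters,
# digits, and underscores only. Assumption: "letters" means ASCII letters.
CF_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_cf_name(name: str) -> bool:
    """Return True if name follows the CF naming recommendation."""
    return CF_NAME.fullmatch(name) is not None
```

With this check, a name such as "air_temp" passes, while "wind-speed", "5DegAvg", and "_FillValue" do not.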
NetCDF names are considerably more flexible, since we added Unicode
name support for both netCDF-3 and netCDF-4 libraries and file
formats. The details for names are in the format spec in the NetCDF
User's Guide, "Appendix C.1 The NetCDF Classic Format Specification",
http://www.unidata.ucar.edu/netcdf/docs/netcdf.html#NetCDF-Classic-Format
and are also included in the NASA ESDS standard at
http://www.esdswg.com/spg/rfc/esds-rfc-011/ESDS-RFC-011v2.00.pdf
Here's the description in English:
Note on names: Earlier versions of the netCDF C-library reference
implementation enforced a more restricted set of characters in
creating new names, but permitted reading names containing arbitrary
bytes. This specification extends the permitted characters in names
to include multi-byte UTF-8 encoded Unicode and additional printing
characters from the US-ASCII alphabet. The first character of a name
must be alphanumeric, a multi-byte UTF-8 character, or '_' (reserved
for special names with meaning to implementations, such as the
“_FillValue” attribute). Subsequent characters may also include
printing special characters, except for '/' which is not allowed in
names. Names that have trailing space characters are also not
permitted.
Implementations of the netCDF classic and 64-bit offset format must
ensure that names are normalized according to Unicode NFC
normalization rules during encoding as UTF-8 for storing in the file
header. This is necessary to ensure that gratuitous differences in
the representation of Unicode names do not cause anomalies in
comparing files and querying data objects by name.
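A minimal Python sketch of why the normalization matters, using only the standard unicodedata module. The helper name nfc_utf8 is hypothetical and this is not the library's own code:

```python
import unicodedata


def nfc_utf8(name: str) -> bytes:
    """NFC-normalize a name, then encode it as UTF-8 bytes."""
    return unicodedata.normalize("NFC", name).encode("utf-8")


# Two spellings of the same visible name: precomposed U+00E9 versus
# "e" followed by the combining acute accent U+0301.
composed = "caf\u00e9"
decomposed = "cafe\u0301"

# The raw strings differ, so a byte-wise name lookup would miss ...
raw_differs = composed != decomposed
# ... but after NFC normalization both encode to the same bytes.
same_after_nfc = nfc_utf8(composed) == nfc_utf8(decomposed)
```

Without the normalization step, the two spellings would be stored as different byte sequences and would look like two different names.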
The regular expression for netCDF names (for dimensions, attributes,
variables, groups, user-defined types, compound type members, and
enumeration labels) is:
([a-zA-Z0-9_]|{MUTF8})([^\x00-\x1F/\x7F-\xFF]|{MUTF8})*
where "{MUTF8}" means any multibyte, UTF-8 encoded, NFC-normalized
Unicode character.
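As a rough sketch, the rules above can be tried out in Python against the two character sets from the question. This is an approximation working on Python strings rather than raw bytes: any non-ASCII code point stands in for {MUTF8}, the second group of the pattern is taken to repeat zero or more times, and the names NETCDF_NAME and is_netcdf_name are hypothetical, not part of any netCDF library:

```python
import re
import string
import unicodedata

# String-level approximation of the byte-level pattern.
_MUTF8 = r"[^\x00-\x7F]"                      # any non-ASCII code point
_FIRST = rf"(?:[A-Za-z0-9_]|{_MUTF8})"        # allowed first character
_REST = rf"(?:[\x20-\x2E\x30-\x7E]|{_MUTF8})"  # printable ASCII except '/'
NETCDF_NAME = re.compile(rf"{_FIRST}{_REST}*")


def is_netcdf_name(name: str) -> bool:
    """Check a name against the rules quoted above (approximation)."""
    name = unicodedata.normalize("NFC", name)
    if name.endswith(" "):  # trailing spaces are not permitted
        return False
    return NETCDF_NAME.fullmatch(name) is not None


# The two character sets from the question, plus '%'.
alnum = string.ascii_uppercase + string.ascii_lowercase + string.digits
first = alnum + "_"
allowed = alnum + "-+_.@"

first_ok = all(is_netcdf_name(c) for c in first)
allowed_ok = all(is_netcdf_name("x" + c) for c in allowed + "%")
```

Under this reading, every character in both sets is accepted, and '%' is accepted anywhere after the first character. Names such as "a/b", "-x", and "temp " (with a trailing space) are rejected.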
The Unicode/UTF-8 stuff was added in versions 3.6.3 and 4.0, in June 2008.
Note that the CDL notation has to escape some characters in names, for
example leading numeric characters, so that a variable named "5DegAvg"
would appear in CDL as "\5DegAvg".
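A deliberately narrow Python sketch of the one case mentioned here, a leading numeric character. The function name is hypothetical, and the sketch does not attempt the other characters that CDL escapes:

```python
def cdl_escape_leading_digit(name: str) -> str:
    """Prefix a backslash when a name starts with an ASCII digit.

    Covers only the leading-digit case; other characters that CDL
    escapes are intentionally left untouched in this sketch.
    """
    if name and name[0] in "0123456789":
        return "\\" + name
    return name
```

For the example above, cdl_escape_leading_digit("5DegAvg") returns the CDL form \5DegAvg, and a name like "DegAvg5" comes back unchanged.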
Finally, the question has come up about whether adding Unicode name
support violated our commitment to backwards compatibility. It
doesn't, as the FAQ answer here explains:
http://www.unidata.ucar.edu/netcdf/docs/faq.html#fv22
Too much information? :-)
--Russ
Russ Rew UCAR Unidata Program
address@hidden http://www.unidata.ucar.edu
Ticket Details
===================
Ticket ID: ZYO-239963
Department: Support netCDF
Priority: Normal
Status: Closed