This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Reimar,

> > How difficult would it be for you to require escaping some of these
> > special characters in variable names? For example, instead of
> > permitting the variable name 'a.b(c)', we would require 'a\.b\(c\)'.
> >
> > If we required escaping special characters in variable names, we could
> > allow all special characters, including blanks. We're considering
> > this for netCDF-4, as well as permitting Unicode for names. I realize
> > there are backward compatibility problems; I'm just wondering how
> > serious the backward compatibility issue is at this point, since it
> > will only get worse and eventually make such a change impossible.
>
> Escaping would be fine, but we have some questions about how this could
> or would be done.
>
> Would users need to change their programs, or is a netCDF global variable
> planned that makes this happen automatically in the corresponding
> routines, e.g. nf90_def_var?
>
> For example, if a global variable such as nc_use_escape is true, then
> nf90_def_var would know to write the escape sequences for the
> non-alphanumeric characters; otherwise it would report an error about
> the invalid characters.
>
> When reading, the routine would know that a stored \( stands for (, so
> the user could use the same input name as now.
>
> In our programs we could then use, for example, O3(1), but internally it
> would be stored as O3\(1\).
>
> If it were implemented this way, only a header variable would need to
> change, and everything would work as before.

You're right, we could provide automatic escaping if a global variable is set appropriately. That may be the best way to do it, but we need to consider how to distinguish between an escaped character that is part of a variable name and the same character used as syntax for something else. For example, a "." character may be used to indicate a member component of a structure variable, which will be permitted with HDF5 as the storage layer. We haven't decided on the best way to do this yet.

> Now let me ask some questions about the use of Unicode.
> It's probably the best way to support characters from very different
> languages, but what happens if a user looking at a data file does not
> have the right fonts installed?

There will be a way to represent Unicode symbols in an encoding that distinguishes them without requiring Unicode fonts, similar to what is done in Python.

> Did you think about using UTF-8? It is described in section 3.9 of
> the Unicode 4.0 standard and in http://www.ietf.org/rfc/rfc3629.txt.

Yes, I think UTF-8 would be a very good way to represent names for netCDF objects. It would ensure that all current names that use only US-ASCII characters remain valid Unicode strings. However, I'm not sure UTF-8 would be the best way to represent character data on disk, since it's a variable-length encoding and thus not well suited to direct access to the nth character in a long string.

HDF5 has not dealt with Unicode encoding issues yet, so we will have to determine how to do it for netCDF-4. We may support a default encoding, with other encodings specified by a distinguished attribute.

--Russ
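The backslash-escaping idea discussed in this exchange (storing O3(1) as O3\(1\), while leaving an unescaped "." free to mean "structure member") can be sketched in a few lines. This is a minimal illustration only: the helper names escape_name, unescape_name, and split_members are hypothetical, and the rule that every character other than an ASCII letter, digit, or underscore gets escaped is an assumption, not netCDF behavior.

```python
# Hypothetical helpers illustrating the proposed escaping scheme.
# They are NOT part of the netCDF API; names and rules are assumptions.
# Assumed rule: escape every character that is not an ASCII letter,
# digit, or underscore by prefixing it with a backslash.


def escape_name(name):
    """Return the stored (escaped) form of a user-visible name."""
    # e.g. 'a.b(c)' -> a\.b\(c\)   and   'O3(1)' -> O3\(1\)
    return "".join(
        ch if (ch.isascii() and ch.isalnum()) or ch == "_" else "\\" + ch
        for ch in name
    )


def unescape_name(stored):
    """Return the user-visible name: drop each escaping backslash."""
    out = []
    chars = iter(stored)
    for ch in chars:
        if ch == "\\":
            # Keep the escaped character itself (empty if the backslash is last).
            ch = next(chars, "")
        out.append(ch)
    return "".join(out)


def split_members(path):
    """Split a stored path on unescaped '.' (structure-member syntax)."""
    parts, current, escaped = [], [], False
    for ch in path:
        if escaped:
            current.append(ch)      # escaped char is part of the name
            escaped = False
        elif ch == "\\":
            current.append(ch)      # keep the backslash for unescape_name
            escaped = True
        elif ch == ".":
            parts.append("".join(current))  # unescaped '.' separates members
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [unescape_name(p) for p in parts]


stored = escape_name("a.b") + "." + escape_name("c(1)")
print(stored)                 # a\.b.c\(1\)
print(split_members(stored))  # ['a.b', 'c(1)']
```

Because a literal dot inside a name is always stored escaped, an unescaped dot can unambiguously serve as member syntax, which is the distinction raised in the reply above.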
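The two UTF-8 properties Russ mentions, that pure US-ASCII names are unchanged and that the encoding is variable-length, can be checked with a short sketch. The function nth_char_byte_offset is hypothetical; it illustrates why locating the nth character of UTF-8 data requires scanning from the start rather than computing an offset directly.

```python
def nth_char_byte_offset(data, n):
    """Byte offset at which the n-th (0-based) character starts in UTF-8 data.

    UTF-8 uses 1 to 4 bytes per character, so there is no fixed formula;
    we must scan from the start, counting bytes that begin a character
    (continuation bytes have the bit pattern 10xxxxxx).
    """
    count = 0
    for i, b in enumerate(data):
        if (b & 0xC0) != 0x80:  # lead byte: a new character starts here
            if count == n:
                return i
            count += 1
    raise IndexError("character index out of range")


# A pure US-ASCII name is byte-for-byte identical in ASCII and UTF-8.
print("temperature".encode("utf-8") == "temperature".encode("ascii"))  # True

# A non-ASCII character takes more than one byte, so offsets drift.
name = "naïve".encode("utf-8")
print(len("naïve"), len(name))        # 5 6
print(nth_char_byte_offset(name, 3))  # 4  ('v' starts at byte 4)
```

By contrast, a fixed-width encoding such as UTF-32 places character n at byte 4n, at the cost of four bytes for every ASCII character.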