Re: Character strings in NetCDF...
- Date: Thu, 17 Jun 1999 16:26:24 -0600
> Date: Thu, 17 Jun 1999 12:28:23 -0400
> From: Patrice Cousineau <address@hidden>
> Subject: Character strings in NetCDF...
> To: address@hidden
Hi Patrice,
> I'm wondering what is the best way to deal with character string data
> in NetCDF (mainly metadata). It seems very clumsy to have to declare a
> string as an array of characters and need to declare a dimension for
> string length. Furthermore, the processing of these arrays becomes
> very tedious and not very useful to other NetCDF tools.
Yes, unfortunately that is the price we pay for supporting a Fortran
interface. From the section "Reading and Writing Character String
Values" in the User's Guide:

    Character strings are not a primitive netCDF external data type,
    in part because FORTRAN does not support the abstraction of
    variable-length character strings (the FORTRAN LEN function
    returns the static length of a character string, not its dynamic
    length). As a result, a character string cannot be written or read
    as a single object in the netCDF interface. Instead, a character
    string must be treated as an array of characters, and array access
    must be used to read and write character strings as variable data
    in netCDF datasets. Furthermore, variable-length strings are not
    supported by the netCDF interface except by convention; for
    example, you may treat a zero byte as terminating a character
    string, but you must explicitly specify the length of strings to
    be read from and written to netCDF variables.

For the relatively small strings that occur in metadata, we often just
declare one string length (for example 80) and use that for all the
character strings, wasting some space for shorter strings, and
explicitly terminating variable-length strings with a null byte.
Another approach is to use netCDF attributes instead of variables for
such string data, since then no explicit lengths need to be declared.
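
The fixed-length approach described above can be sketched in C roughly as follows. This is only an illustration, not part of the netCDF library: FIELD_LEN, packFixed and fixedLen are hypothetical names, and the 80-byte buffer stands in for one row of a character variable that would then be written with an array-access call such as nc_put_vara_text.

```c
#include <assert.h>
#include <string.h>

#define FIELD_LEN 80  /* assumed fixed string length, as in the example of 80 above */

/* Copy s into a fixed FIELD_LEN-byte field and pad the rest with null bytes.
   Returns 0 on success, or -1 if s plus its terminator does not fit. */
int packFixed(char field[FIELD_LEN], const char *s)
{
    size_t n;

    assert(field != NULL && s != NULL);
    n = strlen(s);
    if (n >= FIELD_LEN)
        return -1;
    memset(field, 0, FIELD_LEN);   /* null padding terminates the string */
    memcpy(field, s, n);
    return 0;
}

/* Dynamic length of the string held in a field: the number of bytes before
   the first null byte, or FIELD_LEN if the field contains no null byte. */
size_t fixedLen(const char field[FIELD_LEN])
{
    const char *end = memchr(field, 0, FIELD_LEN);
    return end ? (size_t)(end - field) : (size_t)FIELD_LEN;
}
```

Padding the whole field with null bytes, rather than writing a single terminator, keeps stale bytes out of the file when a shorter string later overwrites a longer one.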
> The only solution I have found is to assign an integer ID to these
> strings (assuming there are a limited number of possibilities) and
> creating a lookup table for them. But then, where and how would i
> store the lookup tables? ...
You can store such a lookup table in a fixed-size netCDF character
variable, dimensioned large enough to hold all your variable-size
character strings:

    dimensions:
        stringsLen = 1000;  // sufficiently large for all metadata strings
        numStrings = 100;   // maximum number of strings in table
    variables:
        char strings(stringsLen);       // strings table
        int stringIndices(numStrings);  // where each string starts in table
    ...

You might also store the string lengths if they are likely to ever
shrink. This is crude and tedious, as you point out, but can be made
a little easier by adding a small interface that reads and writes such
strings and hides the representation.

    int putString(char* s);  // appends string s to table, returns string number
    char *getString(int i);  // gets string number i from table

Of course this only works if you won't be growing any of the strings
later, which would require copying them to the end and
garbage-collecting the gap left.
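
A minimal sketch of such an interface, under some loud assumptions: the two static arrays are in-memory stand-ins for the netCDF variables strings and stringIndices (a real version would read and write them through the netCDF array-access calls), each string is stored with its null terminator, and the counters numUsed and nextFree are hypothetical bookkeeping that would also have to be saved somewhere in the file. The parameter is declared const char * here, a small departure from the prototype above.

```c
#include <assert.h>
#include <string.h>

#define STRINGS_LEN 1000   /* matches stringsLen in the CDL above */
#define NUM_STRINGS 100    /* matches numStrings in the CDL above */

/* In-memory stand-ins for the netCDF variables (assumption for this sketch) */
static char strings[STRINGS_LEN];
static int  stringIndices[NUM_STRINGS];
static int  numUsed = 0;    /* number of strings stored so far */
static int  nextFree = 0;   /* first unused byte in the strings table */

/* Appends s, including its null terminator, to the table.
   Returns the string number, or -1 if the table has no room. */
int putString(const char *s)
{
    size_t n;

    assert(s != NULL);
    n = strlen(s) + 1;
    if (numUsed >= NUM_STRINGS || n > (size_t)(STRINGS_LEN - nextFree))
        return -1;
    memcpy(strings + nextFree, s, n);
    stringIndices[numUsed] = nextFree;   /* where this string starts */
    nextFree += (int)n;
    return numUsed++;
}

/* Returns a pointer to string number i, or NULL if i is out of range. */
char *getString(int i)
{
    if (i < 0 || i >= numUsed)
        return NULL;
    return strings + stringIndices[i];
}
```

Because strings are only ever appended, a string number stays valid for the life of the table, which is what makes it usable as the integer ID in a lookup scheme.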
> ... And what to do with an infinite list of
> strings???
If you can't anticipate a maximum for how many strings will be needed
for the metadata or what maximum aggregate storage will be required,
you have a problem for which netCDF may not be appropriate. An
unlimited list of fixed-size strings can be handled with the unlimited
dimension, but an unlimited list of variable-size strings does not fit
well with the netCDF data model, which is designed to support fast
direct access to array-oriented data, where the seek offset of the
data from the beginning of the file can be computed in a fixed amount
of time.
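
To make the access-time argument concrete, here is a rough sketch contrasting the two cases. It does not reproduce the actual netCDF file layout: base, strLen, and both function names are hypothetical, and the variable-size case assumes null-terminated strings packed back to back.

```c
#include <assert.h>
#include <stddef.h>

/* Fixed-size strings: the offset of string number i is plain arithmetic,
   so it takes the same time no matter how large i is. */
size_t fixedOffset(size_t base, size_t strLen, size_t i)
{
    return base + i * strLen;
}

/* Variable-size, null-terminated strings packed back to back: finding where
   string number i starts means scanning past every earlier string.
   Returns dataLen if fewer than i terminators are found. */
size_t variableOffset(const char *data, size_t dataLen, size_t i)
{
    size_t pos = 0;

    assert(data != NULL);
    while (i > 0 && pos < dataLen) {
        if (data[pos] == '\0')
            i--;            /* passed the end of one more string */
        pos++;
    }
    return i == 0 ? pos : dataLen;
}
```

Storing a separate table of start offsets, as in the lookup-table layout earlier, restores constant-time access, but only while the table itself has a known maximum size.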
--Russ
_____________________________________________________________________
Russ Rew UCAR Unidata Program
address@hidden http://www.unidata.ucar.edu