This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Hi Tomas,
I'm sorry to have taken so long to answer your question about the size of
netCDF files produced by the C and C++ interfaces. Investigating the problem
revealed a bug that will be fixed in the next release.
The explanation is that the
    NcVar::add_att( NcToken attname, const char* )
function invoked in netcdf/c++/example.cc, used to define string attributes,
as in
    P->add_att("units", "hectopascals");
stores the string attributes with the trailing "\0" character counted as
part of the attribute value. Here's the relevant code from the
NcVar::add_att member function in netcdf/c++/netcdf.cc:
    if (ncattput(the_file->id(), the_id, aname, (nc_type) ncChar,
                 strlen(val) + 1, val) == ncBad)
The C version in example.c provides explicit lengths, and doesn't include
the trailing "\0" character, for example:
    ncattput (ncid, P_id, "units", NC_CHAR, 12,
              (void *)"hectopascals");
Hence the C++ version is storing an extra character, the trailing "\0", for
every string attribute.
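To make the length difference concrete, here is a minimal sketch in plain C that does not call the netCDF library. The helper names are hypothetical. The padding helper assumes attribute values are padded to a multiple of 4 bytes on disk, as in the classic netCDF file format; under that assumption one extra byte can cost up to four bytes per attribute.

```c
#include <stddef.h>
#include <string.h>

/* Length as passed by the C example: the characters only, no terminator. */
size_t att_len_without_nul(const char *val)
{
    return strlen(val);
}

/* Length as passed by the C++ add_att code quoted above: strlen(val) + 1,
 * so the trailing "\0" is stored as part of the attribute value. */
size_t att_len_with_nul(const char *val)
{
    return strlen(val) + 1;
}

/* Assumption: values are padded to the next multiple of 4 bytes on disk,
 * as in the classic netCDF file format. */
size_t padded_size(size_t len)
{
    return (len + 3) / 4 * 4;
}
```

For "hectopascals", the first helper gives 12 and the second gives 13. With 4-byte padding those lengths occupy 12 and 16 bytes respectively.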
When ncdump reads and prints a string attribute, it doesn't include any
trailing null byte, since that is assumed to be the end-of-string marker
from C. Hence ncdump will print exactly the same attribute value for a
four-character attribute value "abc\0" as it will for a three-character
attribute value "abc".
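The effect can be sketched in plain C as follows. This is not ncdump's actual code, only an illustration of the behavior described above, and the function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Number of characters that would be printed: the stored length minus
 * any trailing null bytes. */
size_t printable_len(const char *val, size_t len)
{
    while (len > 0 && val[len - 1] == '\0')
        len--;
    return len;
}

/* Returns 1 if two stored attribute values would print identically
 * once trailing nulls are ignored, 0 otherwise. */
int prints_same(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t pa = printable_len(a, alen);
    size_t pb = printable_len(b, blen);
    return pa == pb && memcmp(a, b, pa) == 0;
}
```

With these helpers, the four-byte value "abc\0" and the three-byte value "abc" both have a printable length of 3, so prints_same reports them as identical.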
I think the behavior of ncdump is OK in this respect, although it means
running ncdump and then ncgen on a file containing attributes with trailing
nulls will strip the trailing nulls, so the resulting file will be smaller
than the original. The NetCDF User's Guide recommends:
    In C, fixed-size strings may be written to a netCDF file without the
    terminating null byte, to save space. Variable-length strings should be
    written *with* a terminating null byte so that the intended length
    of the string can be determined when it is later read.
    ...
    In FORTRAN, fixed-size strings may be written to a netCDF file without a
    terminating character, to save space. Variable-length strings
    should follow the C convention of writing strings with a terminating
    null byte so that the intended length of the string can be determined
    when it is later read by either C or FORTRAN programs.
so the guide does not require the terminating null byte for fixed-size
strings.
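As an illustration of why the guide asks for a terminating null on variable-length strings, here is a small sketch in plain C of how a reader might recover the intended length from a fixed-size slot. The function name and the idea of a "slot" are assumptions made for this example and are not part of the netCDF interface.

```c
#include <stddef.h>
#include <string.h>

/* Intended length of a string read back from a slot of slot_len bytes:
 * the offset of the first null byte, or the whole slot if there is none
 * (the fixed-size case, written without a terminator). */
size_t intended_len(const char *slot, size_t slot_len)
{
    const char *nul = memchr(slot, '\0', slot_len);
    return nul ? (size_t)(nul - slot) : slot_len;
}
```

A variable-length value such as "abc" stored with its terminator in an 8-byte slot is recovered with length 3, while a 4-byte fixed-size value "abcd" stored without a terminator is recovered with length 4.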
I can fix the inconsistency you have uncovered in either of two ways:
 1. Change the c++/example.c code so that it includes the trailing null
    byte in the attribute length for all string attributes.
 2. Change the code for NcVar::add_att( NcToken attname, const char* ) so
    that it doesn't store the trailing null byte.
I prefer the second fix, but in trying it, I just noticed it requires a
rewrite of the NcValues_char::print(ostream&) member function in
ncvalues.cc. I've added that to my list of things to do before the
alpha-test version of netCDF 2.4 is ready.
Anyway, thanks for being persistent in asking about this problem, even
though I was apparently ignoring it the first time you asked. You have
uncovered a bug that we will fix.
--Russ
______________________________________________________________________________
Russ Rew UCAR Unidata Program
address@hidden http://www.unidata.ucar.edu