Re: nclong
- Subject: Re: nclong
- Date: Tue, 04 Jan 1994 16:43:59 -0700
> Organization: NCSA
> Keywords: 199401042213.AA10567
Hi Chris,
> Hi, how's it going? I've been looking at improving the
> HDF 3.3 port on the Dec Alpha. Specifically, the difference
> between the number of bits in an NC_LONG variable in memory
> vs. on disk is a problem for the netCDF/HDF interaction.
>
> In looking through the code for netcdf.h there is the
> text:
>
> #ifdef __alpha
> typedef long nclong; /* when the library is modified to
> * use `nclong' declarations, this
> * will become an `int' */
> # define NCLONG_DEFINED
> #endif
> #ifndef NCLONG_DEFINED
> typedef long nclong; /* default, compatible type */
> #endif
>
> So is it the intention that in the future data written to
> variables / attributes of type NC_LONG should use the
> C type 'nclong' rather than 'long'? If so, do you have
> an idea about when you would be modifying the library to
> take this into account?
Yes, that is our intention, and the changes should be in the next release.
I can't say how soon that version will be released, but I suspect it won't
be until mid-1994. There may be a minor release before then fixing bugs.
There's still some work involved in changing all the documentation,
examples, and test programs to use nclong instead of long for declarations
of data variables. We had given a rationale for this in a previous
announcement on the netcdfgroup mailing list:
You should know, however, that the port of the FORTRAN interface to
DEC's 64-bit Alpha machine uncovered an ambiguity in the netCDF
specification. In order to allow the writing of portable netCDF code,
we have decided, after considerable discussion, to make a slight change
to the netCDF specification. This change will not affect the operation
or portability of existing or future netCDF code that is not intended
to run on machines such as DEC's Alpha. netCDF code for which such a
machine is a possible platform, however, should be modified or written
to adhere to the new specification.
The change is the introduction of a new datatype, which is defined in
the netCDF header file `netcdf.h'. This new datatype is `nclong'.
Maximally portable C code should use this datatype to hold all values of
type NC_LONG. For example:
#include "netcdf.h"
...
int ncid, lid, status, dimids[NDIM];
long start[NDIM], count[NDIM];
nclong data[SIZE]; /* NB: new datatype */
...
lid = ncvardef(ncid, "somelongvar", NC_LONG, dimids);
...
status = ncvarput(ncid, lid, start, count, data);
...
status = ncvarget(ncid, lid, start, count, data);
Note that only variables for NC_LONG values should have type `nclong'.
Other, traditionally `long' variables (such as the `count' and `start'
vectors for hyperslab access) should remain as C `long's.
FORTRAN programmers needn't worry about this change because portable
FORTRAN code only has INTEGER values to play with (and not, for example,
INTEGER*4, which is a non-portable datatype). Thus, no change to the
FORTRAN netCDF interface specification is required.
This is the extent of the change. The rest of this message gives the
rationale for the change.
The netCDF implementation and interface assume that a C `long' maps
naturally into the 32 bit external integer representation of an NC_LONG.
This is rooted in historical networking code traceable to the BSD
functions ntohl() and htonl().
With the introduction of DEC's Alpha machine, we are seeing reasons to
question this. The Alpha has 64 bit `long's and 32 bit `int's. The
natural and efficient choice for a C datatype which maps to an NC_LONG
would, therefore, be `int' rather than `long'. (We have encountered 64
bit `long's before on the Cray, but there the `int' is also 64 bits, so
there is no advantage to using different types).
Furthermore, on the Alpha, the FORTRAN INTEGER type is 32 bits;
therefore, keeping the C type as `long' for NC_LONG values would add
costly transformations to the FORTRAN interface on this platform.
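To illustrate the kind of transformation meant here, a minimal sketch follows. It assumes a hypothetical helper, `widen_to_long`, which is not part of the netCDF library: if NC_LONG values had to be C `long's on a machine with 64-bit `long's, a FORTRAN wrapper would need a copy like this (and the reverse on reads) for every array of 32-bit INTEGER values it passes through.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of netCDF: copy 32-bit INTEGER values
 * (as a FORTRAN caller would supply them) into an array of C longs.
 * Where long is 64 bits, every element is touched and the temporary
 * array takes twice the memory of the source.
 */
void widen_to_long(const int *src, long *dst, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] = (long) src[i];
}
```

With `nclong' defined as `int' on such a machine, the FORTRAN data could be handed to the C layer directly and no copy of this kind would be needed.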
On all platforms known to us (with the exception of the Alpha) the
typedef for the new `nclong' datatype is
typedef long nclong;
and existing netCDF code will run without modification. On the Alpha,
however, the `nclong' typedef is
typedef int nclong;
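As a rough sketch of the same selection made without naming a particular machine, the type could be chosen from the limits in <limits.h>. This is an illustration only, not what netcdf.h does; the names `my_nclong' and `my_nclong_fits_in_long' are hypothetical.

```c
#include <limits.h>

/*
 * Sketch only: use `int' where it is exactly 32 bits and `long' is
 * wider (the Alpha case); otherwise fall back to `long', which matches
 * the default typedef on other platforms.
 */
#if (INT_MAX == 2147483647) && (LONG_MAX > 2147483647)
typedef int my_nclong;
#else
typedef long my_nclong;
#endif

/*
 * Compile-time check: the array size is -1 (a compile error) if the
 * chosen type is somehow wider than `long'.
 */
typedef char my_nclong_fits_in_long[
    (sizeof(my_nclong) <= sizeof(long)) ? 1 : -1];
```

On a machine where both `int' and `long' are 64 bits, this falls through to `long', which agrees with the remark about the Cray above.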
We realize that changes to the interface specification are a hassle.
The alternative in this case is to burden the Alpha platform (and any
future system which makes similar design decisions) with greater memory
usage and poor FORTRAN performance.
> Life is easier on my end if I can just modify the library
> to use nclong. If it is something that the Unidata group
> has agreed should / will happen eventually it puts me on
> a much stronger footing.
Yes, you should be able to just use nclong, and realize that a recompilation
on alphas will be needed later when the typedef is changed. By the way,
please don't use the other type names like "ncbyte" that were mistakenly
included in the netcdf 2.3.2 release, because the C interface and C++
interface currently disagree on what an "ncbyte" is; the C typedef has to be
removed from netcdf.h to even compile the C++ stuff.
> BTW, I managed to change the 'brows-o-rama' in Mosaic 2.1
> to be 'scientific data brows-o-rama' but I'm not sure the
> best way to textually display the dimension names. Things
> are confused by the facts that 1) I think we should still keep
> the dimension sizes even with dimension names and 2) HDF (and
> CDF when I finally add that in) allow unnamed dimensions. Any
> thoughts on how to do this in an aesthetically pleasing way
> would be appreciated.
I agree that the dimension sizes are useful even with the dimension names.
When two dimensions have the same size, using the names is clearer. One way
to include the dimension names would just be to repeat them every time they
are used, as in
Dataset Z has rank 4 with dimensions [frtime=9, level=1, lat=73, lon=145]
instead of
Dataset Z has rank 4 with dimensions [9, 1, 73, 145]
but continue to use the latter form when no dimension names are available.
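A minimal sketch of how this first form might be produced, assuming a hypothetical helper `format_dims' (nothing like it is claimed to exist in Mosaic, HDF, or netCDF). A NULL `names' array, or a NULL or empty entry in it, stands for an unnamed dimension and falls back to the size alone.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write a one-line dataset description into buf.
 * Named dimensions appear as name=size, unnamed ones as size only.
 * Returns 0 on success, -1 if buf is too small.
 */
int format_dims(char *buf, size_t buflen, const char *dataset,
                int rank, const char *const names[], const long sizes[])
{
    size_t used;
    int i, n;

    n = snprintf(buf, buflen, "Dataset %s has rank %d with dimensions [",
                 dataset, rank);
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    used = (size_t) n;

    for (i = 0; i < rank; i++) {
        const char *sep = (i > 0) ? ", " : "";

        if (names != NULL && names[i] != NULL && names[i][0] != '\0')
            n = snprintf(buf + used, buflen - used, "%s%s=%ld",
                         sep, names[i], sizes[i]);
        else
            n = snprintf(buf + used, buflen - used, "%s%ld",
                         sep, sizes[i]);
        if (n < 0 || (size_t) n >= buflen - used)
            return -1;
        used += (size_t) n;
    }

    n = snprintf(buf + used, buflen - used, "]");
    if (n < 0 || (size_t) n >= buflen - used)
        return -1;
    return 0;
}
```

Called with the names and sizes from the example above, this would yield the first line shown; called with `names' set to NULL, it would yield the second.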
Alternatively, when dimension names are available you could include an
optional section defining them and just use dimension names after that, as
in:
Dimensions :
There are 5 dimensions with the following names and current sizes:
lat: 73
lon: 145
frtime: UNLIMITED, currently 9
level: 1
timelen: 20
Available datasets :
Dataset Z has rank 4 with dimensions [frtime, level, lat, lon]
When no dimension names are available the dimension section would not appear and
sizes would be used instead of names.
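For the entries in such a dimension section, a small sketch follows. The helper name `format_dim_entry' and its `is_unlimited' flag are assumptions made for illustration; how the browser would learn which dimension is UNLIMITED is left open here.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format one entry of the dimension section,
 * e.g. "lat: 73" or "frtime: UNLIMITED, currently 9".
 * Returns 0 on success, -1 if buf is too small.
 */
int format_dim_entry(char *buf, size_t buflen, const char *name,
                     long size, int is_unlimited)
{
    int n;

    if (is_unlimited)
        n = snprintf(buf, buflen, "%s: UNLIMITED, currently %ld",
                     name, size);
    else
        n = snprintf(buf, buflen, "%s: %ld", name, size);

    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```

The section would simply be skipped when a file has no named dimensions.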
I think the first of these alternatives would be easier to implement, but
would give somewhat less desirable results. I'm not sure of a good way to
represent whether a dimension is UNLIMITED this way, for example. The
second method conveys all the information more compactly, but might give
unimportant information undeserved prominence at the beginning of the
brows-o-rama if there were lots of rarely used dimensions, e.g. for string
lengths.
--Russ