- Subject: [netCDF #JJL-124364]: Importing data without endianness conversion
- Date: Mon, 05 Aug 2013 13:48:48 -0600
Peter,
I've looked at the sample program you sent, and I understand that you're
expecting netCDF to interpret the
res = nc_def_var_endian(ncid, varid, NC_ENDIAN_BIG);
...
res = nc_put_var(ncid, varid, data);
function calls not only to define the on-disk representation of the
variable's floating-point values as big-endian, but also to interpret the
data passed to nc_put_var() as big-endian encoded, rather than as the native
bytes of the array of char in which data is declared:
/* 123.456 in big-endian. */
const char *data = "\x40\x5e\xdd\x2f\x1a\x9f\xbe\x77";
However, that's not how the nc_put_var() function works. The only documented
uses for nc_put_var() are for writing user-defined data, such as compound or
variable-length types, not for arrays of primitive-type data. Ordinarily, to
write numeric values to a netCDF variable of type NC_DOUBLE, you would call
res = nc_put_var_TYPE(ncid, varid, data);
where TYPE denotes a primitive numeric type represented in native (in-memory)
form, such as uchar, schar, short, int, long, float, double, ushort, uint,
longlong, or ulonglong. In each case except "double", a type conversion takes
place from the in-memory native type to big-endian double on disk (and even
for "double", the library handles the byte swap to big-endian when writing on
a little-endian host). There is no way to indicate that the data to be
written has any type other than a native numeric type.
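For example, here's a minimal sketch of that approach (the file and variable
names are hypothetical, and error handling is abbreviated):

#include <netcdf.h>

int main(void) {
    int ncid, dimid, varid, res;
    double value = 123.456;    /* native in-memory double */

    /* Create a netCDF-4 file with one big-endian double variable. */
    if ((res = nc_create("example.nc", NC_NETCDF4, &ncid))) return res;
    if ((res = nc_def_dim(ncid, "dim1", 1, &dimid))) return res;
    if ((res = nc_def_var(ncid, "data", NC_DOUBLE, 1, &dimid, &varid))) return res;
    if ((res = nc_def_var_endian(ncid, varid, NC_ENDIAN_BIG))) return res;
    if ((res = nc_enddef(ncid))) return res;

    /* The caller always passes native doubles; the library swaps the
       bytes to big-endian on disk as needed. */
    if ((res = nc_put_var_double(ncid, varid, &value))) return res;
    return nc_close(ncid);
}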
HDF5 has a richer type system that includes user-defined primitive types, but
netCDF-4 intentionally doesn't support those, as explained here (where they
are called "user-defined atomic types"):
http://www.unidata.ucar.edu/software/netcdf/docs/faq.html#fv15
So, I'm sorry to say, you'll either have to convert from big-endian to native
byte order in memory before writing the data, or use HDF5 directly instead of
netCDF-4. Converting between big-endian and little-endian is actually fast in C,
and the netCDF library even contains internal functions to do that conversion.
See the swap8b() function in libsrc/ncx.c ...
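For illustration, here's a minimal sketch of that kind of 8-byte swap (this
is not the library's actual swap8b() code, just the same idea applied to the
bytes from your example):

#include <stdint.h>
#include <string.h>

/* Reverse the 8 bytes of one 64-bit value in place. */
static void swap8(void *vp) {
    uint8_t *b = vp, t;
    for (int i = 0; i < 4; i++) {
        t = b[i];
        b[i] = b[7 - i];
        b[7 - i] = t;
    }
}

/* Usage on a little-endian host:
 *     const char *data = "\x40\x5e\xdd\x2f\x1a\x9f\xbe\x77";
 *     double value;
 *     memcpy(&value, data, 8);
 *     swap8(&value);            // value is now 123.456
 *     res = nc_put_var_double(ncid, varid, &value);
 */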
--Russ
> On 07/30/2013 10:07 PM, Unidata netCDF Support wrote:
> > Hi Peter,
> >
> > Sorry to have taken so long to respond to your question ...
> >> Is there any way in which a dataset can be created from binary big-endian
> >> data on a little-endian host without endianness conversion being applied?
> >>
> >> I have data in big-endian form, and I would like to import it into an
> >> H5T_IEEE_F64BE dataset as is. Sadly, the function nc_def_var_endian is
> >> not good enough: although it creates an H5T_IEEE_F64BE dataset, the
> >> interpretation of the raw data is still little-endian, and a conversion
> >> is done (leading to incorrect values).
> >
> > Are you reading the data from an HDF5 file, or from a netCDF-4 file? Is
> > the little endian data marked as little endian in the file you are trying
> > to read? That is, if you run
> >
> > ncdump -s -v VAR INPUTFILE
> >
> > where the "-s" is for showing special virtual attributes such as endianness
> > and the "-v VAR" is for looking at a specific variable named VAR in the
> > input
> > file, do you see the attribute
> >
> > VAR:_Endianness = "little" ;
> >
> > where, again, VAR is the name of the variable (HDF5 dataset) you're
> > looking for?
>
> Actually, the data is read from a binary file (originally produced by a
> big-endian Fortran program). It is stored in a uint8_t *data variable,
> with every 8 bytes encoding one 64-bit floating-point number, but the
> byte order does not match the host endianness.
>
> >> Both nc_put_var and nc_put_vara behave the same in this respect.
> >
> > If this is a bug, we'd like to fix it. But we have tests for endianness, and
> > would need a small program that demonstrates this bug, so we could duplicate
> > it here and fix it. Note that setting the endianness for a netCDF variable
> > only affects its representation on disk when writing values. It does not
> > affect the way data is decoded and represented in native types when reading.
> > That is always determined by how HDF5 has labelled the data type, as
> > little-endian or big-endian.
>
> I don't think this is a bug. It's just that the nc_put_* functions
> expect the raw data array to be in native endianness. I hoped that
> there might be a way I could take the big-endian data and save it in a
> big-endian dataset with no conversion needed (at least not at the time
> of writing).
>
> >> With the HDF5 API this is easily achieved by setting the type to
> >> H5T_IEEE_F64BE, but I would prefer to use the netcdf API.
> >
> > --Russ
> >
>
> Attached is an example program which demonstrates the problem.
>
> $ # On a little-endian host.
> $ gcc -o ncendian -Wall ncendian.c -lnetcdf
> $ ./ncendian
> $ h5dump -d data ncendian.nc
> HDF5 "ncendian.nc" {
> DATASET "data" {
> DATATYPE H5T_IEEE_F64BE
> DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
> DATA {
> (0): 6.31921e+268
> }
> ATTRIBUTE "DIMENSION_LIST" {
> DATATYPE H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
> DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
> DATA {
> (0): (DATASET 255 /dim1 )
> }
> }
> }
> }
>
> The expected value is 123.456. Even though nc_def_var_endian sets the
> datatype to H5T_IEEE_F64BE, the interpretation of the in-memory data is
> still little-endian. When nc_def_var_endian is not used, the output is the
> same (6.31921e+268); only the datatype changes to H5T_IEEE_F64LE.
>
> Regards,
>
> Peter
>
>
Russ Rew UCAR Unidata Program
address@hidden http://www.unidata.ucar.edu
Ticket Details
===================
Ticket ID: JJL-124364
Department: Support netCDF
Priority: High
Status: Closed