This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
> Keywords: 199604222201.AA01657 > From: address@hidden (John Sheldon) > Subject: Re: missing_values > To: address@hidden (Steve Emmerson) > Date: Mon, 22 Apr 1996 18:03:13 -0400 (EDT) Hi John, Thanks for sending back all the requested info on the C90 netcdf problem. > What is the "default" value for 'missing_value'? I see one for the > _FillValue...is it considered the same? There is no default for "missing_value", since it's only an (older) convention for an attribute name, and not treated in any special way by the library. With version 2, the "_FillValue" attribute was introduced to permit user specification of the value you get when you read a value that hasn't been written yet. Now with 2.4.1, the ncdump/ncgen utilities also treat "_FillValue" specially, using "_" as a CDL representation for data with that value. Here's what the new User's Guide has to say about these in the section on Attribute Conventions: _FillValue The _FillValue attribute specifies the fill value used to pre-fill disk space allocated to the variable. Such pre-fill occurs unless nofill mode is set using ncsetfill (or NCSFIL for FORTRAN). See section Set Fill Mode for Writes: ncsetfill and NCSFIL, for details. The fill value is returned when reading values that were never written. If _FillValue is defined then it should be scalar and of the same type as the variable. The _FillValue is typically outside the valid range and therefore treated by generic applications as missing. However it is legal for _FillValue to be within the valid range so that it is treated like any other valid value. It is not necessary to define your own _FillValue attribute for a variable if the default fill value for the type of the variable is adequate. However, use of the default fill value for data type byte is not recommended. Note that if you change the value of this attribute, the changed value applies only to subsequent writes; previously written data are not changed. See section Fill Values, for more information. missing_value This attribute is not treated in any special way by the library or conforming generic applications, but is often useful documentation and may be used by specific applications. The missing_value attribute can be a scalar or vector containing values indicating missing data. These values should all be outside the valid range so that generic applications will treat them as missing. See section Fill Values, for more information. > > If you could make the files (or smaller versions of them) available for FTP, > > that might help. The ideal would be if you could send us a small program > > that created files that demonstrated the problem, so we could see if we > > could duplicate it on any of our machines or on the Cray YMP we have access > > to. > > OK, I put several things out on our anon-FTP in: > > ftp://ftp.gfdl.gov/pub/jps/netcdf/debug/C90/ > > > The following files are subsets (one variable) of the user's data that > started all this. They were generated on our C90 running Unicos 8: > > "test05_TEMP_only.nc" - created using netCDF 2.3.2 > "test06_TEMP_only.nc" - created using netCDF 2.4.1 I've downloaded all the files in this directory, thanks. The symptoms appear to be a minor difference in the two C90 XDR implementations used. I assume the netCDF 2.3.2 version and 2.4.1 version are not linked with the same XDR library, and that with the 2.4.1 version, you are using something more highly optimized. (This could be tested by calling xdrmem_create() and xdr_float() and then looking at the result of the encoding.) > I *was* able to reproduce this problem on our C90 using a test code I > have that writes out a file containing an eclectic variety of > variables. I modified this code so that the first element of the first > data variable (SST) is a missing_value. I've run it using both 2.3.2 > and 2.4.1 and the .nc files are > > "test_ncsetup-232.nc" > "test_ncsetup-241.nc" These binary files differ, so that means the 2.3.2 and 2.4.1 libraries are producing different XDR representations for the same value. By looking at the values, I can see that the different IEEE floating-point values for the missing_value attribute differ only by one unit in the last place (one bit of the mantissa). It's plausible to me that an optimized XDR library might produce a floating point representation that differed in the least significant bit of the mantissa from that produced by a less highly optimized XDR library, so I don't know if you could convince Cray this is a bug in the optimized XDR. I think the real problem here is the assumption that you can take an arbitrary native 64-bit floating-point number (9.9999994e+34), convert it to a 32-bit IEEE representation, convert it back to native 64 bits again, and expect to get exactly the same 64 bits you started with. It's not possible to represent all 64-bit Cray floating-point values in 32-bit IEEE floats. When we chose the default value for the _FillValue attribute for floats, we chose a number that could be represented exactly in a mantissa with very few bits: #define FILL_FLOAT (9.9692099683868690e+36F) /* near 15 * 2^119 */ so it is possible to convert the FILL_FLOAT to IEEE and back without roundoff or truncation. > I ncdump-ed each of these using both the 2.3.2 ncdump and the 2.4.1 > ncdump - the CDL files are: > > "test_ncsetup-232_ncdump-232.cdl" > "test_ncsetup-232_ncdump-241.cdl" > "test_ncsetup-241_ncdump-232.cdl" > "test_ncsetup-241_ncdump-241.cdl" > > > For both the libraries, note that the variable SST is defined as > > float SST(time, lat, lon_odd) ; > SST:long_name = "sea surface temperature" ; > SST:units = "deg_K" ; > SST:valid_range = 0.f, 3457.f ; > SST:_FillValue = 9.96921e+36f ; > SST:missing_value = 9.9999994e+34f ; > > BUT, the first line of the data listing for SST written using 2.3.2 is > > SST = > 9.999999419e+34, 1012, 1013, 1014, 1015, 1016, 1017, > > while that for the 2.4.1 version is > > SST = > 1.000000041e+35, 1012, 1013, 1014, 1015, 1016, 1017, The digits beyond the 7th are actually just noise here, because IEEE floating point can only represent 7 to 8 significant digits of precision. These values again differ in only one bit as IEEE floats. > ------- > > NOW, ON THE SGI (IRIX 5.3), I ran the same code. The .nc files > produced using the 2.3.2 vs. 2.4.1 libraries are identical to one > another (compared using the UNIX "cmp"). I've put the .nc file and the > .cdl file (both completely in a 2.4.1 context) in: > > ftp://ftp.gfdl.gov/pub/jps/netcdf/debug/sgi/ > > One interesting thing is that a diff b/w the SGI and C90 renditions : > > diff ./sgi/test_ncsetup-241.cdl ./C90/test_ncsetup-241_ncdump-241.cdl > > yields quite a few differences: > > 34c34 > < time:_FillValue = 9.969209968386869e+36 ; > --- > > time:_FillValue = 9.96920996838687e+36 ; > 45c45 > < SST:missing_value = 1.e+35f ; > --- > > SST:missing_value = 9.9999994e+34f ; > 89c89 > < SST_timeavg_T1:_FillValue = 9.969209968386869e+36 ; > --- > > SST_timeavg_T1:_FillValue = 9.96920996838687e+36 ; > 93c93 > < SST_timeavg_T2:_FillValue = 9.969209968386869e+36 ; > --- > > SST_timeavg_T2:_FillValue = 9.96920996838687e+36 ; > 105c105 > < SST_timeavg_ITEMDT:_FillValue = 9.969209968386869e+36 ; > --- > > SST_timeavg_ITEMDT:_FillValue = 9.96920996838687e+36 ; > 113c113 > < SST_timeavg_TIMEOFITEM:_FillValue = 9.969209968386869e+36 ; > --- > > SST_timeavg_TIMEOFITEM:_FillValue = 9.96920996838687e+36 ; > 124c124 > < SST_timeavg_2_T1:_FillValue = 9.969209968386869e+36 ; > --- > > SST_timeavg_2_T1:_FillValue = 9.96920996838687e+36 ; > 128c128 > < SST_timeavg_2_T2:_FillValue = 9.969209968386869e+36 ; > --- > > SST_timeavg_2_T2:_FillValue = 9.96920996838687e+36 ; > 140c140 > < SST_timeavg_2_ITEMDT:_FillValue = 9.969209968386869e+36 ; > --- > > SST_timeavg_2_ITEMDT:_FillValue = 9.96920996838687e+36 ; > 148c148 > < SST_timeavg_2_TIMEOFITEM:_FillValue = 9.969209968386869e+36 ; > --- > > SST_timeavg_2_TIMEOFITEM:_FillValue = 9.96920996838687e+36 ; > 154,155c154,155 > < :missing_value = -1.e-29f ; > < :Global_float_ADDATT = 123.456f ; > --- > > :missing_value = -9.9999993e-30f ; > > :Global_float_ADDATT = 123.45599f ; > 347c347 > < 25071, 25089, 25108, 25127, 25146, 25165, 25184 ; > --- > > 25070, 25089, 25108, 25127, 25146, 25165, 25184 ; All these differences are on the order of the last bit in a 32-bit IEEE float. Also ncdump uses sprintf(3) to print floating-point values, and different platforms may display the same floating-point value slightly differently as an ASCII string. I note that there are really only about 4 or 5 floating-point values that differ by one bit in the SGI and C90 241 versions of the netCDF file. > The very last difference may well be a consequence of the arithmetic in > my own packing routine, but the other differences are a little > unnerving to see when one is looking for true "platform independence". > Are these sorts of "differences" expected? Yes, because the Cray has to do arithmetic to convert from its native floating-point representation to IEEE and back, and thus rounding or truncation may lead to single-bit differences. One way around this is to not test for exact floating-point equality for missing values, but instead use a slightly fuzzy test, as the 2.4.1 ncdump does in testing for floating-point fill values: #define absval(x) ( (x) < 0 ? -(x) : (x) ) if(varp->has_fillval && (absval(*vp.fp - *fillp.fp) <= absval(float_eps * *fillp.fp))) { (void) sprintf(sout, FILL_STRING); } else { (void) sprintf(sout, fmt, *vp.fp); } where float_eps is defined in terms of the machine precision, FLT_EPSILON in float.h. > ----------- > > Now, let me know if you still need me to extract a piece of code that > can reproduce this problem. Since that's always about about as much > fun as a root canal, and since I've already spent the better part of an > afternoon assembling the above info, I won't tackle that unless you > really need me to. No, that's OK. But I hope this question comes up again, so I can justify including some of this in the FAQ :-) ... --Russ ______________________________________________________________________________ Russ Rew UCAR Unidata Program address@hidden http://www.unidata.ucar.edu