[netCDFJava #VKJ-807633]: NcML _Netcdf4Dimid, scalar shape compliance, and string separators
- Date: Fri, 22 Nov 2013 09:56:19 -0700
> Hi John,
>
> Thanks for the feedback. Answers, comments, questions interspersed.
>
> > I think "in spirit" probably means that netcdf-java / CDM library
> > does the right thing. Here, schema validation is only the first
> > layer of that; CF compliance being the next layer and has nothing
> > to do with the XML schema.
>
> As you know, files that exploit groups are not CF-compliant, so I
> raise my eyebrows when you include CF-Compliance in "in spirit" :)
Yeah, "CF compliance" is a mixed blessing.
>
> > _Netcdf4Dimid is a "real" attribute in netcdf-4 files, apparently meaning:
>
> > // on dimension scales, holds a scalar H5T_NATIVE_INT which is the
> > (zero-based) dimension ID for this dimension. used to maintain
> > creation order
>
> > It's a kludge for using HDF5; I'm leaving those attributes in, in case
> > the user cares about creation order. netcdf C probably removes
> > them, since it puts the dimensions in creation order.
>
> Implementation of this is still unclear to me since the toolsui
> (which again is what I use as the reference for NcML compliance) only
> outputs _Netcdf4Dimid for at most one dimension of multi-dimensional
> variables, i.e., seems broken since if _Netcdf4Dimid is needed for one
> dimension in a variable, it should be needed by all dimensions.
I think that the _Netcdf4Dimid attribute is actually in the file. ToolsUI
doesn't do anything with it; we just show it, like all the other attributes in
the file (there is an argument for suppressing it, which the C library does).
If I were you, I would ignore whether it exists and what its content is.
>
> >> Should I add _Netcdf4Dimid elements to NCO NcML output?
> >> If so, is there a rule for which dimension to add that element for
> >> in the case of multi-dimensional variables?
>
> > No you should not. I assume the problem is that you are comparing
> > the output of java and C?
>
> I compare output of ncks to output of toolsui. Period.
> Not XML-savvy enough to know better.
> Is there a better way to "compliance-check" NcML?
> Please let me know. toolsui happily reads, without complaint, the NcML
> I sent you even though you found a problem (string_arr) with it.
Well, you can test XML schema compliance with any of various tools (try
searching for "free XML schema validator").
>
> >> In any case, I attach a sample input file and its NcML output
> >> generated by ncks in case you have the time and inclination to check
> >> whether the NcML is truly standards-compliant in a way that only a
> >> human can. Also wondering whether NcML really wants shape="" elements
> >> for scalar variables, which would seem redundant, yet I will go by
> >> your recommendation.
>
> > shape is not technically required, but I think the code needs
> > it. One could say if not specified, assume scalar. For now,
> > safer to leave it in.
>
> What "code" do you refer to? I only check NcML with toolsui, and
> nothing breaks when I omit shape="" for scalars. What use-cases break
> without shape=""? I dislike XML's verbosity (bandwidth-consumption) so
> I eschew unnecessary XML-attributes. But practicality overrides that.
By code I mean the netcdf java library.
NcML has two modes: as a standalone definition, and as a wrapper for another
file, in which the "location" attribute points to the "wrapped file".
In standalone mode, shape="" is required. In wrapper mode, you only have to
include the changes, so nothing is required; the info from the wrapped file is
used.
A little primer is here:
http://www.unidata.ucar.edu/software/thredds/v4.4/netcdf-java/ncml/Tutorial.html
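
As a rough illustration of the two modes described above, the following Python sketch lists variables that lack a shape attribute in a standalone NcML document, and skips the check when the root element carries a "location" attribute. It is not how the netcdf java library does it; the function name, the sample documents, and the wrapped file name in_grp.nc are hypothetical, and the only thing taken as given is the NcML 2.2 namespace.

```python
import xml.etree.ElementTree as ET

# NcML 2.2 namespace, as used in NcML documents.
NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def variables_missing_shape(ncml_text):
    """Return names of <variable> elements that have no shape attribute.

    If the root <netcdf> element has a "location" attribute, the document is
    treated as a wrapper and an empty list is returned, because the wrapped
    file is expected to supply whatever the NcML omits.
    """
    root = ET.fromstring(ncml_text)
    if root.get("location") is not None:
        return []
    tag = "{%s}variable" % NCML_NS
    # root.iter() also descends into <group> elements.
    return [v.get("name") for v in root.iter(tag) if v.get("shape") is None]


# Hypothetical standalone document: one scalar written with shape="",
# one scalar written without any shape attribute.
STANDALONE = """<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <dimension name="time" length="2"/>
  <variable name="time" shape="time" type="double"/>
  <variable name="scalar_ok" shape="" type="int"/>
  <group name="g1">
    <variable name="scalar_bad" type="int"/>
  </group>
</netcdf>"""

# Hypothetical wrapper document: only the changes are listed.
WRAPPER = """<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
        location="in_grp.nc">
  <variable name="scalar_bad" type="int"/>
</netcdf>"""

if __name__ == "__main__":
    print(variables_missing_shape(STANDALONE))
    print(variables_missing_shape(WRAPPER))
```

A check like this only flags the omission; whether a given reader tolerates a missing shape is up to that reader.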
>
> >> Also, I rather randomly picked a separator = "*|*" for strings, in
> >> order to avoid generating NcML with ambiguous whitespace separators
> >> for arrays of strings. If there is a preferred string separator,
> >> please let me know.
>
> > I use "," for readability.
> > but it needs to be something that is not already in one of the
> > strings. To be sure, you should scan the strings first. Otherwise
> > "*|*" is as good as anything.
>
> Yes, so often strings have commas that I picked a different separator
> "*|*" which likely works for all strings in a file.
> ncks has a separate separator for numeric types (" ") by default,
> and easily set to ", ", to increase numeric legibility.
>
> > BTW, in your example, reading in_grp.ncml is barfing because
> > g11/string_var is a scalar in the original file, but because
> > there are embedded blanks, and blank is the default separator,
> > it sees 33 values. So you need the separator.
>
> Thanks for catching that. Now fixed. New in_grp.ncml attached.
> Checking separators on scalar variables seems unnecessary (why doesn't
> toolsui complain about scalar_var?) yet the compliance-checking method
> you employ does so. What method do you use?
> How do you "read" in_grp.ncml so that it barfs on string_var?
If you try to open in_grp.ncml in the Viewer (first tab on left), it should
barf. If not then you have an old version of ToolsUI.
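
To make the separator problem discussed above concrete, here is a small Python sketch. It models the default separator as "split on whitespace", which is a simplification and not the library's parsing code, and it shows why a scalar string with embedded blanks turns into several values. The helper names, the candidate list, and the sample strings are hypothetical. The second helper does the scan suggested earlier in the thread: pick a separator that occurs in none of the strings.

```python
def split_values(text, separator=None):
    """Split value text into items.

    separator=None models the default behaviour as splitting on whitespace;
    otherwise the text is split on the exact separator string.
    """
    if separator is None:
        return text.split()
    return text.split(separator)


def pick_separator(strings, candidates=(",", "*|*", "|", ";")):
    """Return the first candidate that appears in none of the strings."""
    for sep in candidates:
        if not any(sep in s for s in strings):
            return sep
    raise ValueError("every candidate separator occurs in some string")


if __name__ == "__main__":
    scalar = "one scalar string with blanks"
    # Whitespace splitting sees five values instead of one scalar.
    print(len(split_values(scalar)))
    # With an explicit separator that is absent from the text, it stays whole.
    print(split_values(scalar, "*|*"))
    # Commas occur in the data, so the scan moves on to the next candidate.
    print(pick_separator(["Boulder, CO", "Irvine, CA"]))
```

Scanning first costs one pass over the strings and removes the guesswork about whether "*|*" or "," is safe for a particular file.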
>
> > Thanks for your test file, I'm checking to see what issues it comes
> > up with (just trying to open the NcML in ToolsUI/viewer).
>
> > For example, CDM doesn't actually support unsigned longs. We just
> > pretend they are signed. I'll think about a workaround for NcML
> > reading. I'll let you know if I see anything else.
>
> Yes. To indicate unsigned variable I used the toolsui "convention"
> of adding a _FillValue=-1 or -2 depending on type. Of course this
> fails when the variable has its own pre-defined _FillValue, in which
> case ncks uses that (and thus loses any indication that original
> variable was unsigned). Seems like current NcML implementation of
> unsigned types will lose information whenever original variable
> defines _FillValue.
You don't need to use _FillValue, just put _Unsigned="true" (which you did).
I've added some new code to version 4.4 to deal with unsigned longs.
I'll have a look at your file when I get a chance...
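
For readers unfamiliar with what "pretending unsigned longs are signed" means in practice, the Python sketch below reinterprets the same 64 bits either way. It only illustrates the arithmetic and is not the version 4.4 code; the function names are hypothetical. A reader that sees _Unsigned="true" on a 64-bit variable would need a conversion along these lines to recover the original values.

```python
import struct


def as_unsigned64(signed_value):
    """Reinterpret a signed 64-bit integer's bits as an unsigned value."""
    return signed_value & 0xFFFFFFFFFFFFFFFF


def as_signed64(unsigned_value):
    """Reinterpret an unsigned 64-bit integer's bits as a signed value."""
    # Pack as unsigned 64-bit, then unpack the same bytes as signed 64-bit.
    return struct.unpack("<q", struct.pack("<Q", unsigned_value))[0]


if __name__ == "__main__":
    largest = 2**64 - 1
    stored = as_signed64(largest)      # what a signed-only model would hold
    print(stored)
    print(as_unsigned64(stored) == largest)
```

The round trip is lossless, which is why a flag such as _Unsigned="true" is enough and a _FillValue convention is not needed to carry the information.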
John
Ticket Details
===================
Ticket ID: VKJ-807633
Department: Support netCDF Java
Priority: Normal
Status: Open