
Re: [netcdf-java] Announcing Netcdf-Java / CDM version 4.3.8 BETA



On 2/15/2012 12:41 PM, Don Murray wrote:
John-

Thanks for taking the time to generate these lists. Comments are below:

On 2/15/12 11:14 AM, John Caron wrote:

Bundles referencing motherlode could only do so using the "latest"
resolver, and I'm not sure how extensive that is, but we could look at
the motherlode logs. It seems like local GRIB files are where the pain
will be.
Jim Steenburg and I (and perhaps others) use the Best Time Series for 
several datasets, and I think as the IDV evolves its time-matching 
capabilities, the Best Time Series will be used more.  For the 
Best Time Series, we use index offsets to get the latest data.  Dave 
Dempsey also uses the forecast 0 data from the Constant Offset FMRC.
The FMRC will be replaced by the feature collections. FMRC is 
particularly broken for GRIB files; it is essentially unmaintainable. 
FMRC should be OK for netCDF files, as long as they are homogeneous. The 
GRIB feature collection should do the right thing for arbitrary 
collections of GRIB files. The only constraint (I think) is that they 
have to come from the same center and subcenter.
GRIB feature collections have a "collection dataset" that is essentially 
the same as the Best Time Series. There are probably cases where it is 
correct where FMRC is not, but I'm not positive.
The other variations (Constant Offset, Constant Forecast, and Runs) will 
not be supported, at least for now. So I'll have to understand how Dave 
is using "forecast 0 data from the Constant Offset FMRC" to see what can 
be done there. The individual files on motherlode will be equivalent to 
the Runs, since we put all of one run in a single file.
For the large datasets on motherlode, there will also be daily 
collections. This two-level collection (aka "Time partitions") should 
scale to very large datasets. We don't really need it on motherlode, but 
it makes the indexing better, plus we want to eat that dogfood.
BTW, index offsets are pretty unstable, as these datasets are changing 
underneath you. I'm hoping to prototype a "coordinate space only" Grid 
feature type using cdmremote, if I ever get out of the GRIB swamp. It 
will solve this problem:
http://www.unidata.ucar.edu/blogs/developer/en/entry/indexed_data_access_and_coordinate
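The instability of index offsets is easy to demonstrate. Here is a minimal Java sketch (made-up times, not CDM API) of why looking data up by coordinate value survives a growing dataset while a saved index offset does not:

```java
// Hypothetical sketch: a saved index offset breaks when the dataset changes
// underneath you, but a lookup by coordinate value still finds the same time.
public class CoordLookup {
    // Find the index of the time coordinate closest to the requested value.
    static int indexOfClosest(double[] timeCoords, double wanted) {
        int best = 0;
        for (int i = 1; i < timeCoords.length; i++) {
            if (Math.abs(timeCoords[i] - wanted) < Math.abs(timeCoords[best] - wanted))
                best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        // Hours since some reference time; the collection grows as new runs arrive.
        double[] before = {0, 6, 12, 18};
        double[] after  = {6, 12, 18, 24, 30}; // oldest run rolled off, new ones added

        // A saved index offset ("index 2") now points at different data:
        System.out.println(before[2] + " vs " + after[2]); // 12.0 vs 18.0

        // Asking by coordinate value still finds hour 12:
        System.out.println(indexOfClosest(after, 12.0));   // 1
    }
}
```

This is essentially the "coordinate space only" idea: clients name the coordinate value they want and let the server resolve indices.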

We can try to help the user choose new variable names, but the original
bundle has to be able to change, or else it's going to be useless
eventually. Might as well do the right thing now, and add some UI to
help evolve the bundle.
There will need to be enough information stored in the variables so 
that if the names change, a request can still find the appropriate 
data.  For example, if I have a bundle that is accessing temperature 
on a pressure level, what can I use to always get that variable in the 
dataset if the variable name changes?
Really, this is just a problem for GRIB files, which sucks so bad the 
very fabric of spacetime is warped ;^(  I'm certainly willing to add any 
metadata that's appropriate. I doubt it's possible to guarantee that one 
can "always get that variable in the dataset if the variable name 
changes", but we can probably get pretty close. OTOH, the machinations to 
do so might not be worth it. Actually, the "NCL naming scheme" below is 
probably the best bet in this regard.
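One way to "get pretty close", sketched here with invented attribute names (these are stand-ins, not the actual CDM attributes): store the GRIB identity codes on each variable, and have the client match on those rather than on the display name. Then a rename of the variable does not break the lookup.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: find a variable by its GRIB identity
// (discipline/category/parameter + level type) instead of by name.
// The attribute keys below are made up for illustration.
public class FindByGribId {
    record Var(String name, Map<String, Integer> atts) {}

    static Var find(List<Var> vars, int disc, int cat, int param, int levType) {
        for (Var v : vars) {
            if (Integer.valueOf(disc).equals(v.atts().get("grib_discipline"))
                && Integer.valueOf(cat).equals(v.atts().get("grib_category"))
                && Integer.valueOf(param).equals(v.atts().get("grib_parameter"))
                && Integer.valueOf(levType).equals(v.atts().get("grib_level_type")))
                return v;
        }
        return null;
    }

    public static void main(String[] args) {
        // Temperature on an isobaric surface: GRIB2 discipline 0, category 0,
        // parameter 0, level type 100 (isobaric surface).
        List<Var> vars = List.of(
            new Var("Temperature_isobaric",
                Map.of("grib_discipline", 0, "grib_category", 0,
                       "grib_parameter", 0, "grib_level_type", 100)));
        // The lookup keys on the GRIB codes, so the name can change freely.
        System.out.println(find(vars, 0, 0, 0, 100).name());
    }
}
```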
Also, how will you handle the FMRC stuff where older files were
indexed with the old names? Does 4.3 still read the old index files?
I'm not sure what you mean by the index files here? Do you mean gbx8? Or
the cached xml files? In both cases, those are no longer used, and new
index files (gbx9 and ncx) are created.
I think this is a moot point, so ignore it for now.  It sounds like you 
are going to reindex everything for 4.3.
One thing that would help is to generate a list of 4.2 variable names
with the corresponding 4.3 names for all the GRIB datasets on
motherlode. That could be used by the IDV for the lookup table.
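The lookup table Don suggests could be as simple as a map from 4.2 names to 4.3 names, applied when an old bundle is loaded, with unknown names passed through unchanged. A rough sketch (the names below are illustrative only, not entries from the actual lists):

```java
import java.util.Map;

// Hypothetical sketch of a 4.2 -> 4.3 variable-name lookup table for bundles.
public class NameMap {
    static final Map<String, String> OLD_TO_NEW = Map.of(
        "Temperature_isobaric", "VAR_0-0-0_L100");

    // Resolve an old bundle name; fall back to the original if unmapped.
    static String resolve(String oldName) {
        return OLD_TO_NEW.getOrDefault(oldName, oldName);
    }

    public static void main(String[] args) {
        System.out.println(resolve("Temperature_isobaric")); // mapped to the new name
        System.out.println(resolve("SomeOtherVar"));         // passed through unchanged
    }
}
```

As John notes below, a static table like this is fragile when the mapping is dataset dependent, so in practice the table would need a per-dataset dimension.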
Unfortunately, the problem with using the human names is that they keep
getting tweaked (because the tables keep getting tweaked) by WMO and
especially NCEP. So they will just break again whenever that happens. I'm
leaning towards an NCL-like variable name that will be much more stable
(though not guaranteed, if we discover we are doing things wrong). The
implication is that an application will want to use the description when
letting users choose from a list, and the variable name when talking to
the API. I think the IDV is already doing this?

The NCL-like syntax (still evolving) is:

VAR_%d-%d-%d[_error][_L%d][_layer][_I%s_S%d][_D%d][_Prob_%s]
L = level type
S = stat type
D = derived type
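Assembling a name under this template is mechanical. A rough sketch of a builder (my reading of the template, not a spec; optional fields are simply omitted when absent):

```java
// Hypothetical sketch: building a variable name following the template
// VAR_%d-%d-%d[_L%d][_layer][_I%s_S%d] from GRIB2 identity codes.
public class Grib2VarName {
    static String build(int discipline, int category, int parameter,
                        Integer levelType, boolean isLayer,
                        String intervalName, Integer statType) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.format("VAR_%d-%d-%d", discipline, category, parameter));
        if (levelType != null) sb.append("_L").append(levelType);
        if (isLayer) sb.append("_layer");
        if (intervalName != null && statType != null)
            sb.append("_I").append(intervalName).append("_S").append(statType);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Temperature (discipline 0, category 0, parameter 0) on an
        // isobaric surface (GRIB2 level type 100):
        System.out.println(build(0, 0, 0, 100, false, null, null));
        // Total precipitation (0-1-8) at the surface (level type 1),
        // accumulated over an interval (GRIB2 statistic type 1 = accumulation;
        // the "12_Hour" interval string here is an assumed encoding):
        System.out.println(build(0, 1, 8, 1, false, "12_Hour", 1));
    }
}
```

This prints `VAR_0-0-0_L100` and `VAR_0-1-8_L1_I12_Hour_S1`, which illustrates the point above: the name is stable because it comes from the GRIB codes themselves, not from the human-readable tables.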

I did a quick scan and am not keen on the NCL-like names.  I would 
like to see the VAR part replaced with a string that describes the 
variable in some way, like NCL does.  I've been working with a lot of 
NCEP folks who swear by wgrib2, which uses the names of the variables 
in the last column of Table 4.2 for each parameter.  These are the 
names that NCL uses as well.
The VARs that NCL uses come from hand-maintained tables (maintained by 
the NCL group). These "short names" are mostly only available from NCEP. 
They are not in the WMO tables. They are also subject to being tweaked, 
and are not always unique. So they seem only marginally better than the 
actual table descriptions.
I think using the description for human consumption is the right way to 
go. Then let the variable names be as stable as possible, but not 
particularly human readable.
I assume that the %d-%d-%d is the discipline, category and parameter 
info?  For GRIB1, the discipline and category will not exist. 
Yes. GRIB1 only has a parameter number. We will also need the table version, and possibly the center/subcenter. So we will have a different syntax for GRIB1, which I haven't done yet.
Will you still provide these as descriptive names in the attributes? The IDV uses them to categorize the variables in the Field Chooser. Could you send along a comparison of a couple of variables with the attributes listed so we can see how that has changed?
Yes, I can add those.

> So it's different from NCL in not using the time coordinate in the name.
You are using the time interval in the name for accumulations, so I'm 
not sure what you mean here.
We aren't using any of the time interval coordinates, just the interval 
lengths, in the variable name. Often, one has both a "time instant" and 
a "time interval" variable in the same file. The interval length is in 
the data, so it is not subject to getting changed in a table.
I've prototyped making separate variables for each time interval length. 
I think that's probably wrong, but handling mixed intervals is tricky. 
Probably the IDV needs to carefully consider this issue.
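The "separate variable per interval length" prototype can be sketched as a simple grouping step: records with mixed interval lengths are partitioned so each distinct length becomes its own candidate variable. A minimal Java sketch with made-up records:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of splitting mixed-interval GRIB records into one
// candidate variable per distinct interval length. All data is made up.
public class IntervalGrouping {
    record GribRecord(String param, int intervalHours) {}

    static Map<Integer, List<GribRecord>> groupByInterval(List<GribRecord> recs) {
        Map<Integer, List<GribRecord>> groups = new TreeMap<>();
        for (GribRecord r : recs)
            groups.computeIfAbsent(r.intervalHours(), k -> new ArrayList<>()).add(r);
        return groups;
    }

    public static void main(String[] args) {
        List<GribRecord> recs = List.of(
            new GribRecord("Precip", 6),
            new GribRecord("Precip", 12),
            new GribRecord("Precip", 6));
        // Mixed 6-hour and 12-hour accumulations become two candidate variables.
        System.out.println(groupByInterval(recs).keySet()); // [6, 12]
    }
}
```

The alternative is one variable holding all intervals, which pushes the interval bookkeeping onto the client; that is the trade-off the IDV would need to consider.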
I'm attaching two lists; both are maps from old names to new names. The
first list uses a "human readable" name constructed from the latest GRIB
tables; the second uses the NCL-like syntax. (Neither is complete or
authoritative yet, and both are only for GRIB-2.) I've included the grid
description on the second list, so the mappings make more sense.

One thing to note is that the mapping is dataset dependent 15-20% of the
time.
Could you give examples of where these differ?  I'd like to understand 
if this is just in accumulation intervals or something more.
It's mostly where the 4.2 name leaves off the level or other info, to 
make a nice name. This is now Considered Harmful.
In the tables I sent, you can see when there are multiple NEW names. I 
can send the full report if you want to see which datasets those live in.
I will probably release the next CDM version using the NCL-syntax
variable names in order to get feedback from the broader community.
Perhaps you could send this proposal with the examples out to the 
netCDF-Java list before sending out the code.  People are already 
changing their code to test out the beta release, and if it changes 
with the next beta, they'll have to do it again.
Sounds reasonable. I'd better get a warning out about how these names are 
in flux.
Thanks again.

Don
Thanks for your feedback.

John