This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Hi John- On 2/15/12 5:25 PM, John Caron wrote:
Jim Steenburg and I (and perhaps others) use the Best Time Series for several datasets, and I think as the IDV evolves its time-matching capabilities, the Best Time Series will become used more. For the Best Time Series, we use index offsets to get the latest data. Dave Dempsey also uses the forecast 0 data from the Constant Offset FMRC.

The FMRC will be replaced by the feature collections. FMRC is particularly broken for GRIB files; it is essentially unmaintainable. FMRC should be OK for netCDF files, as long as they are homogeneous. The GRIB feature collection should do the right thing for arbitrary collections of GRIB files. The only constraint (I think) is that they have to come from the same center and subcenter. GRIB feature collections have a "collection dataset" that is essentially the same as the Best Time Series. There are probably cases where it is correct where FMRC is not, but I'm not positive.
This might be okay in combination with the IDV time driver work. I have a couple of use cases that I use the Best Time Series for:
- match the satellite image times in a display, giving either the analysis or the forecast in the series.
- get the last week of analyses and the next week of forecasts of precipitable water to watch the evolution of atmospheric rivers.
- show the last 12 hours of analyses from the RUC data.
The other variations (Constant Offset, Constant Forecast, and Runs) will not be supported, at least for now. So I'll have to understand how Dave is using "forecast 0 data from the Constant Offset FMRC" to see what can be done there. The individual files on motherlode will be equivalent to the Runs, since we put all of one run in a single file.
I think the use case is to get the analyses for some time period (e.g. the last week). He could do this as I do from the best time series, but then you have to know the right indexes to use so you don't get the forecasts.
For the large datasets on motherlode, there will also be daily collections. This two-level collection (aka "time partitions") should scale to very large datasets. We don't really need it on motherlode, but it makes the indexing better, plus we want to eat that dogfood. BTW, index offsets are pretty unstable, as these datasets are changing underneath you.
I understand the limitations, but for the use cases Jim and I have it works pretty well.
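As a hedged illustration of the alternative to hard-coded index offsets discussed above, here is a minimal Java sketch that keeps only the analysis (forecast hour 0) steps of a best time series by comparing each step's run time with its valid time. The BestRecord class and its fields are hypothetical, not part of netCDF-Java or the IDV; real code would pull the values from the dataset's runtime and time coordinates.

    import java.util.ArrayList;
    import java.util.Date;
    import java.util.List;

    // Hypothetical record for one step in a "best" time series.
    class BestRecord {
        Date runTime;    // model run (reference) time
        Date validTime;  // forecast valid time
        BestRecord(Date runTime, Date validTime) {
            this.runTime = runTime;
            this.validTime = validTime;
        }
    }

    public class SelectAnalyses {
        // Keep only the analysis steps (valid time equals run time), so the result
        // does not depend on index offsets that shift as new runs arrive.
        static List<BestRecord> analysesOnly(List<BestRecord> best) {
            List<BestRecord> result = new ArrayList<>();
            for (BestRecord r : best) {
                if (r.validTime.equals(r.runTime)) {
                    result.add(r);
                }
            }
            return result;
        }
    }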
I'm hoping to prototype a "coordinate space only" Grid feature type using cdmremote, if I ever get out of the GRIB swamp. It will solve this problem: http://www.unidata.ucar.edu/blogs/developer/en/entry/indexed_data_access_and_coordinate

We can try to help the user choose new variable names, but the original bundle has to be able to change, or else it's going to be useless eventually. Might as well do the right thing now, and add some UI to help evolve the bundle.

There will need to be enough information stored in the variables so that if the names change, a request can still find the appropriate data. For example, if I have a bundle that is accessing temperature on a pressure level, what can I use to always get that variable in the dataset if the variable name changes?

Really, this is just a problem for GRIB files, which sucks so bad the very fabric of spacetime is warped ;^( I'm certainly willing to add any metadata that's appropriate. I doubt it's possible to guarantee that one can "always get that variable in the dataset if the variable name changes", but we can probably get pretty close. OTOH, the machinations to do so might not be worth it. Actually, the "NCL naming scheme" below is probably the best bet in this regard.
See the note I sent out this morning on this.
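As a sketch of what "enough information in the variables" could look like from the client side, the example below matches a grid by its GRIB identifiers rather than by its name. It assumes the netCDF-Java 4.x GridDataset API; the attribute names "Grib2_Parameter" and "Grib2_Level_Type" are guesses at what the GRIB IOSP writes and would need to be checked against an actual 4.3 file.

    import java.io.IOException;
    import ucar.nc2.Attribute;
    import ucar.nc2.dt.GridDatatype;
    import ucar.nc2.dt.grid.GridDataset;

    public class FindByGribId {
        // Look up a grid by its GRIB identifiers instead of its (unstable) variable name.
        static GridDatatype findGrid(GridDataset gds, int discipline, int category,
                                     int number, int levelType) {
            for (GridDatatype grid : gds.getGrids()) {
                Attribute param = grid.findAttributeIgnoreCase("Grib2_Parameter");
                Attribute level = grid.findAttributeIgnoreCase("Grib2_Level_Type");
                if (param == null || level == null) continue;
                if (param.getNumericValue(0).intValue() == discipline
                        && param.getNumericValue(1).intValue() == category
                        && param.getNumericValue(2).intValue() == number
                        && level.getNumericValue().intValue() == levelType) {
                    return grid;
                }
            }
            return null; // not found
        }

        public static void main(String[] args) throws IOException {
            GridDataset gds = GridDataset.open(args[0]);
            try {
                // e.g. temperature (discipline 0, category 0, parameter 0) on isobaric levels (type 100)
                GridDatatype t = findGrid(gds, 0, 0, 0, 100);
                System.out.println(t == null ? "not found" : t.getFullName());
            } finally {
                gds.close();
            }
        }
    }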
Also, how will you handle the FMRC stuff where older files were indexed with the old names? Does 4.3 still read the old index files?

I'm not sure what you mean by the index files here? Do you mean gbx8? Or the cached XML files? In both cases, those are no longer used, and new index files (gbx9 and ncx) are created.

I think this is a moot point, so ignore for now. It sounds like you are going to reindex everything for 4.3. One thing that would help is to generate a list of 4.2 variable names with the corresponding 4.3 names for all the GRIB datasets on motherlode. That could be used by the IDV for the lookup table.

Unfortunately, the problem with using the human names is that they keep getting tweaked (because the tables keep getting tweaked) by WMO and especially NCEP. So they will just break again whenever that happens. I'm leaning towards an NCL-like variable name that will be much more stable (though not guaranteed if we discover we are doing things wrong). The implication is that an application will want to use the description when letting users choose from a list, and the variable name when talking to the API. I think the IDV is already doing this? The NCL-like syntax (still evolving) is:

VAR_%d-%d-%d[_error][_L%d][_layer][_I%s_S%d][_D%d][_Prob_%s]

where L = level type, S = stat type, and D = derived type.

I did a quick scan and am not keen on the NCL-like names. I would like to see the VAR part replaced with a string that describes the variable in some way, like NCL does. I've been working with a lot of NCEP folks who swear by wgrib2, which uses the names of the variables in the last column of Table 4.2 for each parameter. These are the names that NCL uses as well.

The VARs that NCL uses are hand-maintained tables (by the NCL group). These "short names" are mostly only available from NCEP. They are not in the WMO tables. They are also subject to being tweaked, and are not always unique. So they seem only marginally better than the actual table descriptions.
However, they are well recognized by many users of GRIB2 data at NCEP and NOAA, and by users of wgrib/wgrib2. GRIB1 tables have a short name in the table, and I understand that GRIB2 tables do now. NCAR must be maintaining a lookup list - can't you use that?
I think using the description for human consumption is the right way to go. Then let the variable names be as stable as possible, but not particularly human readable.
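For what it's worth, here is a rough sketch of parsing the proposed NCL-like syntax quoted above. The regular expression is only one reading of VAR_%d-%d-%d[_error][_L%d][_layer][_I%s_S%d][_D%d][_Prob_%s], and its group layout is an assumption that would need to track the syntax as it evolves.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class GribVarName {
        private static final Pattern NAME = Pattern.compile(
                "VAR_(\\d+)-(\\d+)-(\\d+)"   // discipline-category-parameter
                + "(_error)?"                // error field
                + "(?:_L(\\d+))?"            // level type (L)
                + "(_layer)?"                // layer flag
                + "(?:_I([^_]+)_S(\\d+))?"   // interval + statistic type (S)
                + "(?:_D(\\d+))?"            // derived type (D)
                + "(?:_Prob_(.+))?");        // probability

        public static void main(String[] args) {
            // Hypothetical name for temperature on isobaric levels.
            Matcher m = NAME.matcher("VAR_0-0-0_L100");
            if (m.matches()) {
                System.out.printf("discipline=%s category=%s parameter=%s levelType=%s%n",
                        m.group(1), m.group(2), m.group(3), m.group(5));
            }
        }
    }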
FYI, the ECMWF grib software (http://www.ecmwf.int/publications/manuals/d/gribapi/keys/, used in pygrib) allows users to specify aliases for parameters. For example, I can specify temperature as t or Temperature. Perhaps the netCDF-Java library could have a facility for the user to specify an alias and other information and get back the appropriate variable (e.g. param="t", level="pressure").
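No such alias facility exists in netCDF-Java as far as I know; this is just a minimal sketch of what one might look like, with all class and method names made up for illustration.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical alias resolver, loosely modeled on the grib_api "keys" idea: the user
    // asks for (param="t", level="pressure") and gets back the dataset's actual variable
    // name. Nothing here is an existing netCDF-Java facility.
    public class GribAliasTable {
        private final Map<String, String> aliasToName = new HashMap<>();

        // Register an alias, e.g. addAlias("t", "pressure", "Temperature_isobaric").
        public void addAlias(String param, String level, String variableName) {
            aliasToName.put(param + "@" + level, variableName);
        }

        // Resolve (param, level) to the dataset variable name, or null if unknown.
        public String resolve(String param, String level) {
            return aliasToName.get(param + "@" + level);
        }

        public static void main(String[] args) {
            GribAliasTable table = new GribAliasTable();
            table.addAlias("t", "pressure", "Temperature_isobaric"); // made-up mapping
            System.out.println(table.resolve("t", "pressure"));
        }
    }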
Will you still provide these as descriptive names in the attributes? The IDV uses them to categorize the variables in the Field Chooser. Could you send along a comparison of a couple of variables with the attributes listed so we can see how that has changed?

Yes, I can add those.
Thanks.
> So it's different from NCL in not using the time coordinate in the name.

You are using the time interval in the name for accumulations, so I'm not sure what you mean here.

We aren't using any of the time interval coordinates, just the interval lengths, in the variable name. Often, one has both a "time instant" and a "time interval" variable in the same file. The interval length is in the data, so it is not subject to getting changed in a table. I've prototyped making separate variables for each time interval length. I think that's probably wrong, but handling mixed intervals is tricky. Probably the IDV needs to carefully consider this issue.
I would agree that's wrong. I think having the mixed_intervals identifier on the name and using the cell_bounds attribute is the way to go. I can provide you with some sample data from the GRIB2 datasets that I've been working with that illustrate this if you want.
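To illustrate the point that the interval length lives in the data (the time bounds) rather than in a table or in the variable name, here is a small sketch. The flattened (ntimes, 2) bounds layout and the sample values are assumptions for the example.

    public class IntervalLengths {
        // Given a time-bounds array of shape (ntimes, 2) flattened row-major, return the
        // interval length for each time step. The lengths come from the coordinate bounds
        // data, not from a table or from the variable name.
        static double[] lengths(double[] bounds, int ntimes) {
            double[] out = new double[ntimes];
            for (int i = 0; i < ntimes; i++) {
                out[i] = bounds[2 * i + 1] - bounds[2 * i]; // end minus start
            }
            return out;
        }

        public static void main(String[] args) {
            // Mixed intervals: a 6-hour accumulation followed by a 3-hour one.
            double[] bounds = {0, 6, 6, 9};
            for (double len : lengths(bounds, 2)) {
                System.out.println(len);
            }
        }
    }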
One thing to note is that the mapping is dataset dependent 15-20% of the time.

Could you give examples of where these differ? I'd like to understand if this is just in accumulation intervals or something more.

It's mostly where the 4.2 name leaves off the level or other info, to make a nice name. This is now Considered Harmful.
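Since the mapping is dataset dependent part of the time, a minimal sketch of what an IDV-side 4.2-to-4.3 name lookup might look like is below, keyed by dataset plus old name with a generic fallback; every class name and variable name here is made up for illustration.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical 4.2 -> 4.3 name lookup. The key combines a dataset id with the old
    // variable name, falling back to a dataset-independent entry.
    public class NameLookup {
        private final Map<String, String> table = new HashMap<>();

        public void add(String datasetId, String oldName, String newName) {
            table.put(datasetId + "|" + oldName, newName);
        }

        public String lookup(String datasetId, String oldName) {
            String hit = table.get(datasetId + "|" + oldName);    // dataset-specific entry
            return hit != null ? hit : table.get("*|" + oldName); // generic fallback
        }

        public static void main(String[] args) {
            NameLookup lookup = new NameLookup();
            lookup.add("*", "Temperature", "Temperature_isobaric");        // generic mapping
            lookup.add("NAM_CONUS_12km", "Temperature", "VAR_0-0-0_L100"); // dataset-specific override
            System.out.println(lookup.lookup("NAM_CONUS_12km", "Temperature"));
        }
    }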
I agree, per the note I sent out this morning.

Don

--
Don Murray
NOAA/ESRL/PSD and CIRES
303-497-3596
http://www.esrl.noaa.gov/psd/people/don.murray/