This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Hi John- After thinking about this for a couple of days, I believe keeping the human-readable names (first table, with slight mods) is much preferable and backward compatible. I understand your reasons for wanting to change, but while that makes the programmer's life easier, it makes the user's (and other programmers') lives harder.
For example, from a user perspective, with your changes, I'm going to have to modify 50 or more bundles that are on my local machines (including the NOAA viz wall) or stored on RAMADDA servers. I'm also going to have to modify the customizations to my IDV parameter tables that I've made over the past 7 years.
From a programmer's perspective, here are the impacts of your changes to the IDV:
- bundles which use the variable name for lookup
- data aliases used for derived quantities
- parameter aliases used for automatically assigning color tables, contour intervals and units
- User guide and workshop documentation and examples will need to be updated
For the past 7 or so years, IDV users have been able to access realtime GRIB datasets and have had stability in using and interchanging those datasets. For example, I have a bundle:
http://motherlode.ucar.edu/repository/entry/get/GFS%2080%20km.xidv?entryid=9f77ca66-2264-4f8b-a460-e02fb42606ea

which has displays of 500 hPa geopotential heights, sea level pressure and precipitation from the GFS 80 km data. These are simple, commonly used parameters. The IDV has a DataAlias table that equates the variable name Geopotential_height with a canonical name of HGT, which is used to present derived quantities (thickness and geostrophic wind) to the user. It also uses this name to assign a color table, unit and contour levels for any display created for the variable Geopotential_height. The same idea goes for Pressure_reduced_to_MSL and Total_precipitation. It doesn't matter whether I go to the GFS 80 km (grib1), the GFS 1 degree global (grib2), or even a NAM 80 km dataset. I can apply the bundle and use the same information to get the same type of display.
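The DataAlias mechanism described above could be sketched roughly as follows (a Python sketch for illustration only; the table contents and function names are invented, not the IDV's actual tables or API):

```python
# Sketch of the IDV DataAlias idea: dataset-specific variable names map
# to a canonical name, and display defaults are keyed by the canonical
# name, so the same bundle works across GFS grib1, GFS grib2, NAM, etc.
# All table entries here are illustrative examples.

DATA_ALIASES = {
    "HGT": ["Geopotential_height", "Geopotential_height_Pressure"],
    "PMSL": ["Pressure_reduced_to_MSL"],
    "PRECIP": ["Total_precipitation"],
}

def canonical_name(var_name):
    """Return the canonical alias for a dataset variable name, or None.

    Matching is case-insensitive, mirroring the alias-table behavior
    described in the thread.
    """
    for canon, aliases in DATA_ALIASES.items():
        if var_name.lower() in (a.lower() for a in aliases):
            return canon
    return None

# Display defaults (unit, contour interval, ...) keyed by canonical name,
# so every dataset whose variable maps to HGT gets the same display.
PARAM_DEFAULTS = {
    "HGT": {"unit": "gpm", "contour_interval": 60},
}

def display_defaults(var_name):
    """Look up display defaults via the canonical name."""
    return PARAM_DEFAULTS.get(canonical_name(var_name), {})
```

Because the lookup goes through the canonical name, adding a new dataset-specific spelling only requires extending the alias list, not touching the defaults table.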
Under the scheme in table 1, Geopotential_height will change to Geopotential_height_Pressure, Pressure_reduced_to_MSL will change to Pressure_reduced_to_MSL_Msl and Total_precipitation will change to one of:
Total_precipitation_Surface_12_Hour_Accumulation
Total_precipitation_Surface_1_Hour_Accumulation
Total_precipitation_Surface_3_Hour_Accumulation
Total_precipitation_Surface_6_Hour_Accumulation
Total_precipitation_Surface_Mixed_intervals_Accumulation

From the IDV perspective, the DataAlias and ParameterDefaults use patterns and case insensitivity, so this should not be a problem because the old names would match into the new names. For the bundles, this will be a problem, but one that can be dealt with on the IDV or netCDF-Java side with a parameter lookup, as discussed at the recent IDV Developers teleconference and outlined from the IDV perspective here:
https://mcidasv.ssec.wisc.edu/issues/11

With the new naming:

VAR_%d-%d-%d[_error][_L%d][_layer][_I%s_S%d][_D%d][_Prob_%s]

the three variables would have different names depending on whether they came from a grib1 or grib2 dataset. This would require Yuan and Julien to redo all the alias and parameter default tables and require a more complicated lookup just to find the 500 hPa geopotential height, sea level pressure and total_precipitation field depending on the dataset used. I think providing consistency between grib1 and grib2 datasets at the very least is an important consideration - in the end, it's all GRIB. GEMPAK and McIDAS (as well as wgrib2 and NCL) create the same names for their variables independent of whether they came from Grib1 or 2.
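To make the shape of the proposed names concrete, here is a minimal sketch of how a client might pick apart names built from the template above. It handles only the discipline-category-parameter triple and the optional _L level code; the example values are the standard GRIB-2 codes for geopotential height (0-3-5) on an isobaric surface (level type 100), but the parser itself is an illustrative assumption, not part of netCDF-Java:

```python
import re

# Partial parser for the proposed NCL-like GRIB-2 name template:
#   VAR_%d-%d-%d[_error][_L%d][_layer][_I%s_S%d][_D%d][_Prob_%s]
# where the three integers are discipline-category-parameter and
# L encodes the level type. Only those fields are handled here.
NAME_RE = re.compile(
    r"^VAR_(?P<disc>\d+)-(?P<cat>\d+)-(?P<param>\d+)"
    r"(?:_L(?P<level>\d+))?"
)

def parse(name):
    """Return the numeric fields of an NCL-like name, or None if it
    doesn't match the template."""
    m = NAME_RE.match(name)
    if not m:
        return None
    return {k: int(v) for k, v in m.groupdict().items() if v is not None}
```

The point of the template is exactly what the thread argues about: the numeric codes are stable across table tweaks, but a human (or an alias table keyed on readable names) can no longer recognize the variable without a decoder like this.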
I fully support the notion of adding in the level information to the variable name, as is the case for Geopotential_height. I know for variables like Temperature the current 4.2 scheme can provide different results depending on whether your grib files had a mixture of 2D and 3D variables (Temperature = the one on pressure levels) or just 2D variables (Temperature = the one on height above ground level). I know we had this same discussion 5 or so years ago and you kept things consistent, but I understand the problems it creates on both the netCDF-Java/TDS side and sometimes the IDV side (e.g. creating derived quantities), and think that this change can be handled pretty well on the IDV side.
However, I am concerned that some of the name changes are cosmetic and not necessary to achieve your desired goal of adding more uniqueness. For example, in the new scheme, you've up-cased the first letter of the pressure level. For example:
OLD Total_cloud_cover_entire_atmosphere
NEW Total_cloud_cover_Entire_atmosphere

OLD U-component_of_wind_height_above_ground
NEW u-component_of_wind_Height_above_ground

What does that change to the variable name buy you while not providing backward compatibility? (Also, why is u-component now a lower case u, but Total_cloud_cover isn't a lower case t?)
Also, for some of the probability names like:

OLD U-component_of_wind_height_above_ground_stdDev
NEW u-component_of_wind_Height_above_ground_Standard_deviation

changing stdDev to Standard_deviation doesn't add a whole lot, but again doesn't provide backward compatibility. It seems like the expansion of the name and the upcasing can be done in the description without breaking existing code in the IDV or other software that uses the netCDF-Java library to read GRIB data.
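The split being argued for here, a stable name for code plus an expanded description for people, could be sketched like this (variable names and attribute contents are illustrative; "long_name" is the conventional netCDF attribute for a human-readable description):

```python
# Sketch: keep the variable *name* stable for bundles and code, and put
# the expanded/up-cased human-readable text in a description attribute.
# The entries below are illustrative, not an actual dataset.

variables = {
    "U-component_of_wind_height_above_ground_stdDev": {
        "long_name": "u-component of wind @ Height above ground "
                     "(standard deviation)",
        "units": "m/s",
    },
}

def label_for_user(var_name):
    """Show the description to users; keep var_name for the API.

    Falls back to the raw name when no description is available.
    """
    return variables.get(var_name, {}).get("long_name", var_name)
```

With this split, cosmetic renames only touch the long_name text, so existing bundles keyed on the variable name keep working.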
I support adding the accumulation interval for parameters like Total_precipitation above because now some variables have a mixture of the different types of intervals.
One of your arguments is that over time, names change and it's difficult to maintain tables. While that may be true for lesser variables, I would suggest that the most used variable names rarely change (Temperature, geopotential height, relative humidity, u and v wind components, etc.). Unidata has always been in the business of maintaining tables, and that's part of the job it does to support the user community. Chiz took great pains to keep station tables up to date, to some degree Robb did that with grib tables, and I've done it with IDV tables. While it's not easy, it is a necessary function of the services that Unidata provides. And changing the names just pushes the work off to others at Unidata. Perhaps Unidata could look at having common tables used by all its software for consistency. Or perhaps Unidata could work with the NCL group and use their lookup tables?
In the end, I would like to see the netCDF-Java library evolve to suit the needs of the data providers, while also maintaining as much backward compatibility as possible for the end users and software developers who rely on it. You've already gotten at least one support request from the name change, I've answered one question on the netcdf-java list about this change, and I think this change will impact a lot of users/programmers. I think a lot of the ancillary information can be provided through variable attributes as it is in 4.2 (description, table number, Discipline/Category/Parameter, GRIB GDS/PDS information).
BTW, I've cc'd Tom Whittaker on this discussion since he is chair of the User's Committee and the impact of this change will affect the users including some of the more active UserComm/PolComm IDV users like Jim Steenburgh, Dave Dempsey and Kevin Tyle.
Don

On 2/15/12 5:25 PM, John Caron wrote:
On 2/15/2012 12:41 PM, Don Murray wrote:

John- Thanks for taking the time to generate these lists. Comments are below:

On 2/15/12 11:14 AM, John Caron wrote:

bundles referencing motherlode could only do so using the "latest" resolver, and im not sure how extensive that is, but we could look at the motherlode logs. seems like local GRIB files are where the pain will be.

Jim Steenburgh and I (and perhaps others) use the Best Time Series for several datasets, and I think as the IDV evolves its time matching capabilities, the Best Time Series will become used more. For the Best Time Series, we use index offsets to get the latest data. Dave Dempsey also uses the forecast 0 data from the Constant Offset FMRC.

The FMRC will be replaced by the feature collections. FMRC is particularly broken for GRIB files; it is essentially unmaintainable. FMRC should be ok for netcdf files, as long as they are homogenous. The GRIB feature collection should do the right thing for arbitrary collections of GRIB files. The only constraint (I think) is they have to come from the same center and subcenter. Grib Feature collections have a "collection dataset" that is essentially the same as the Best Time Series. There are probably cases where it is correct where FMRC is not, but im not positive. The other variations, Constant Offset, Constant Forecast and Runs, will not be supported, at least for now. So Ill have to understand how Dave is using "forecast 0 data from the Constant Offset FMRC" to see what can be done there. The individual files on motherlode will be equivalent to the Runs, since we put all of one run in a single file. For the large datasets on motherlode, there will also be daily collections. This two-level collection (aka "Time partitions") should scale to very large datasets. We dont really need it on motherlode, but it makes the indexing better, plus we want to eat that dogfood. BTW, index offsets are pretty unstable, as these datasets are changing underneath you.
Im hoping to prototype a "coordinate space only" Grid feature type using cdmremote, if i ever get out of the GRIB swamp. It will solve this problem: http://www.unidata.ucar.edu/blogs/developer/en/entry/indexed_data_access_and_coordinate

we can try to help the user choose new variable names, but the original bundle has to be able to change, or else its going to be useless eventually. Might as well do the right thing now, and add some UI to help evolve the bundle.

There will need to be enough information stored in the variables so that if the names change, a request can still find the appropriate data. For example, if I have a bundle that is accessing temperature on a pressure level, what can I use to always get that variable in the dataset if the variable name changes?

Really, this is just a problem for GRIB files, which sucks so bad the very fabric of spacetime is warped ;^( I'm certainly willing to add any metadata that's appropriate. I doubt its possible to guarantee that one can "always get that variable in the dataset if the variable name changes", but probably can get pretty close. OTOH, the machinations to do so might not be worth it. Actually, the "NCL naming scheme" below is probably the best bet in this regard.

Also, how will you handle the FMRC stuff where older files were indexed with the old names. Does 4.3 still read the old index files?

Im not sure what you mean by the index files here? Do you mean gbx8? Or the cached xml files? In both cases, those are no longer used, and new index files (gbx9 and ncx) are created.

I think this is a moot point so ignore for now. It sounds like you are going to reindex everything for the 4.3.

One thing that would help is to generate a list of 4.2 variable names with the corresponding 4.3 names for all the GRIB datasets on motherlode.
That could be used by the IDV for the lookup table.

unfortunately, the problem with using the human names is that they keep getting tweaked (because the tables keep getting tweaked) by WMO and especially NCEP. So they will just break again whenever that happens. Im leaning towards an NCL-like variable name that will be much more stable (though not guaranteed if we discover we are doing things wrong). The implication is that an application will want to use the description when letting users choose from a list, and the variable name when talking to the API. I think IDV is already doing this? The NCL-like syntax (still evolving) is:

VAR_%d-%d-%d[_error][_L%d][_layer][_I%s_S%d][_D%d][_Prob_%s]

where L = level type, S = stat type, D = derived type.

I did a quick scan and am not keen on the NCL-like names. I would like to see the VAR part replaced with a string that describes the variable in some way, like NCL does. I've been working with a lot of NCEP folks who swear by wgrib2, which uses the names of the variables in the last column of Table 4.2 for each parameter. These are the names that NCL uses as well.

The VARs that NCL uses are hand-maintained (by the NCL group) tables. These "short names" are mostly only available from NCEP. They are not in the WMO tables. They are also subject to being tweaked, and are not always unique. So they seem only marginally better than the actual table descriptions. I think using the description for human consumption is the right way to go. Then let the variable names be as stable as possible, but not particularly human readable.

I assume that the %d-%d-%d is the discipline, category and parameter info? For GRIB1, the discipline and category will not exist.

Yes. GRIB1 only has parameter number. We will also need the table version, possibly center/subcenter. So we will have a different syntax for GRIB1, which i havent done yet.

Will you still provide these as descriptive names in the attributes?
The IDV uses them to categorize the variables in the Field Chooser. Could you send along a comparison of a couple of variables with the attributes listed so we can see how that has changed?

yes, i can add those.

> So its different from NCL in not using the time coordinate in the name.

You are using the time interval in the name for accumulations, so I'm not sure what you mean here.

We arent using any of the time interval coordinates, just the interval lengths, in the variable name. Often, one has both a "time instant" and a "time interval" variable in the same file. The interval length is in the data, so not subject to getting changed in a table. Ive prototyped making separate variables for each time interval length. I think thats probably wrong, but handling mixed intervals is tricky. Probably the IDV needs to carefully consider this issue.

Im attaching two lists, both are maps from old names to new names. the first list uses a "human readable" name constructed from the latest GRIB tables, the second uses the NCL-like syntax. (neither are complete or authoritative yet, and are only for GRIB-2). Ive included the grid description on the second list, so the mappings make more sense. one thing to note is that the mapping is dataset dependent 15-20% of the time.

Could you give examples of where these differ? I'd like to understand if this is just in accumulation intervals or something more.

its mostly where the 4.2 name leaves off the level or other info, to make a nice name. This is now Considered Harmful. in the tables i sent, you can see when there are multiple NEW names. I can send the full report if you want to see in which dataset those live.

I will probably release the next CDM version using the NCL-syntax variable names in order to get feedback from the broader community.

Perhaps you could send this proposal with the examples out to the netCDF-Java list before sending out the code.
People are already changing their code to test out the beta release, and if it changes with the next beta, they'll have to do it again.

Sounds reasonable. I better get a warning out about how these names are in flux.

Thanks again. Don

thanks for your feedback. John
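The old-to-new name lists discussed in the thread amount to a lookup table an application could apply when loading an old bundle. A minimal sketch of that idea (the mapping entries are the examples from this discussion, not an authoritative list; note the thread also observes the mapping is dataset dependent 15-20% of the time, so a real table would need to be keyed per dataset):

```python
# Sketch of a 4.2 -> 4.3 variable-name migration table, as could be
# built from the old-name/new-name lists mentioned in the thread.
# Entries are illustrative examples only.

OLD_TO_NEW = {
    "Geopotential_height": "Geopotential_height_Pressure",
    "Pressure_reduced_to_MSL": "Pressure_reduced_to_MSL_Msl",
}

def migrate_bundle_vars(var_names):
    """Rewrite 4.2 variable names to their 4.3 equivalents, leaving
    unrecognized names untouched."""
    return [OLD_TO_NEW.get(v, v) for v in var_names]
```

A bundle loader could run its variable references through such a table once, and only fall back to asking the user when a name has no (or more than one) new equivalent.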
-- Don Murray NOAA/ESRL/PSD and CIRES 303-497-3596 http://www.esrl.noaa.gov/psd/people/don.murray/