On 3/8/2013 11:02 AM, David Hassell wrote:
Dear John,

I have been thinking about getting cf-python (http://cfpython.bitbucket.org/) to write out NcML aggregation files and wondered if you had any advice. Is there any documentation on how properties such as flag_values, ancillary_variables, etc. are dealt with?

On a related note, I have written up a framework for storing datasets created by the CF aggregation rules (https://cf-pcmdi.llnl.gov/trac/ticket/78), both in memory and in a (netCDF) file. I would be most interested in your opinion of it. The (very short!) abstract and introduction are at:

http://www.met.reading.ac.uk/~david/nca/0.2.2/build/
http://www.met.reading.ac.uk/~david/nca/0.2.2/build/introduction.html

In particular, in the introduction I mention ways in which I think it is more general than NcML - but I wonder if what I say is correct ...?

Many thanks, and all the best,

David

--
David Hassell
National Centre for Atmospheric Science (NCAS)
Department of Meteorology, University of Reading,
Earley Gate, PO Box 243, Reading RG6 6BB, U.K.
Tel : 0118 3785613
Fax : 0118 3788316
E-mail: address@hidden

Hi David:

NcML aggregation is a syntactic aggregation, with almost no understanding of the meaning of the constructs. http://www.unidata.ucar.edu/software/netcdf/ncml/v2.2/Aggregation.html has a paragraph at the end of each section:

A Union dataset is constructed by transferring objects (dimensions, attributes, groups, and variables) from the nested datasets in the order the nested datasets are listed. If an object with the same name already exists, it is skipped. You need to pay close attention to dimensions and coordinate variables, which must match exactly across nested files.

A JoinExisting dataset is constructed by transferring objects (dimensions, attributes, groups, and variables) from the nested datasets in the order the nested datasets are listed. All variables that use the aggregation dimension as their outer dimension are logically concatenated, in the order of the nested datasets. Variables that don't use the aggregation dimension are treated as in a Union dataset, i.e. skipped if one with that name already exists.

A JoinNew dataset is constructed by transferring objects (dimensions, attributes, groups, and variables) from the nested datasets in the order the nested datasets are listed. All variables that are listed as aggregation variables are logically concatenated along the new dimension, in the order of the nested datasets. A coordinate variable is created for the new dimension. Non-aggregation variables are treated as in a Union dataset, i.e. skipped if one of that name already exists.
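For reference, here are minimal NcML sketches of the three types, following the aggregation documentation linked above; the file, dimension, and variable names are placeholders, not taken from this thread:

  <!-- union: merge objects from several files; same-named objects after the first are skipped -->
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <aggregation type="union">
      <netcdf location="cloud.nc"/>
      <netcdf location="flux.nc"/>
    </aggregation>
  </netcdf>

  <!-- joinExisting: concatenate along an existing outer dimension, in file order -->
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <aggregation dimName="time" type="joinExisting">
      <netcdf location="jan.nc"/>
      <netcdf location="feb.nc"/>
    </aggregation>
  </netcdf>

  <!-- joinNew: concatenate the listed variables along a new dimension;
       coordValue supplies the new coordinate value for each file -->
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <aggregation dimName="run" type="joinNew">
      <variableAgg name="T"/>
      <netcdf location="run01.nc" coordValue="1"/>
      <netcdf location="run02.nc" coordValue="2"/>
    </aggregation>
  </netcdf>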
"Feature Collections" are intended to be the successor to aggregation. These are semantically aware collections, and there is much ongoing work on them in the CDM. I need to start writing more docs on this, but here's a start:

http://www.unidata.ucar.edu/software/netcdf-java/reference/FeatureDatasets/Overview.html
http://www.unidata.ucar.edu/projects/THREDDS/tech/tds4.3/reference/collections/FeatureCollections.html

-----

For sure, your proposed "CF aggregation" is more general than NcML. I have found it necessary to work at an object-model level, especially one that includes general coordinate systems. Your CF data model (http://www.met.rdg.ac.uk/~jonathan/CF_metadata/cfdm.html) is roughly equivalent to the CDM data model (http://www.unidata.ucar.edu/software/netcdf-java/CDM/index.html), although I haven't analyzed it in detail.

The CF data model and your proposal (http://www.met.reading.ac.uk/~david/cf_aggregation_rules.html) are closely tied to the CF encoding. In a way that's good (it gets specific about CF meanings), but you also run the risk of getting lost in the details. Your proposal strikes me as midway between the syntactic approach in NcML and the semantic approach in Feature Collections. The syntactic approach is very useful but limited - I don't think you can get everything you want with it (see the attribute sketch at the end of this message). I'm not sure, for example, whether it can handle discrete geometry encodings; they can be a bit devilish, especially when performance and large collections of files are in the mix. The CDM handles many file formats and conventions, so it is necessarily more abstract, which sometimes means vague.

So my general feedback is:

1) Keep separating the data model from the file encoding.
2) Implementing is necessary to see which real-world cases get covered and which don't.

Good luck!

John
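On David's opening question about properties such as flag_values and ancillary_variables: in NcML these are ordinary attribute elements, copied or overridden by name, and nothing interprets them or checks that the variables they reference exist - which is the "syntactic, not semantic" point above. A minimal sketch, with hypothetical file and variable names:

  <!-- modify an existing file; CF linkage and flag properties are plain string/numeric attributes -->
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" location="precip.nc">
    <variable name="precip">
      <attribute name="ancillary_variables" value="precip_flag"/>
    </variable>
    <variable name="precip_flag">
      <attribute name="flag_values" type="byte" value="0 1 2"/>
      <attribute name="flag_meanings" value="good suspect bad"/>
    </variable>
  </netcdf>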