
Re: Small THREDDS catalogs and the Proposed new specification for THREDDS Catalogs



Hi Benno:

Benno Blumenthal wrote:

John Caron wrote:

A proposed new version of the THREDDS Dataset Inventory Catalog is ready for your comments. Please send them to address@hidden, or to me.

I am glad to see some expansion of the spec so that we can convey more information about our datasets. I have a question, however, about switching from DTD to XML Schema.


My view of THREDDS DIC in general is that it allows a data provider to describe their collection of data sets at a level of utility not generally available. I understand there is a considerable effort to generate catalogs automatically for existing collections, and that is important, but I think one can provide information in the THREDDS format that cannot be simply expressed by, for example, a collection of typed files in a directory. So if providers have the detailed information about their datasets, it is probably a set of links on an HTML page (extended documentation in HTML or PDF files, perhaps links to metadata files, etc), along with documentation for the collection as a whole. Also, many providers only provide a few datasets. So the path of least resistance for many is to simply write the THREDDS file.

In light of this, I have been encouraging groups to write THREDDS xml files (version 0.6) to describe their collections. Not that I have had a tremendous amount of success so far, but given some more time, I am hopeful of more to show. For example, there is


http://www.ecco-group.org/thredds/sioeccoCatalog.xml

which is displayed in my (Ingrid) interface at

http://iridl.ldeo.columbia.edu/%28http://www.ecco-group.org/thredds/sioeccoCatalog.xml%29readthredds/

or

http://iridl.ldeo.columbia.edu/SOURCES/.SIO/.ECCO/


So given the THREDDS catalog generation options, they chose to write it by hand, and I taught them the nuances of the tags. Pretty reasonable given the small number of datasets and the additional documentation that we wanted to link in. I am a little embarrassed that the next THREDDS version is so different from the first, though I guess that is my problem more than anyone else's.

So

1) What are the benefits of changing from DTD to XML Schema?

Well, it's not as great as some would claim, and schema validation is just now becoming stable. I can't say I'm crazy about it. OTOH, writers don't actually have to use it, and readers don't have to validate with it. Schema is certainly more expressive than DTDs.

The integrated namespace handling is one clear win; we are using it for both version control and "foreign" metadata parsing. I think in the long run Schema will be worthwhile, esp as we learn what subsets of it are best for what purposes.
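
To make the version-control point concrete, here is a rough sketch of how a reader might tell catalog versions apart by the namespace on the root element. The namespace URI, the function name catalog_version, and the rule that an un-namespaced catalog is an old DTD-style (0.6) file are all assumptions made for illustration, not part of the proposed spec.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, invented for this sketch.
NS_V1 = "http://example.org/thredds/catalog/v1.0"


def catalog_version(xml_text):
    """Guess the catalog version from the root element's namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree renders a namespaced tag as "{uri}localname".
    if root.tag.startswith("{"):
        uri = root.tag[1:].split("}", 1)[0]
        return "1.0" if uri == NS_V1 else "unknown"
    # Assumption: a catalog with no namespace is an old DTD-style file.
    return "0.6"


new_style = '<catalog xmlns="http://example.org/thredds/catalog/v1.0"/>'
old_style = "<catalog/>"
print(catalog_version(new_style), catalog_version(old_style))
```

A reader built this way can dispatch to the right parser without validating against either the DTD or the Schema.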

There will be great benefits in switching to the 1.0 spec, as you note above, esp in connecting to Digital Libraries and providing the raw data for search services.

Does it include an automatic editing interface easily accessible to everyone?

Yes, we are working on an editing interface to make this easy.


2) Can you provide a conversion utility on the web so that anyone that has written an old-format file can instantly get a new-format version?

Yes. I'll try to get a prototype web service running soon so anyone can try it.



If the new format is much easier to edit than the old (because interfaces are readily available), I think the sell for switching can be made. Otherwise, users may be annoyed, i.e. slow to switch and/or adopt.

yup, good points. We have been aware of the backwards compatibility issues. The main problems are around inheritance, which is a big pain in general, but seems awfully useful from a writer's POV. Anyone have any thoughts about how useful inheritance is (eg, putting a dataType or serviceName in a parent dataset element and having it be inherited by its nested datasets)?
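
For anyone weighing in, here is a minimal sketch of what inheritance would mean for a reader: a value set on a parent dataset applies to nested datasets unless a child overrides it. The sample catalog, its dataset names and urlPath values, and the helper resolve are made up for illustration; the actual inheritance rules in the spec may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical hand-written catalog: dataType and serviceName are set once
# on the parent, and one child overrides serviceName.
CATALOG = """
<catalog>
  <dataset name="ECCO" dataType="Grid" serviceName="dods">
    <dataset name="monthly" urlPath="ecco/monthly.nc"/>
    <dataset name="daily" serviceName="http" urlPath="ecco/daily.nc"/>
  </dataset>
</catalog>
"""

# Attributes treated as inheritable in this sketch.
INHERITED = ("dataType", "serviceName")


def resolve(element, inherited=None):
    """Return one dict per dataset with inherited attributes filled in."""
    inherited = dict(inherited or {})
    # A value on this element overrides whatever came from its ancestors.
    for key in INHERITED:
        if key in element.attrib:
            inherited[key] = element.attrib[key]
    results = []
    if element.tag == "dataset":
        results.append({"name": element.get("name"), **inherited})
    # Pass the accumulated values down to directly nested datasets.
    for child in element.findall("dataset"):
        results.extend(resolve(child, inherited))
    return results


resolved = {d["name"]: d for d in resolve(ET.fromstring(CATALOG))}
for name, attrs in resolved.items():
    print(name, attrs)
```

The writer states dataType once instead of on every dataset, which is the convenience; the cost is that every reader has to carry state down the tree like this rather than reading each dataset element on its own.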