Re: Meeting about improving the GRD API.
- Subject: Re: Meeting about improving the GRD API.
- Date: Fri, 09 Feb 2007 10:47:25 -0800
Thanks very much for this detailed response, John.
Currently at BCS we're having a very interesting discussion about the way
forward, and I've placed a call to Ted (no return call yet) to get his
opinion on a satellite application that would become feasible by
combining the strengths of the various technologies involved. More
later today....
Regards,
Ian
===================
At 08:13 AM 2/9/2007, John Caron wrote:
Hello all, comments are in-line:
Ian Barrodale wrote:
Hi Ted, John, Russ, and John:
Thank you all for taking the time yesterday to both listen to our story
and to further enlighten us about your work. It was much
appreciated.
The note below provides a possible implementation route, and some
questions. Please feel free to point out any shortcomings in our
proposed approach, and please provide any answers that come to mind
regarding our questions.
Thanks again,
Ian
=======================
Goal
-------
Based on feedback from BCS Grid DataBlade customers and, in particular,
Ted Habermann, we feel that there may be some value in providing
alternate ways of accessing data from a Grid DataBlade (GRD) -
powered database through existing widely-used protocols and
methods. Note that by "accessing", we really mean
just the reading part, as we already provide, through the BCS Gridded
Data Loader client, a means of conveniently ingesting data from many
forms into a GRD-powered database. One method of accessing the
data would be to cast it in the form of the Common Data Model
(CDM) supported by the Java netCDF API from UCAR. The
advantage of this is that:
* users would be able to write software using the Java netCDF API
  (which is fairly straightforward to use and well documented) for
  accessing GRD data, and
* data providers can use a GRD-powered database and provide access
  to it through OPeNDAP, WCS, netCDF files, etc. using the Java
  netCDF API (see page 53 attachment, modified from the slide on
  page 53 of
  http://www.unidata.ucar.edu/staff/caron/presentations/CDM.ppt).
Our understanding of a possible implementation
---------------------------------------------------------------------
To handle GRD data from the Java netCDF API, we would have to:
(i) Create a GRD I/O service provider for the Java netCDF API (see page
38 attachment) that can communicate with the GRD database using a
combination of JDBC and the existing Java GRD API. The Java netCDF
API uses a service provider architecture to handle reading multiple
different file formats and casting them in the form of the CDM.
(ii) Create a GRD content manager to handle the georeferencing
information in the GRD.
One possible method for allowing users to access GRD data without a
full THREDDS catalog is to supply some type of unique URL to the
database:
grd://user:pass@server/database
and the service provider would construct a CDM instance that contains a
main group of all the grids in the database and allow the user to access
those grids through the API.
For example:
grd://peter:address@hidden/coastwatch
might be a reference to a GRD database running at Barrodale that contains
gridded NOAA CoastWatch satellite-derived data for some number of
geographic areas and time periods. The resulting netCDF dataset
would be one that contains a list of grids under a root group like a
directory structure:
/
/sst/
/sst/northeast/
/sst/northeast/jan01_2007 <---- a grid
/sst/northeast/jan02_2007 <---- another grid
...
/chlorophyll/northeast/jan01_2007 <---- a third grid
/chlorophyll/northeast/jan02_2007 <---- and so on
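The grd:// URL form proposed above can be taken apart with the standard java.net.URI class. The following is only a minimal sketch: the GrdUrl class and its field names are hypothetical, and nothing here is part of the Java netCDF API or the GRD API.

```java
import java.net.URI;

/** Hypothetical holder for the parts of a grd:// URL (illustration only). */
public class GrdUrl {
    public final String user;      // may be null if the URL has no user info
    public final String password;  // may be null if the URL has no password
    public final String server;
    public final String database;

    private GrdUrl(String user, String password, String server, String database) {
        this.user = user;
        this.password = password;
        this.server = server;
        this.database = database;
    }

    /** Splits grd://user:pass@server/database into its components. */
    public static GrdUrl parse(String url) {
        URI uri = URI.create(url);  // throws IllegalArgumentException if malformed
        if (!"grd".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("not a grd:// URL: " + url);
        }
        String user = null;
        String password = null;
        String userInfo = uri.getUserInfo();  // "user:pass", or null if absent
        if (userInfo != null) {
            int colon = userInfo.indexOf(':');
            user = colon < 0 ? userInfo : userInfo.substring(0, colon);
            password = colon < 0 ? null : userInfo.substring(colon + 1);
        }
        String path = uri.getPath();  // "/database"
        String database = path.startsWith("/") ? path.substring(1) : path;
        return new GrdUrl(user, password, uri.getHost(), database);
    }

    public static void main(String[] args) {
        GrdUrl u = parse("grd://user:pass@server/database");
        System.out.println(u.user + " @ " + u.server + " / " + u.database);
    }
}
```

A service provider could pass the server and database parts to JDBC and build the dataset from there; how credentials are actually supplied is left open in this thread.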
It depends on the desired complexity of the grids in the database as to
whether the user would require a more sophisticated catalog with querying
ability such as that which THREDDS could supply.
See the last answer below.
BTW, the TDS will soon have the ability to do proper HTTP-based
authentication, and we are hoping to make that a standard in OPeNDAP
clients, which can act like browsers and pop up a username/password
dialog window, instead of embedding the user:pass@ in the URL.
Questions
---------------
We have the following questions:
1) Where in the netCDF API would the content manager that handles GRD
georeferencing information sit?
2) How does the I/O SP architecture determine the I/O SP for a given
file:// style URL? How would it know to handle a grd:// URL
differently?
Very perceptive question; let me start here to answer questions 1 and
2:
The IOSP architecture is, in fact, file based (on RandomAccessFile). Since
you will be URL based, we have to fit you in at a higher level, namely
NetcdfDataset.openFile(). If you look there you will see that we look for
opendap (http: or dods:) and thredds: URLs. It might make sense to
generalize this to allow plugging in external handlers for your protocol,
similar to how java.net.ContentHandler works. Otherwise we might put your
code in the core, which is also a possibility.
Anyway, NetcdfDataset.openFile() would detect your URL scheme and call
NetcdfFile with your IOSP. We will have to add a new constructor for
that. (You could alternately just subclass NetcdfFile, which is what
DODSNetcdfFile does).
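The idea of plugging in external handlers keyed on the URL scheme can be sketched as a small registry. This is an illustration only: SchemeRegistry and Opener are hypothetical names, the "dataset" is represented by a plain String, and none of this reflects how NetcdfDataset.openFile() is actually implemented.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical registry that picks an opener from the URL scheme. */
public class SchemeRegistry {

    /** Hypothetical plug-in interface; returns a String as a stand-in for a dataset. */
    public interface Opener {
        String open(String location);
    }

    private final Map<String, Opener> openers = new HashMap<String, Opener>();

    /** Registers an opener for one scheme, e.g. "grd". */
    public void register(String scheme, Opener opener) {
        openers.put(scheme.toLowerCase(Locale.ROOT), opener);
    }

    /** Dispatches on the text before the first ':'; plain paths count as "file". */
    public String open(String location) {
        int colon = location.indexOf(':');
        String scheme = colon > 0
                ? location.substring(0, colon).toLowerCase(Locale.ROOT)
                : "file";
        Opener opener = openers.get(scheme);
        if (opener == null) {
            throw new IllegalArgumentException("no handler for scheme: " + scheme);
        }
        return opener.open(location);
    }

    public static void main(String[] args) {
        SchemeRegistry registry = new SchemeRegistry();
        registry.register("file", loc -> "local file " + loc);
        registry.register("grd", loc -> "GRD database " + loc);
        System.out.println(registry.open("grd://server/database"));
        System.out.println(registry.open("/data/example.nc"));
    }
}
```

With a registry of this kind, support for a new protocol is added by registering a handler rather than by editing the dispatch code in the core.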
As for the "content manager that handles GRD georeferencing
information": it could be a CoordSysBuilder subclass. However, this
is actually unnecessary if you use an existing Convention, and we would
highly recommend using the CF Convention for gridded data. Since you are
creating the "file", you can add the attributes and variables
needed by that Convention. This makes your data "CF compliant"
automatically, which is a real win.
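As a rough illustration of adding the attributes a Convention needs, the sketch below builds the attribute sets that the CF Convention uses to identify latitude and longitude coordinate variables. The CfAttributes class and its method names are hypothetical, and plain maps stand in for whatever attribute objects the library uses. The version string "CF-1.0" is an assumption, so use the version you actually conform to.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper listing CF attributes for a lat/lon grid (illustration only). */
public class CfAttributes {

    /** Global attribute that announces the convention in use. */
    public static Map<String, String> global() {
        Map<String, String> atts = new LinkedHashMap<String, String>();
        atts.put("Conventions", "CF-1.0");  // assumed version string
        return atts;
    }

    /** Attributes that mark a variable as the latitude coordinate. */
    public static Map<String, String> latitude() {
        Map<String, String> atts = new LinkedHashMap<String, String>();
        atts.put("units", "degrees_north");
        atts.put("standard_name", "latitude");
        return atts;
    }

    /** Attributes that mark a variable as the longitude coordinate. */
    public static Map<String, String> longitude() {
        Map<String, String> atts = new LinkedHashMap<String, String>();
        atts.put("units", "degrees_east");
        atts.put("standard_name", "longitude");
        return atts;
    }

    public static void main(String[] args) {
        System.out.println("global: " + global());
        System.out.println("lat:    " + latitude());
        System.out.println("lon:    " + longitude());
    }
}
```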
3) Have we interpreted the slide on page 53 correctly -- is there a
server that can serve out data using the CDM (via the Java netCDF API)
as an intermediate step?
Yes, the THREDDS Data Server.
4) Does a group structure to represent GRD contents map to an OPeNDAP
connection, WCS, or netCDF file, or do those types of data
representations only have netCDF variables and no groups?
In principle you could use Groups, but they really won't be fully
supported until we get the netCDF-4 file format finished and tested. I
would advise starting with the simpler case of no groups.
5) Our understanding of the netCDF Java library is that it has, in
particular, the following two entry points:
* NetcdfFile : this is the bare netCDF access to files of various
  types. It doesn't understand anything about coordinate systems.
  You can add an I/O service provider to handle your favorite file
  format via a class method. The variables it returns are instances
  of Variable (which of course don't know anything about coordinate
  systems).
* NetcdfDataset : this is a layer built above the NetcdfFile layer
  and is the usual interface for applications (e.g., a WCS). It
  handles converting various attributes into a coordinate system. It
  has a number of methods relating to adding or getting coordinate
  systems. These methods seem to be applied to the entire file,
  rather than to individual variables (or groups).
Coordinate systems are really variable-specific. However, the common case
is that each dataset has a single coordinate system (or a set of closely
related ones).
CoordinateSystem findCoordinateSystem(java.lang.String name)
    // Retrieve the CoordinateSystem with the specified name.
    <http://www.unidata.ucar.edu/software/netcdf-java/v2.2.18/javadoc/ucar/nc2/dataset/CoordinateSystem.html>
    <http://www.unidata.ucar.edu/software/netcdf-java/v2.2.18/javadoc/ucar/nc2/dataset/NetcdfDataset.html#findCoordinateSystem%28java.lang.String%29>
java.util.List getCoordinateAxes()
    // Get the list of all CoordinateAxis objects used by this dataset.
    <http://www.unidata.ucar.edu/software/netcdf-java/v2.2.18/javadoc/ucar/nc2/dataset/NetcdfDataset.html#getCoordinateAxes%28%29>
java.util.List getCoordinateTransforms()
    // Get the list of all CoordinateTransform objects used by this dataset.
    <http://www.unidata.ucar.edu/software/netcdf-java/v2.2.18/javadoc/ucar/nc2/dataset/NetcdfDataset.html#getCoordinateTransforms%28%29>
boolean getCoordSysWereAdded()
    // Has Coordinate System metadata been added.
    <http://www.unidata.ucar.edu/software/netcdf-java/v2.2.18/javadoc/ucar/nc2/dataset/NetcdfDataset.html#getCoordSysWereAdded%28%29>
The NetcdfDataset object contains instances of VariableDS. They are like
a wrapper for the Variable objects found in the NetcdfFile object. There
is a method to ask a VariableDS for the list of coordinate systems
associated with it.
Exactly.
If we interpret things correctly, when a NetcdfDataset object is built
from a NetcdfFile object, the
NetcdfDataset object is responsible for figuring out the coordinate
system information from attributes in the NetcdfFile, and composing a
VariableDS from the coordinate system information and each Variable. In
theory, by implementing our own CoordSysBuilder class and registering it,
we should be able to add coordinate system information to each VariableDS
individually.
Yes, or as I mentioned, use an existing Convention and
CoordSysBuilder.
A question then is: do applications like the web coverage server and
OPeNDAP server get their coordinate information from VariableDS objects
or from the NetcdfDataset object?
OPeNDAP is (more or less) at the same level as NetcdfFile, and so just
faithfully transmits Variables, Attributes, and Dimensions across the
wire. The coordinate systems are then added by clients (like the CDM) that
understand the convention. We are expecting that DAP4, the future OPeNDAP
protocol, will add Groups.
WCS, OTOH, works at the coordinate system level, and so uses the
GridDatatype, which is specialized for "coverage" data, and
gets its coordinate systems from NetcdfDataset. The client makes requests
in coordinate space, and we know how to translate that into index space.
Currently we can send back either GeoTIFF or netCDF/CF files. There are
some limitations: the grid spacing must be uniform in WCS 1.0. We expect
to move to WCS 1.1 later this year, which removes that limitation. We
haven't implemented reprojection/resampling, and I'm not sure that we will.
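For a uniformly spaced axis, translating a request from coordinate space into index space reduces to one division and a rounding step. The sketch below shows that arithmetic only. UniformAxis is a hypothetical class, not the GridDatatype implementation, and it assumes nearest-cell lookup on a single one-dimensional axis.

```java
/** Hypothetical uniformly spaced 1-D axis (illustration only). */
public class UniformAxis {
    private final double start;    // coordinate value at index 0
    private final double spacing;  // constant step between cells; may be negative
    private final int size;        // number of cells along the axis

    public UniformAxis(double start, double spacing, int size) {
        if (spacing == 0.0 || size <= 0) {
            throw new IllegalArgumentException("spacing must be non-zero and size positive");
        }
        this.start = start;
        this.spacing = spacing;
        this.size = size;
    }

    /** Returns the index of the nearest cell, or -1 if the coordinate is off the axis. */
    public int indexOf(double coord) {
        long i = Math.round((coord - start) / spacing);
        return (i < 0 || i >= size) ? -1 : (int) i;
    }

    /** Returns the coordinate value at a given index. */
    public double coordAt(int index) {
        return start + index * spacing;
    }

    public static void main(String[] args) {
        UniformAxis lat = new UniformAxis(40.0, 0.5, 21);  // 40.0 to 50.0 in 0.5 steps
        System.out.println(lat.indexOf(42.0));
        System.out.println(lat.indexOf(39.0));
    }
}
```

The uniform-spacing restriction mentioned for WCS 1.0 is what makes this closed-form lookup possible; an irregular axis would need a search over the coordinate values instead.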
If it is from the NetcdfDataset
object, then the strategy of grouping all the grids in a database into a
single NetcdfDataset, as outlined above, won't work, and we'd be obliged
to use a THREDDS server. Is this correct?
It would likely be a mistake to put a lot of disparate data into the same
NetcdfDataset. Better to find the right granularity, which is typically
homogeneous data that shares the same discovery metadata. So I would
not use the Group mechanism to break the data into granules; better to
make separate datasets. It's possible that such an idiom will develop with
netCDF-4, but better to get something working that stays within existing
practice, then decide if you want to forge ahead. Let me emphasize that
it's really important to find the right dataset granularity.
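One way to picture separate datasets instead of one grouped dataset is to partition the grid paths from the earlier directory-style listing by their parent path. The sketch below does only that. Granules is a hypothetical class, and treating each parameter/region pair as one dataset is an assumption made for illustration; the right granularity depends on which grids share discovery metadata.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical partitioning of grid paths into separate datasets (illustration only). */
public class Granules {

    /** Groups paths such as /sst/northeast/jan01_2007 by their parent path. */
    public static Map<String, List<String>> byDataset(List<String> gridPaths) {
        Map<String, List<String>> datasets = new TreeMap<String, List<String>>();
        for (String path : gridPaths) {
            int last = path.lastIndexOf('/');
            // Everything before the final segment names the dataset.
            String dataset = last > 0 ? path.substring(0, last) : "/";
            String grid = path.substring(last + 1);
            datasets.computeIfAbsent(dataset, k -> new ArrayList<String>()).add(grid);
        }
        return datasets;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
                "/sst/northeast/jan01_2007",
                "/sst/northeast/jan02_2007",
                "/chlorophyll/northeast/jan01_2007",
                "/chlorophyll/northeast/jan02_2007");
        for (Map.Entry<String, List<String>> e : byDataset(paths).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Each resulting key would then correspond to one dataset URL that a catalog could publish.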
This means you want to use THREDDS catalogs to publish the dataset URLs
and associated metadata, and possibly use TDS to serve your data. Once
you have an IOSP or equivalent for your data, the main work is to develop
the catalogs. These can be pretty minimal, but automatically populating
catalogs with high-quality metadata is a huge win in the long run.
I think that would be a powerful value-added product, but of course I
don't know what your customers really want. As Ted mentioned, it's a good
time to help influence TDS strategy, and it appears to me that your small
company with extensive scientific experience would be a good fit with
Unidata.
John
**********************************************
Ian Barrodale, Ph.D.
President
Barrodale Computing Services Ltd.
Tel: (250) 472-4372 Fax: (250) 472-4373
Web: http://www.barrodale.com
Email: address@hidden
**********************************************
Mailing Address:
P.O. Box 3075 STN CSC
Victoria BC Canada V8W 3W2
Shipping Address:
Hut R, McKenzie Avenue
University of Victoria
Victoria BC Canada V8W 3W2
**********************************************