
Seeking .nc advice for sequence data



I am trying to figure out how to store lots of sequence-like data in .nc files for efficient access via OPeNDAP. In particular, I am trying to determine whether actual OPeNDAP Sequences (Structures with an unlimited dimension in the .nc file) are appropriate for our purposes.
Yes, I could store the data in a file on the computer where the program 
needing access is running, and not have to access it via OPeNDAP, so 
that network transmission time would be minimized. But this project is 
partly an experiment in dealing with remotely accessed data. So I am 
trying to design a solution where the data is accessed from another 
computer via OPeNDAP.
Here's an example. Let's say I want to store all NDBC buoy data in a .nc 
file. There are over 100 buoys. For each buoy, there are readings for 
some time period (e.g., just 1989, or from 1990 to the present). The 
readings are an hour apart. Several variables (e.g., WindSpeed and 
WindDirection) are measured at each time point. Since we work with 
real-time data, I plan to update this file frequently (every day, but 
ideally every hour).
The problem is, I need to have *quick* access via OPeNDAP:
* Across all buoys at a specific time point, e.g., What is the wind speed at all buoys at 2004-12-14T09:00Z?
* Or, across all available time points for a specific buoy, e.g., What is the wind speed at a given buoy over its whole record?
Regarding the first requirement: from what I understand, if I use
sequences, there is no way to get the data for a given time point
without reading either the whole file up to that time point or a whole
variable. Either would seem to take too long if I want the values for
100 buoys, given that I am using OPeNDAP to connect to a remote computer
and that my CoastWatch Browser program, which graphs the data for
on-line users, needs the response quickly.
Since the time range of available data for each buoy varies greatly, it 
seems grossly wasteful of space to have a common Time dimension for all 
buoys. Doing so would probably push the file past 2GB in size, which
generally causes trouble. So I am thinking about either:
* A time dimension for each buoy (e.g., time14978 for buoy 14978) and
several variables which use that dimension to store the data for that
buoy (e.g., windSpeed14978, windDirection14978, etc.).  This setup would
be replicated for each buoy.
* Or, a Group for each buoy, again with a time dimension and several 
variables in each group to store the data for each buoy.  (If this is a 
new .nc feature, does OPeNDAP deal with this yet?)
* Or, an ArrayObject.1D of variables, each element of which is an 
ArrayObject.1D of the variables for a given buoy.  (I'm not sure if this 
can be done.)
* Or, an ArrayObject.2D of variables, with buoys as one dimension and 
the various variables (e.g., WindSpeed, WindDirection) on the other 
dimension. (I'm not sure if this can be done.)
I plan to solve the updating problem by leaving rows of missing values 
at the end of the data for each active buoy. As new data comes in, I 
will replace the missing values with actual data. Then, I only have to 
rewrite the file (to add more rows of missing values) once in a while, 
not every time.
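Here is a minimal sketch of that padding scheme, again in plain Python
with a list standing in for one buoy's variable in the .nc file. The
class name, the -9999.0 fill value, and the padding sizes are all
assumptions for illustration only.

```python
MISSING = -9999.0  # assumed fill value, chosen only for this sketch

class PaddedSeries:
    """One buoy variable stored with spare rows of missing values at the end."""

    def __init__(self, values, pad_rows):
        self.data = list(values) + [MISSING] * pad_rows
        self.n_valid = len(values)  # number of rows holding real data
        self.rewrites = 0           # how often the "file" had to be regrown

    def append(self, value, pad_rows):
        if self.n_valid == len(self.data):
            # Out of padding: this models the expensive step of rewriting
            # the file with more rows of missing values.
            self.data.extend([MISSING] * pad_rows)
            self.rewrites += 1
        # Cheap step: overwrite one missing value in place.
        self.data[self.n_valid] = value
        self.n_valid += 1

series = PaddedSeries([5.1, 4.8], pad_rows=3)
for reading in [6.0, 6.2, 5.9, 5.5]:
    series.append(reading, pad_rows=3)

print(series.n_valid, series.rewrites, series.data)
```

With three spare rows, the first three new readings are in-place
overwrites and only the fourth triggers a regrow, so the cost of
rewriting is paid once per block of padding rather than once per reading.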
Which approach sounds best? Is there another approach?  Do you have any 
advice?
Are sequences the wrong way to go?  Of course, that could change if one
could efficiently access specific ranges from variables in a
Sequence/Structure.  But it is my understanding that this is not
currently possible.
Although I gave this specific example, we store a lot of sequence-like 
data where I work.  Whatever .nc file structure is appropriate for the 
buoys will likely be appropriate for much of this other data. So I want 
to get it right.
Thank you.


Sincerely,

Bob Simons
Satellite Data Product Manager
Environmental Research Division
NOAA Southwest Fisheries Science Center
1352 Lighthouse Ave
Pacific Grove, CA 93950-2079
(831)658-3205
address@hidden
<>< <>< <>< <>< <>< <>< <>< <>< <><