Just to make sure I understand your terminology:
files = physical files
YUP
datasets = logical files we want the user to see
I don't think about datasets in terms of files. A data set could be a
group of files, a single file, ... I guess the reason I don't think
about it that way is that the data need not be in digital form to be
grouped in a data set. Beach profiles that have been collected over
the past 50 years and consist of pages of numbers (monthly values of
depth below mean low water at specified distances from a marker in a
given direction) would qualify. I suppose that your definition is
correct from a computer perspective; I just don't think of it that way.
inventory = listing of datasets
No, a listing of datasets is what I refer to as a directory (not a
directory on a computer). The GCMD is an example of one. An
inventory is a listing of the elements in a data set; it could be a
list of times for satellite images in an archive along with the
physical location of the data (tape C18341 on a rack, or
N861230147.hat in a computer directory on my machine) or a list
of times and locations of each XBT in an XBT archive.
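To make the directory/inventory distinction concrete, here is a rough
sketch of how the two might be modeled. The class names, the data set
name, and the times are made up for illustration; only the tape label
and the file name come from the example above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Granule:
    """One element of a data set, e.g. a single satellite image."""
    time: str       # distinguishing characteristic (illustrative values below)
    location: str   # where the data physically lives: a tape label or a file name


@dataclass
class DataSet:
    """A named grouping of granules; its inventory lists those granules."""
    name: str
    inventory: List[Granule] = field(default_factory=list)


# A directory is a listing of data sets (not of granules).
directory: Dict[str, DataSet] = {}

archive = DataSet("satellite image archive (hypothetical name)")
archive.inventory.append(Granule("time-1 (made up)", "tape C18341"))
archive.inventory.append(Granule("time-2 (made up)", "N861230147.hat"))
directory[archive.name] = archive

# The directory has one entry; the inventory of that entry has two.
print(len(directory), len(archive.inventory))
```

The point is only that the directory stays small (one entry per data
set) while each inventory can be as long as it needs to be.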
question:
what does it mean to "group files into data sets"? like the agg server?
One might say that all images in this projection, from this satellite,
processed this way form a data set. Or one could say that all images in
this projection, from this suite of satellites, processed this way
form a data set. Or... This is the trouble with data sets: different
people call different groupings of the data a data set. This caused
a lot of bloodletting between NASA and NOAA a number of years back.
The idea is NOT to call every granule or every file in the system a
data set; you know the difference between lumpers and splitters. In
order for us to make progress, we have to back off a bit and look at
the big picture, and grouping things into data sets allows us to do that.
This is exactly the problem that the DODS crawler has. When it crawls
a site such as our satellite archive, it ends up with thousands of
entries and the system or the person viewing the results struggles
with data overload: more information than s/he/it (hmm... have
to be careful with these gender-neutral versions) wants or needs to
locate the group of files that define the object of interest. Given
that there is no precise definition for how to group files into a
data set, I think that we can reduce the amount of information that
we have to deal with to a reasonable view of all the data on the
system without losing much, if anything. The crawler is likely to group
the files slightly differently in some cases than the human would, but
one could probably discover this pretty quickly and steer the crawler
if necessary.
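As a rough illustration of the kind of grouping I have in mind (this
is not how the DODS crawler actually works, just one simple heuristic),
one could collapse URLs that differ only in their digits into a single
data set. The host, paths, and all file names except N861230147.hat
are hypothetical.

```python
import re
from collections import defaultdict


def group_key(url):
    """Replace each run of digits with '#', so names that differ
    only in their numeric parts map to the same key."""
    return re.sub(r"\d+", "#", url)


def group_into_data_sets(urls):
    """Group crawled URLs by their digit-free pattern."""
    groups = defaultdict(list)
    for url in urls:
        groups[group_key(url)].append(url)
    return dict(groups)


# Hypothetical crawl results: two satellite images and one XBT cast.
urls = [
    "http://example.org/sat/N861230147.hat",
    "http://example.org/sat/N861231152.hat",
    "http://example.org/xbt/cast0001.dat",
]

data_sets = group_into_data_sets(urls)
for pattern, members in data_sets.items():
    print(pattern, len(members))
```

A heuristic like this will sometimes lump or split differently than a
person would, which is where steering by hand would come in.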
Generating "inventories of granules in data sets" makes sense in the context of
an agg server, but is there also meaning to it in the context of a normal DODS
server?
Not sure exactly what you mean here. We have file servers which are
inventories of granules in data sets. Actually the terminology is a
bit loose here also. The server in this case is a DODS FreeForm server.
It serves a table that contains a list of URLs along with the
characteristic(s) that differentiate one URL from another: time, in
the case of our satellite archives.
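Conceptually, such a file-server table can be pictured as rows of
(time, URL), which a client narrows down by time. The sketch below
only illustrates that idea; it does not use the FreeForm server or any
DODS API, and the times and URLs are invented.

```python
from datetime import datetime

# Hypothetical file-server table: one row per granule, with the time
# that distinguishes it and the URL where it can be fetched.
table = [
    (datetime(2000, 1, 1, 0), "http://example.org/sat/image_a.hdf"),
    (datetime(2000, 1, 1, 12), "http://example.org/sat/image_b.hdf"),
    (datetime(2000, 1, 2, 0), "http://example.org/sat/image_c.hdf"),
]


def urls_between(rows, start, end):
    """Return the URLs whose time falls within [start, end], inclusive."""
    return [url for t, url in rows if start <= t <= end]


selected = urls_between(table, datetime(2000, 1, 1, 6), datetime(2000, 1, 2, 0))
print(selected)
```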