On 4/27/2010 2:50 PM, Steve Hankin wrote:
Hi John,
No surprises here. The "gotta have" that comes to mind is simply:
- if the aggregation server substitutes a new _FillValue, then a
new _FillValue attribute should be created that documents the new value
Yes, getting the _FillValue or missing_value documented correctly would
be good. I also have a bug from Kevin that I haven't fixed yet.
The "it would be nice" (but not essential) is:
It would be nice if FMRC was (to the degree feasible) "just
another aggregation". Stated another way, an FMRC generates a number
of time aggregations in a single ncML configuration step. Each of the
time aggregations that are created should ideally behave very much like
that single time aggregation would have behaved if it had been
hand-configured in ncML. This would imply that the rules governing
scale and offset would be handled the same for FMRC as for other
aggregations.
It should be possible to set the global behavior for
scale/offset/missing. Per-dataset behavior is still a maybe.
Because FMRC needs grid info, it has to find coord systems using
CoordSysBuilder, which sometimes modifies the file (constructs axes,
etc). Generally this is a good thing, but it means that in some cases
an FMRC isn't "just another aggregation". Hopefully this is acceptable.
Note that these issues are NOT blockers. Presumably we can fill in the
missing _FillValue attribute for the FMRC with hand-edited ncML.
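A rough sketch of what such a hand-edited NcML fragment might look like,
generated here with Python's standard library so the result can be checked.
The location "hypothetical_fmrc_best.ncd" and the helper name
fill_value_ncml are placeholders invented for illustration, and whether the
server accepts "NaN" as a float attribute value in this position is an
assumption that should be verified against the NcML documentation.

```python
import xml.etree.ElementTree as ET

# NcML 2.2 namespace used by <netcdf>, <variable> and <attribute> elements.
NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def fill_value_ncml(location, variable, fill="NaN", dtype="float"):
    """Build an NcML wrapper that adds a _FillValue attribute to one variable.

    Hypothetical helper: `location` and `variable` are supplied by the caller.
    """
    ET.register_namespace("", NCML_NS)  # serialize with a default namespace
    root = ET.Element(f"{{{NCML_NS}}}netcdf", {"location": location})
    var = ET.SubElement(root, f"{{{NCML_NS}}}variable", {"name": variable})
    ET.SubElement(
        var,
        f"{{{NCML_NS}}}attribute",
        {"name": "_FillValue", "type": dtype, "value": fill},
    )
    return ET.tostring(root, encoding="unicode")


# Placeholder dataset location; "temp" is the variable discussed in the thread.
ncml = fill_value_ncml("hypothetical_fmrc_best.ncd", "temp")
print(ncml)
```

The printed XML is a single <netcdf> element wrapping one <variable> with
one <attribute>, which is the smallest fragment that documents the fill
value without touching the data.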
thanks for the quick answer - Steve
====================
John Caron wrote:
all:
FMRC is different from other aggregations in that it operates on
GridDatasets rather than NetcdfFiles. The default enhance mode of
GridDataset is to add scale/offset and convert missing to NaNs.
Currently that setting can only be changed globally. Also, there's a bug
in the code allowing non-default enhancements in 4.1, that I may not be
able to fix until 4.2.
Non-FMRC aggregations don't do any enhancements unless you ask for it in
the NcML.
Whenever scale/offset is applied, the attributes scale_factor and
add_offset are removed, so there shouldn't be any danger of applying
them twice. Since other (non-CDM) client libraries don't apply
scale/offset, it seems like the best default is to convert on the server.
Performance is typically (e.g. IDV) limited by latency, not bandwidth, so
the 2X increase in size hasn't been a problem. You may have use cases
where it's more important.
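A minimal sketch of the unpacking described above, written in plain Python
rather than the CDM's Java code. The function name enhance, the dict
layout, and the sample numbers are illustrative assumptions, not the
library's API. The arithmetic is the usual packed-data convention,
unpacked = packed * scale_factor + add_offset, and the attribute removal
mirrors what this thread reports for the FMRC output.

```python
import math
import struct


def enhance(packed, attrs):
    """Unpack short values to floats and convert missing values to NaN.

    `packed` is a list of ints; `attrs` is a dict of variable attributes.
    Returns (values, new_attrs). Hypothetical helper, not the CDM API.
    """
    scale = attrs.get("scale_factor", 1.0)
    offset = attrs.get("add_offset", 0.0)
    missing = attrs.get("missing_value")

    values = [
        math.nan if missing is not None and p == missing else p * scale + offset
        for p in packed
    ]

    # Drop the packing attributes so a client cannot apply them a second time.
    dropped = ("scale_factor", "add_offset", "missing_value")
    new_attrs = {k: v for k, v in attrs.items() if k not in dropped}
    return values, new_attrs


# Made-up sample values for illustration only.
attrs = {
    "scale_factor": 0.01,
    "add_offset": 10.0,
    "missing_value": -32768,
    "units": "degC",
}
packed = [0, 250, -32768]
values, new_attrs = enhance(packed, attrs)

# Payload size: 2 bytes per short versus 4 bytes per float.
short_bytes = len(struct.pack(f"<{len(packed)}h", *packed))
float_bytes = len(struct.pack(f"<{len(values)}f", *values))
print(values, new_attrs, short_bytes, float_bytes)
```

Note that after this step nothing in new_attrs says that NaN marks missing
data, which is the documentation gap raised elsewhere in this thread.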
There's a lot of complexity around the enhancement code, and I will have
to look carefully at what I can support. I'd like to hear your "gotta
have" needs, plus your "would be nice if it doesn't make the code
unstable" wishes.
John
On 4/27/2010 10:33 AM, Rich Signell wrote:
Steve,
Will wait to hear from John. My inclination would be that the
aggregation process ought not to cause _FillValue, scale_factor,
add_offset and data type to be presented differently than they would
have been in the original unaggregated files.
And I would agree. I've been burned when people applied the wrong
scale_factor and add_offsets, and all I knew was that the values
looked funny (I didn't know they even had scale_factor and
add_offsets, but it eventually came out on investigation). And
doesn't it take twice as long to deliver data over opendap if the data
is float instead of short?
-Rich
- Steve
If you look at these original files (for example)
http://rocky.umeoce.maine.edu:8080/thredds/dodsC/gompom/operational/201004/gomoos.20100427.cdf.html
you can see that the variable "temp" is a "short integer" with
"scale_factor", "add_offset" and "missing_value", while in the FMRC "best
time series"
http://rocky.umeoce.maine.edu:8080/thredds/dodsC/gomoos/operational_model/UMaine_GoMOOS_cirulation_model_best.ncd.html
the variable "temp" is now a "float" with none of those attributes,
only NaN values.
I'm CC'ing John Caron, just to make sure I've got this right.
-Rich
On Tue, Apr 6, 2010 at 7:22 PM, Kevin O'Brien <Kevin.M.O'address@hidden> wrote:
Hi Rich -
I added some of your USGS best time series data to the UAF clean
catalog at:
http://ferret.pmel.noaa.gov/geoide/geoIDECleanCatalog.html
I think we'll find that this is also a case where "NaN" is used for the
missing value, but not specified in the variable attributes from the
best time series. Maybe we should ask John Caron why the missing value
of NaN doesn't get set by default as an attribute in those datasets. At
any rate, I'll be interested to hear what you have to report as far as
performance goes on those COAWST data...
Bob - I'm cc'ing you because you wanted to know when the catalog was
changed. I've also added some NOAA coastwatch aggregations to the
catalog...
Let me know if there are any questions..
Kevin
--
Kevin O'Brien UW/JISAO
Research Scientist NOAA/PMEL/TMAP
206-526-6751 http://www.pmel.noaa.gov
"The contents of this message are mine personally and do
not necessarily reflect any position of the Government
or the National Oceanic and Atmospheric Administration."
--
Steve Hankin, NOAA/PMEL -- address@hidden
7600 Sand Point Way NE, Seattle, WA 98115-0070
ph. (206) 526-6080, FAX (206) 526-6744
"The only thing necessary for the triumph of evil is for good men
to do nothing." -- Edmund Burke