Hi John -
I tried what you suggested, and it had no significant effect on the
initial access time of the aggregated dataset. It still took over a
minute and a half to open the dataset. I've pasted the XML config that
I used to define the new aggregation below. To be honest, I'm kind of
glad, because I wasn't looking forward to modifying the guts of the
application that generates the XML config automatically. :-)
I can understand, and probably even accept, that the first access to
the dataset will be a little slow. After that, I presume the dataset is
in the cache, and subsequent accesses bear that out because the
response is quite quick. However, if the Tomcat server is restarted, it
seems that whatever is in the cache is ignored and the cache entries
have to be rebuilt. My aggregation cache is configured like so:
<AggregationCache>
  <dir>/home/pmel/DataPortal/apache-tomcat-5.5.25/content/thredds/cacheAged/</dir>
  <scour>24 hours</scour>
  <maxAge>90 days</maxAge>
</AggregationCache>
Does that seem correct? Also, as an aside, you mentioned that you
thought this would be quicker because it avoids the OPeNDAP URLs.
Shouldn't there be some client-side caching done with the OPeNDAP
datasets? For example, if I access a remote dataset with ncdump (or
Ferret) and OPeNDAP caching is turned on in my ~/.dodsrc file, the
response is cached in the ~/.dods_cache directory. Does any of that
happen when OPeNDAP URLs are accessed through TDS?
Anyway, here's the XML config I used, per your suggestion:
<dataset ID="CM2.1U-D4_1PctTo2X_I1 atmos daily all vars 00010101-02201231_2"
    name="CM2.1U-D4_1PctTo2X_I1 atmos daily all vars 00010101-02201231_2"
    urlPath="ipcc_ar4_CM2.1_R1_1to2x-1_daily_atmos_00010101-02201231_2">
  <serviceName>thisDODS3</serviceName>
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <aggregation type="union">
      <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
        <aggregation dimName="time" type="joinExisting">
          <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/pr_A2.00010101-01001231.nc"
              ncoords="36500" />
          <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/pr_A2.01010101-02001231.nc"
              ncoords="36500" />
          <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/pr_A2.02010101-02201231.nc"
              ncoords="7300" />
        </aggregation>
      </netcdf>
      <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
        <aggregation dimName="time" type="joinExisting">
          <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/tasmax_A2.00010101-01001231.nc"
              ncoords="36500" />
          <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/tasmax_A2.01010101-02001231.nc"
              ncoords="36500" />
          <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/tasmax_A2.02010101-02201231.nc"
              ncoords="7300" />
        </aggregation>
      </netcdf>
      <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
        <aggregation dimName="time" type="joinExisting">
          <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/tasmin_A2.00010101-01001231.nc"
              ncoords="36500" />
          <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/tasmin_A2.01010101-02001231.nc"
              ncoords="36500" />
          <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/tasmin_A2.02010101-02201231.nc"
              ncoords="7300" />
        </aggregation>
      </netcdf>
    </aggregation>
  </netcdf>
</dataset>
I'm open to any suggestions or ideas!
thanks -
kevin
John Caron wrote:
Hi Kevin:
I haven't had time to reproduce this yet, but I'm guessing one source
of the slowdown is using OPeNDAP URLs in the compound aggregation.
It would be interesting to time 1) the single aggregations, 2) the
compound aggregation as it exists, and 3) the compound aggregation
with the OPeNDAP URLs replaced by direct netCDF files.
see attached file
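For the first of the three timings, one of the single aggregations could be isolated in a standalone NcML file. The following is only a minimal sketch: the file paths and ncoords values are copied from the pr_A2 aggregation in the config above, while saving it as its own file and opening it separately from the union is an assumption about how the test might be run.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: a single joinExisting aggregation (pr_A2 only), pulled out of
     the compound union so that its open time can be measured by itself.
     Locations and ncoords are taken from the compound config; using a
     separate standalone file is an assumption made for illustration. -->
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinExisting">
    <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/pr_A2.00010101-01001231.nc"
        ncoords="36500" />
    <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/pr_A2.01010101-02001231.nc"
        ncoords="36500" />
    <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/pr_A2.02010101-02201231.nc"
        ncoords="7300" />
  </aggregation>
</netcdf>
```

Timing this file, and then the equivalent files for tasmax_A2 and tasmin_A2, would give per-variable baselines to compare against the open time of the full union.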