Hi John -
I tried what you suggested, and it didn't noticeably speed up
the initial access of the aggregated dataset. It still took
over a minute and a half to open the dataset. I've pasted the
XML config that I used to define the new aggregation below. To
be honest, I'm actually kind of glad, because I wasn't looking
forward to modifying the guts of the application that generates
the XML config automatically... :-)
I guess I can understand, and probably even accept, that the
first access of the dataset will be a little slow. After that,
I presume the dataset is held in the cache, and subsequent
accesses bear that out because the response is quite quick.
However, if the Tomcat server is restarted, it seems that
whatever is in the cache is ignored and the cache entries have
to be rebuilt. I have my aggregation cache set like so:
<AggregationCache>
<dir>/home/pmel/DataPortal/apache-tomcat-5.5.25/content/thredds/cacheAged/</dir>
<scour>24 hours</scour>
<maxAge>90 days</maxAge>
</AggregationCache>
Does that seem correct? Also, as an aside, you mentioned that
you thought this would be quicker because it avoids the OPeNDAP
URLs... Shouldn't there be some client-side caching done with
the OPeNDAP datasets? For example, if I access a remote dataset
with ncdump (or Ferret), and OPeNDAP caching is turned on in my
~/.dodsrc file, it will cache the response in the ~/.dods_cache
directory. Does any of that happen when OPeNDAP URLs are
accessed through TDS?
Anyway - here's the XML config I used as per your suggestion:
<dataset ID="CM2.1U-D4_1PctTo2X_I1 atmos daily all vars 00010101-02201231_2"
         name="CM2.1U-D4_1PctTo2X_I1 atmos daily all vars 00010101-02201231_2"
         urlPath="ipcc_ar4_CM2.1_R1_1to2x-1_daily_atmos_00010101-02201231_2">
  <serviceName>thisDODS3</serviceName>
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <aggregation type="union">
      <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
        <aggregation dimName="time" type="joinExisting">
          <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/pr_A2.00010101-01001231.nc"
                  ncoords="36500" />
          <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/pr_A2.01010101-02001231.nc"
                  ncoords="36500" />
          <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/pr_A2.02010101-02201231.nc"
                  ncoords="7300" />
        </aggregation>
      </netcdf>
      <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
        <aggregation dimName="time" type="joinExisting">
          <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/tasmax_A2.00010101-01001231.nc"
                  ncoords="36500" />
          <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/tasmax_A2.01010101-02001231.nc"
                  ncoords="36500" />
          <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/tasmax_A2.02010101-02201231.nc"
                  ncoords="7300" />
        </aggregation>
      </netcdf>
      <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
        <aggregation dimName="time" type="joinExisting">
          <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/tasmin_A2.00010101-01001231.nc"
                  ncoords="36500" />
          <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/tasmin_A2.01010101-02001231.nc"
                  ncoords="36500" />
          <netcdf location="file:/data/gfdl_cm2_1/CM2.1U-D4_1PctTo2X_I1/pp/atmos/ts/daily/tasmin_A2.02010101-02201231.nc"
                  ncoords="7300" />
        </aggregation>
      </netcdf>
    </aggregation>
  </netcdf>
</dataset>
I'm open to any suggestions or ideas!
thanks -
kevin
John Caron wrote:
Hi Kevin:
I haven't had time to reproduce this yet, but I'm guessing one
source of the slowdown is using OPeNDAP URLs in the compound
aggregation. It would be interesting to time 1) the single
aggregations, 2) the compound agg as it exists, and 3) the
compound agg, but with the OPeNDAP URLs replaced by direct
netCDF files,
see attached file
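
One way to run the three-way timing comparison suggested above is
a small script that requests the DDS of each dataset from TDS and
records how long the request takes. This is only a rough sketch:
the host, port, and the two "single"/"compound" dataset paths in
the usage comment are placeholders, not real catalog entries, and
the numbers only reflect a "first access" if the server-side cache
is cold when the script runs.

```python
import time
import urllib.request
from contextlib import closing


def time_open(url, opener=urllib.request.urlopen):
    """Return the seconds taken to request `url` and read the full response."""
    start = time.perf_counter()
    with closing(opener(url)) as resp:
        resp.read()
    return time.perf_counter() - start


def compare(urls, opener=urllib.request.urlopen):
    """Time each labelled URL once, in order; return {label: seconds}."""
    return {label: time_open(url, opener) for label, url in urls.items()}


# Example usage (placeholder host and paths; substitute real urlPath values):
#
#   base = "http://localhost:8080/thredds/dodsC/"
#   urls = {
#       "single agg": base + "path/to/single_aggregation.dds",
#       "compound, OPeNDAP URLs": base + "path/to/compound_opendap.dds",
#       "compound, direct files": base
#           + "ipcc_ar4_CM2.1_R1_1to2x-1_daily_atmos_00010101-02201231_2.dds",
#   }
#   for label, secs in compare(urls).items():
#       print(f"{label}: {secs:.1f} s")
```

Each URL is timed once and in order, so restarting Tomcat before a
run (or requesting a dataset that has not been opened yet) is what
makes the result comparable to the slow initial access described
above; a second run should show the cached response times.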