Re: Thredds - Grib2 Collection Indexing as Independent Task
- Subject: Re: Thredds - Grib2 Collection Indexing as Independent Task
- Date: Mon, 12 Aug 2013 15:20:33 -0600
Hi Tim:
1. The CFSR dataset has some known encoding defects; I'm not sure whether your
files have those problems.
2. change
<collection
spec="/thredds02/cf_reanalysis/**/ocnh[0-9]{2}\.gdas\.[0-9]{10}\.grb2"
recheckAfter="5 min" olderThan="5 min"/>
to
<collection
spec="/thredds02/cf_reanalysis/**/ocnh[0-9]{2}\.gdas\.[0-9]{10}\.grb2" />
because you don't want to rescan this collection every 5 minutes!
3. Stop the TDS, delete or archive off content/thredds/logs, restart the TDS,
run for an hour, then zip up the log files and send them to me. Optionally, stop
the TDS until we can check whether there is a problem that you would have to
redo anyway.
4. We do have a background indexer, but it's still beta. You can look at the
docs we have so far, but I wouldn't try to run it yet:
http://www.unidata.ucar.edu/projects/THREDDS/tech/tds4.3/reference/collections/TDM.html
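
As a quick check on the filename pattern in item 2, here is a minimal Python sketch that tests candidate file names against the regular-expression part of the spec (the text after the last "/"). The file names are hypothetical, made up for illustration, and requiring a match on the whole file name is an assumption about how the TDS applies the pattern, not something confirmed in this thread.

```python
import re

# Regexp portion of the collection spec: the text after the last "/".
FILENAME_PATTERN = re.compile(r"ocnh[0-9]{2}\.gdas\.[0-9]{10}\.grb2")


def matches_spec(filename: str) -> bool:
    """Return True if the entire file name matches the pattern.

    Assumption: the whole name must match, not just a substring.
    """
    return FILENAME_PATTERN.fullmatch(filename) is not None


# Hypothetical file names, for illustration only.
candidates = [
    "ocnh01.gdas.1979010100.grb2",       # two-digit hour, ten-digit date
    "ocnh01.gdas.1979010100.grb2.gbx9",  # index file, extra suffix
    "ocnh1.gdas.1979010100.grb2",        # only one digit after "ocnh"
    "flxf01.gdas.1979010100.grb2",       # different product prefix
]

for name in candidates:
    print(name, matches_spec(name))
```

Under that assumption, only the first name is accepted; the other three are rejected because of the extra suffix, the missing digit, and the different prefix.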
John
On 8/12/2013 3:07 PM, Timothy Lewis - NOAA Affiliate wrote:
> John,
>
> I'm sorry for the blank message. I accidentally discovered a new
> keyboard shortcut for sending a draft email.
>
> The dataset in question has about 300,000 files (28 files per day for 30
> years). I've attached the catalog file for the aggregation, as well as
> the threddsConfig.xml. These are the only relevant configuration files
> I know of. If you would like others, please let me know. Thank you for
> your help, and sorry again for the mistakenly sent email.
>
> Thanks,
>
> Tim
>
> On Mon, Aug 12, 2013 at 3:45 PM, John Caron <address@hidden> wrote:
>
> Hi Tim:
>
> 1) can you send me your configuration files so i can be sure what
> you are doing.
>
> 2) how many files are there in the aggregation?
>
> John
>
>
> On 8/12/2013 1:49 PM, Timothy Lewis - NOAA Affiliate wrote:
>
> John,
>
> My name is Tim Lewis, and I manage the OceanNOMADS Thredds server at
> NCDDC. We are attempting to aggregate 30 years' worth of Climate
> Forecast System Reanalysis. We've added the aggregation to our Thredds
> server, but indexing the grib2 files seems to slow the server down by
> hogging all resources. Performance gets progressively worse until the
> server becomes unusable and must be restarted.
>
> Our current aggregation has been indexing for about 7 days, interrupted
> twice for restarts due to performance. We have tested an aggregation of
> 10% of this dataset before, and it took about 3 days to build the
> aggregation. Assuming a linear scaling, we're looking at a month of
> indexing and therefore a month of poor performance. The aggregation can
> be reached at the following URL:
>
>
> http://ecowatch.ncddc.noaa.gov/thredds/oceanNomads/aggs/catalog_cfsr_aggs.html
>
> Is there any way to separate the indexing of the feature collection from
> the serving of data requests? Ideally, we would be able to background
> an interruptible indexing task and continue to serve data through the
> web interface. This morning, we attempted pointing a separate Thredds
> installation at a pre-indexed aggregation, thinking that we could index
> on one machine and then serve from another. This was unsuccessful,
> though I'm not sure why, since the ncx files were already present.
>
> Do you have any suggestions on how we might have this aggregation
> indexed while still serving regular requests without the performance
> hit? We appreciate any advice you can give. Thank you for your help.
>
> Sincerely,
>
> Tim Lewis
>
>
>
> --
> Tim Lewis, Associate Software Engineer
> General Dynamics Information Technology
> NOAA Coastal Data Development Center
> 1021 Balch Boulevard, Suite 1003
> Stennis Space Center, Mississippi 39529 USA
>
> 228.688.2126
> address@hidden
> address@hidden
>
>
>
>
> --
> Tim Lewis, Associate Software Engineer
> General Dynamics Information Technology
> NOAA Coastal Data Development Center
> 1021 Balch Boulevard, Suite 1003
> Stennis Space Center, Mississippi 39529 USA
>
> 228.688.2126
> address@hidden
> address@hidden