
20050907: Newbie problems with catalog generator



Hi Tennessee,

Is the OPeNDAP server already set up to serve this data? What do you get with the URL http://localhost:8010/thredds/dodsC/catalog.xml? The current THREDDS server, when set up to serve data via OPeNDAP, automatically generates a catalog of the data being served (so you may not need to use CatGen). I think the version from two months ago should also do that. Could you look at the manifest.mf file in the thredds.war file you are using and let me know what the "Built-By", "Built-On", "Implementation-Title", and "Implementation-Version" values are?

As for the CatGen stuff, try changing the resultService@accessPointHeader value to "/data/pymars" (matching the value in accessPoint) and the datasetFilter@matchPattern value to ".*\.nc$", and for now remove the datasetNamer element. Oh yeah, you'll probably need to change the resultService@base value to "http://localhost:8010/thredds/dodsC/".
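
Applied to the first datasetSource in the config quoted below, those changes would look roughly like the following sketch. Only the base, accessPointHeader, and matchPattern values differ from the file as sent, and the datasetNamer element is dropped; everything else is copied unchanged, and the result has not been tested against a running server.

```xml
<datasetSource name="Local Disk Data Sets" type="Local"
  structure="DirTree"
  accessPoint="/data/pymars">
  <!-- base now points at the OPeNDAP (dodsC) path of the server;
       accessPointHeader matches the accessPoint value -->
  <resultService name="linuxdev" serviceType="DODS"
    base="http://localhost:8010/thredds/dodsC/"
    accessPointHeader="/data/pymars"/>
  <!-- accept any path ending in .nc -->
  <datasetFilter name="Accept netCDF files only" type="RegExp"
    matchPattern=".*\.nc$"/>
  <!-- datasetNamer removed for now -->
</datasetSource>
```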

Hope that helps,

Ethan

Tennessee Leeuwenburg wrote:

Hi Ethan,

I'm not sure I fully grokked what you said to me, so I've just included my 
catalog generator file without further modification.

I have data living on disk in /data/pymars/2004/netcdf_anal, and 
/data/pymars/2004/netcdf_fore. I would like to set up the catalog generator to 
crawl the /data/pymars directory and publish what it finds there -- no 
requirement for very intelligent structuring at this stage.

The dods server is running on localhost:8010.

I'm not entirely certain what version is running, but it is whatever is current 
on the web page as of about 2 months ago. I look forward to the new version, 
and the simpler configuration!

I wasn't sure what I had to do with all that pattern matching stuff, so I 
decided to just leave it unchanged from the example, and just see what 
happened. I imagine I have to replace the datasetFilter to accept *.nc, or some 
other pattern of my choosing. I couldn't work out if the dataset namer was 
mandatory or not. I'd really just like to capture everything, and am happy with 
the title being the filename at this stage.

Cheers,
-Tennessee

<?xml version="1.0" encoding="UTF-8"?>
<!-- $Id: catGenConf.exampleLocal.xml,v 1.2 2004/06/03 20:38:07 edavis Exp $ -->
<!-- - Simple example CatalogGenConfig file.
 -->
<!DOCTYPE catalog SYSTEM 
"http://www.unidata.ucar.edu/projects/THREDDS/xml/CatalogGenConfig.0.5.dtd">
<catalog name="THREDDS CatalogGen test config file" version="0.6">
 <dataset name="THREDDS CatalogGen test config file">
   <dataset name="NCEP Eta 80km CONUS model data">
     <metadata metadataType="CatalogGenConfig">
       <catalogGenConfig type="Catalog">
         <datasetSource name="Local Disk Data Sets" type="Local"
           structure="DirTree"
           accessPoint="/data/pymars">
           <resultService name="linuxdev" serviceType="DODS"
             base="http://localhost:8010/thredds/cataloggen/"
             accessPointHeader="/home/tjl/jakarta-5.0.28/content/thredds/cataloggen/"/>
           <datasetFilter name="Accept netCDF files only" type="RegExp"
             matchPattern="/[0-9][^/]*_eta_211\.nc$"/>
           <datasetNamer name="NCEP Eta 80km CONUS model data"
             type="RegExp" addLevel="false"
             matchPattern="([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])([0-9][0-9])_eta_211.nc$"
             substitutePattern="NCEP Eta 80km CONUS $1-$2-$3 $4:00:00 GMT"/>
         </datasetSource>
       </catalogGenConfig>
     </metadata>
   </dataset>
   <dataset name="NCEP GFS 80km CONUS model data">
     <metadata metadataType="CatalogGenConfig">
       <catalogGenConfig type="Catalog">
         <datasetSource name="model data source" type="Local"
           structure="Flat"
           accessPoint="./content/thredds/cataloggen/testData/model">
           <resultService name="mlode" serviceType="DODS"
             base="http://localhost:8080/thredds/cataloggen/"
             accessPointHeader="./content/thredds/cataloggen/"/>
           <datasetFilter name="Accept netCDF files only" type="RegExp"
             matchPattern="/[0-9][^/]*_gfs_211\.nc$"/>
           <datasetNamer name="NCEP GFS 80km CONUS model data"
             type="RegExp" addLevel="false"
             matchPattern="([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])([0-9][0-9])_gfs_211.nc$"
             substitutePattern="NCEP GFS 80km CONUS $1-$2-$3 $4:00:00 GMT"/>
         </datasetSource>
       </catalogGenConfig>
     </metadata>
   </dataset>
 </dataset>
</catalog>



Ethan Davis wrote:

Tennessee Leeuwenburg wrote:

Ethan Davis wrote:

Hi Tennessee,

Did you edit the config.xml file (which sets up the tasks) as well as
the cat gen config file? I guess you must have if it is showing up in
the interface. Make sure the period value is not set to zero; if it
is, the task won't be run. Are you getting any messages in the log
files? What version of the server are you running? Is this a publicly
available server? If so, send me the URL and I'll take a look at the
config files.

Sorry these config file formats are so ugly. We're working on
simplifying and cleaning up the configuration throughout the server.
But for now ...


Well, as long as you're willing to help me, ugly is fine :)

More than willing to help. But I want simpler because it would make it easier for me to remember what is going on :)

After making that change, the server started to process the various
files. The example DODS catalog was generated fine, but the example
filesystem catalog and my own filesystem catalog both failed with
similar messages. I've appended the results.

I think I'm failing to understand what exactly the serviceName, base and
accessPointHeader are actually used for.

As with regular catalogs, I assume one is used for reconstructing the
URL to the file to be resourced, and the other is used for constructing
the URL to be used in an OPeNDAP request, but it's not clear to me
exactly what is happening. I read the documentation, but it was a bit
hand-wavy about the specifics.

The accessPoint is the directory that is to be scanned for data files. The accessPointHeader is a parent directory of the accessPoint directory and is used to remove the part of the data file path that is not to appear in the resulting dataset access URL. The base value is the URL for the OPeNDAP server that is serving your data. For instance, if you want to crawl the /my/data/radar/level3/FTG directory and a resulting dataset access URL is something like http://.../nph-dods/radar/level3/FTG/file.nc, you would want something like

<datasetSource name="model data source" type="Local" structure="Flat"
               accessPoint="/my/data/radar/level3/FTG">
  <resultService name="mlode" serviceType="DODS" base="http://.../nph-dods/"
                 accessPointHeader="/my/data/"/>
  <datasetFilter ... />
  <datasetNamer ... />
</datasetSource>

Does that clear things up at all? If not, feel free to send me your config file to look at.

Sorry about the documentation. It isn't all that clear and I haven't put much effort into it since we decided to move to a simpler config file format. Not sure what's up below with the example file system dataset. I must have broken something at some point.

What version of the cat gen servlet (or THREDDS server) are you running?

Ethan

PS In the new TDS, catalogs for the data it is serving are automatically generated and the config files are much simpler than these.

Thanks for your help,
-T

<catalog name="THREDDS CatalogGen test config file" version="0.6">
<dataset name="THREDDS CatalogGen test config file">
<service name="linuxdev" serviceType="DODS"
base="http://localhost:8010/thredds/cataloggen/"/>
<service name="mlode" serviceType="DODS"
base="http://localhost:8080/thredds/cataloggen/"/>
<dataset name="NCEP Eta 80km CONUS model data">
<dataset name="The DatasetSource &quot;Local Disk Data Sets&quot; could not be
expanded. The accessPointHeader
(/home/tjl/jakarta-5.0.28/content/thredds/cataloggen/) is not a
directory." serviceName="linuxdev"/>
</dataset>
<dataset name="NCEP GFS 80km CONUS model data">
<dataset name="The DatasetSource &quot;model data source&quot; could not be
expanded. The accessPointHeader (./content/thredds/cataloggen/) is not a
directory." serviceName="mlode"/>
</dataset>
</dataset>
</catalog>



--
Ethan R. Davis                                Telephone: (303) 497-8155
Software Engineer                             Fax:       (303) 497-8690
UCAR Unidata Program Center                   E-mail:    address@hidden
P.O. Box 3000
Boulder, CO  80307-3000                       http://www.unidata.ucar.edu/
---------------------------------------------------------------------------