Re: 20020822: ldmd won't stay running
- Date: Fri, 23 Aug 2002 10:07:01 -0600
John C Nordlie wrote:
>
> Ok, I've hacked the ldmd.conf and ldmadmin files to include
> the logfile override. I also rebooted the machine and zero'd
> the log file. Here is the output of one attempt to start
> the ingestor with 'ldmadmin start':
>
Hi John,
Thanks! This is helpful. I see a few things going on:
> Aug 23 15:52:15 rpc.ldmd[179]: Starting Up (built: Jun 12 2002 15:26:16)
> Aug 23 15:52:15 amelia[183]: run_requester: Starting Up:
> amelia.geol.iastate.edu
> Aug 23 15:52:15 amelia[183]: run_requester: 20020823145215.937 TS_ENDT
> {{HDS, ".*"},{MCIDAS, ".*"},{IDS|DDPLUS, ".*"}}
> Aug 23 15:52:15 remus[185]: run_requester: Starting Up: remus.rwic.und.edu
> Aug 23 15:52:15 remus[185]: run_requester: 20020823145215.940 TS_ENDT
> {{NLDN, ".*"}}
> Aug 23 15:52:15 129.15.194.231[187]: run_requester: Starting Up:
> 129.15.194.231
> Aug 23 15:52:15 129.15.194.232[188]: run_requester: Starting Up:
> 129.15.194.232
> Aug 23 15:52:15 129.15.194.232[188]: run_requester: 20020823145215.944
> TS_ENDT {{ANY, ".*"}}
> Aug 23 15:52:15 129.15.194.233[189]: run_requester: Starting Up:
> 129.15.194.233
> Aug 23 15:52:15 129.15.194.233[189]: run_requester: 20020823145215.946
> TS_ENDT {{ANY, ".*"}}
> Aug 23 15:52:15 129.15.194.234[190]: run_requester: Starting Up:
> 129.15.194.234
> Aug 23 15:52:15 129.15.194.234[190]: run_requester: 20020823145215.947
> TS_ENDT {{ANY, ".*"}}
> Aug 23 15:52:15 129.15.194.236[192]: run_requester: Starting Up:
> 129.15.194.236
> Aug 23 15:52:15 129.15.194.236[192]: run_requester: 20020823145215.950
> TS_ENDT {{ANY, ".*"}}
> Aug 23 15:52:15 129.15.194.237[193]: run_requester: Starting Up:
> 129.15.194.237
> Aug 23 15:52:15 129.15.194.237[193]: run_requester: 20020823145215.951
> TS_ENDT {{ANY, ".*"}}
> Aug 23 15:52:15 129.15.194.238[194]: run_requester: Starting Up:
> 129.15.194.238
> Aug 23 15:52:15 129.15.194.238[194]: run_requester: 20020823145215.953
> TS_ENDT {{ANY, ".*"}}
> Aug 23 15:52:15 aeolus[184]: run_requester: Starting Up: aeolus.ucsd.edu
> Aug 23 15:52:15 aeolus[184]: run_requester: 20020823145215.960 TS_ENDT
> {{NNEXRAD, "/p......"},{FNEXRAD,
> "/p...(BIS|MBX|MVX|ABR|FSD|UDX|DLH|MPX)"}}
> Aug 23 15:52:15 dns2[186]: run_requester: Starting Up: dns2.cmc.ec.gc.ca
> Aug 23 15:52:15 dns2[186]: run_requester: 20020823145215.962 TS_ENDT
> {{GEM, ".*"}}
> Aug 23 15:52:15 129.15.194.231[187]: run_requester: 20020823145215.943
> TS_ENDT {{ANY, ".*"}}
> Aug 23 15:52:15 129.15.194.235[191]: run_requester: Starting Up:
> 129.15.194.235
> Aug 23 15:52:15 129.15.194.235[191]: run_requester: 20020823145215.966
> TS_ENDT {{ANY, ".*"}}
> Aug 23 15:52:15 remus[185]: FEEDME(remus.rwic.und.edu): OK
> Aug 23 15:52:15 pqact[181]: Starting Up
> Aug 23 15:52:15 pqbinstats[180]: Starting Up (179)
> Aug 23 15:52:16 pqsurf[182]: Starting Up (179)
> Aug 23 15:52:16 pqsurf[182]: pq_open failed:
> /usr/local/ldm/data/pqsurf.pq: No such file or directory
It looks like you're trying to run pqsurf without a pqsurf queue:
pq_open fails because /usr/local/ldm/data/pqsurf.pq does not exist. The
LDM is coded to exit when one of its children exits. Below we can see
that the parent rpc.ldmd, PID 179, exits and terminates the whole
process group once child 182 (pqsurf) has exited with status 1.
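
When a log is this noisy, it can help to pick out the child whose exit
brought the parent down. The following is only a rough sketch in Python,
not part of the LDM: the function name find_failed_children and the two
regular expressions are my own assumptions, based solely on the line
layout visible in this log ("name[pid]: message" and "child N exited
with status S").

```python
import re

# Lines copied from the log quoted in this message (the wrapped pq_open
# line is rejoined onto one line).
SAMPLE_LOG = """\
Aug 23 15:52:15 rpc.ldmd[179]: Starting Up (built: Jun 12 2002 15:26:16)
Aug 23 15:52:16 pqsurf[182]: Starting Up (179)
Aug 23 15:52:16 pqsurf[182]: pq_open failed: /usr/local/ldm/data/pqsurf.pq: No such file or directory
Aug 23 15:52:16 pqsurf[182]: Exiting
Aug 23 15:52:16 rpc.ldmd[179]: Exiting
Aug 23 15:52:16 rpc.ldmd[179]: Terminating process group
Aug 23 15:52:16 rpc.ldmd[179]: child 182 exited with status 1
"""

# Assumed layout: "Mon DD HH:MM:SS name[pid]: message"
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<name>[^\[\s]+)\[(?P<pid>\d+)\]:\s+(?P<msg>.*)$"
)
# Assumed wording of the parent's report about a child.
CHILD_EXIT_RE = re.compile(r"child (\d+) exited with status (\d+)")


def find_failed_children(log_text):
    """Return one dict per child reported as exiting with a nonzero status."""
    names = {}     # pid -> process name as shown in the log
    messages = {}  # pid -> every message logged under that pid
    exits = []     # (child pid, exit status) as reported by the parent
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not follow the assumed layout
        pid = int(m.group("pid"))
        names[pid] = m.group("name")
        messages.setdefault(pid, []).append(m.group("msg"))
        c = CHILD_EXIT_RE.search(m.group("msg"))
        if c:
            exits.append((int(c.group(1)), int(c.group(2))))
    # Resolve names only after the full pass, since the parent's report
    # may appear before or after the child's own messages.
    return [
        {
            "pid": pid,
            "status": status,
            "name": names.get(pid, "?"),
            "messages": messages.get(pid, []),
        }
        for pid, status in exits
        if status != 0
    ]


if __name__ == "__main__":
    for child in find_failed_children(SAMPLE_LOG):
        print(child["name"], child["pid"], "exited with status", child["status"])
        for msg in child["messages"]:
            print("   ", msg)
```

Run against the full log file, this should list each failed child
together with everything it logged, which puts the pq_open error right
next to the exit status.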
> Aug 23 15:52:16 pqsurf[182]: Exiting
> Aug 23 15:52:16 rpc.ldmd[179]: Exiting
> Aug 23 15:52:16 remus[185]: Exiting
> Aug 23 15:52:16 pqbinstats[180]: Exiting
> Aug 23 15:52:16 pqsurf[182]: waitpid: No child processes
> Aug 23 15:52:16 pqsurf[182]: Number of products 0
> Aug 23 15:52:16 pqsurf[182]: Number of observations 0
> Aug 23 15:52:16 pqsurf[182]: Number of dups 0
> Aug 23 15:52:16 rpc.ldmd[179]: Terminating process group
> Aug 23 15:52:16 rpc.ldmd[179]: child 182 exited with status 1
> Aug 23 15:52:16 129.15.194.233[189]: FEEDME(129.15.194.233): reclass:
> 20020823145215.946 TS_ENDT {{NEXRD2, ".*"}}
> Aug 23 15:52:16 129.15.194.234[190]: FEEDME(129.15.194.234): reclass:
> 20020823145215.947 TS_ENDT {{NEXRD2, ".*"}}
> Aug 23 15:52:16 129.15.194.236[192]: FEEDME(129.15.194.236): reclass:
> 20020823145215.950 TS_ENDT {{NEXRD2, ".*"}}
> Aug 23 15:52:16 pqact[181]: Exiting
> Aug 23 15:52:16 129.15.194.235[191]: FEEDME(129.15.194.235): reclass:
> 20020823145215.966 TS_ENDT {{NEXRD2, ".*"}}
> Aug 23 15:52:16 129.15.194.238[194]: FEEDME(129.15.194.238): reclass:
> 20020823145215.953 TS_ENDT {{NEXRD2, ".*"}}
> Aug 23 15:52:16 129.15.194.233[189]: FEEDME(129.15.194.233): OK
> Aug 23 15:52:16 129.15.194.233[189]: Exiting
> Aug 23 15:52:16 129.15.194.234[190]: FEEDME(129.15.194.234): OK
> Aug 23 15:52:16 129.15.194.234[190]: Exiting
> Aug 23 15:52:16 129.15.194.236[192]: FEEDME(129.15.194.236): OK
> Aug 23 15:52:16 129.15.194.236[192]: Exiting
> Aug 23 15:52:16 129.15.194.238[194]: FEEDME(129.15.194.238): OK
> Aug 23 15:52:16 129.15.194.238[194]: Exiting
> Aug 23 15:52:16 129.15.194.235[191]: FEEDME(129.15.194.235): OK
> Aug 23 15:52:16 129.15.194.235[191]: Exiting
> [195] 020823/1052 [DC 3] Starting up.
> [195] 020823/1052 [DC 5] Normal termination.
> [195] 020823/1052 [DC 2] Number of bulletins read and processed: 0
> [195] 020823/1052 [DC 6] Shutting down.
> Aug 23 15:52:18 129.15.194.231[187]: FEEDME(129.15.194.231): reclass:
> 20020823145215.943 TS_ENDT {{NEXRD2, ".*"}}
> Aug 23 15:52:18 amelia[183]: FEEDME(amelia.geol.iastate.edu): OK
> Aug 23 15:52:18 amelia[183]: Exiting
> Aug 23 15:52:27 129.15.194.231[187]: FEEDME(129.15.194.231): OK
> Aug 23 15:52:27 129.15.194.231[187]: Exiting
> Aug 23 15:52:36 129.15.194.237[193]: FEEDME(129.15.194.237): reclass:
> 20020823145215.951 TS_ENDT {{NEXRD2, ".*"}}
> Aug 23 15:52:41 dns2[186]: FEEDME(dns2.cmc.ec.gc.ca): can't contact
> portmapper: RPC: Timed out
> Aug 23 15:52:41 aeolus[184]: FEEDME(aeolus.ucsd.edu): can't contact
> portmapper: RPC: Timed out
I wonder about these two portmapper timeouts. I don't know whether they
are a separate problem or a side effect of the process group being
terminated, although I suspect the latter. I would try an ldmping to
dns2.cmc.ec.gc.ca and aeolus.ucsd.edu to make sure the RPC call doesn't
time out.
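
If ldmping itself times out, a plain TCP connect test can help tell a
network or firewall problem from an LDM problem. The sketch below is an
assumption-laden stand-in, not a replacement for ldmping: it only checks
whether a TCP connection can be opened to the portmapper port (111) and
to the LDM's registered port (388), and the function names tcp_reachable
and check_upstreams are made up for this example. A successful connect
does not prove that the RPC call itself will succeed.

```python
import socket

PORTMAPPER_PORT = 111  # well-known port of the RPC portmapper
LDM_PORT = 388         # IANA-registered port for the Unidata LDM


def tcp_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


def check_upstreams(hosts, ports=(PORTMAPPER_PORT, LDM_PORT), timeout=5.0):
    """Map each host to a {port: reachable} dict for the given ports."""
    return {
        host: {port: tcp_reachable(host, port, timeout) for port in ports}
        for host in hosts
    }


# Example call for the two hosts that timed out in the log above:
# check_upstreams(["dns2.cmc.ec.gc.ca", "aeolus.ucsd.edu"])
```

If port 111 is unreachable from your machine but reachable from
elsewhere, that would point at something between you and the upstream
host rather than at your LDM configuration.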
> Aug 23 15:53:11 dns2[186]: Exiting
> Aug 23 15:53:11 aeolus[184]: Exiting
> Aug 23 15:53:15 129.15.194.232[188]: FEEDME(129.15.194.232): select: RPC:
> Timed out
> Aug 23 15:53:19 129.15.194.237[193]: h_clnt_call: 129.15.194.237: FEEDME:
> time elapsed 43.245131
> Aug 23 15:53:19 129.15.194.237[193]: FEEDME(129.15.194.237): OK
> Aug 23 15:53:19 129.15.194.237[193]: Exiting
> Aug 23 15:53:45 129.15.194.232[188]: Exiting
>
So, please fix the pqsurf problem and let me know what happens.
This sounds different from what you described earlier, when the LDM
would run for a few hours and then quit. Perhaps something else is also
going on...
Anne
--
***************************************************
Anne Wilson UCAR Unidata Program
address@hidden P.O. Box 3000
Boulder, CO 80307
----------------------------------------------------
Unidata WWW server http://www.unidata.ucar.edu/
****************************************************