Re: LDM - ldm process becomes defunct on server
- Subject: Re: LDM - ldm process becomes defunct on server
- Date: Mon, 29 Aug 2005 13:46:25 -0600
Sarah,
>Date: Fri, 26 Aug 2005 18:07:12 -0600 (MDT)
>From: "sarah thompson" <address@hidden>
>Organization: NOAA/NWS/FSL
>To: address@hidden
>Subject: LDM - ldm process becomes defunct on server
The above message contained the following:
> Institution: noaa/fsl
> Package Version: 6.4.1
> Operating System: fedora core 4
> Hardware Information: dell poweredge 750
> Inquiry: I have an upstream "server" that is feeding data to 2
> downstream machines. I'm running Fedora Core 4, which I think is what
> you all are running. I'm on kernel 2.6.12-1.
> I compiled with gdb, but when the upstream machine "dies", meaning
> all processes are reported as defunct, it didn't produce a core file.
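On a Linux system, the following must be true for a normally installed
LDM to dump a core file:
1. "ulimit -c" must return a non-zero value (preferably "unlimited").
2. The file "/proc/sys/kernel/suid_dumpable" must contain a "2" (a
   normally installed LDM server runs set-user-ID root, and by default
   the kernel doesn't dump core for such processes).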
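The two conditions above can be checked together with a small POSIX shell sketch. The helper name `coredump_ok` is hypothetical, and the script reads the "/proc/sys/kernel/suid_dumpable" path given above; newer kernels document the setting as "/proc/sys/fs/suid_dumpable", in which case this sketch will simply report it as missing.

```sh
#!/bin/sh
# Sketch: check the two core-dump prerequisites for a setuid LDM.
# ASSUMPTION: the kernel exposes the setting at /proc/sys/kernel/suid_dumpable
# (the path given in this message); newer kernels use /proc/sys/fs/suid_dumpable.

# coredump_ok LIMIT DUMPABLE   (hypothetical helper)
# Prints "ok" if both prerequisites hold, otherwise the first one that fails.
coredump_ok() {
    if [ "$1" = "0" ]; then
        echo "core-file size limit is 0; run 'ulimit -c unlimited' first"
    elif [ "$2" != "2" ]; then
        echo "suid_dumpable is '$2', not 2"
    else
        echo "ok"
    fi
}

# Gather the current values for this shell and report them.
limit=$(ulimit -c)
dumpable=$(cat /proc/sys/kernel/suid_dumpable 2>/dev/null || echo missing)
echo "ulimit -c: $limit; suid_dumpable: $dumpable"
coredump_ok "$limit" "$dumpable"
```

Resource limits are inherited by child processes, so the "ulimit -c unlimited" has to be in effect in the same shell (as the LDM user) that later starts the LDM, e.g., via "ldmadmin start".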
> the downstream machines have this message in their ldmd.log
> NOTICE: requester6.c:447; ldm_clnt.c:310: nullproc_6 failure to
> eldmf1.fsl.noaa.gov; ldm_clnt.c:145: RPC: Timed out
The above means that a downstream LDM-6 process sent a NULLPROC message
to an upstream LDM-6 server, but the reply from the upstream LDM-6 timed
out. One obvious candidate cause is that the upstream LDM's configuration
file (ldmd.conf) lacks an ALLOW entry for the downstream host.
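A quick way to look for such an entry is a sketch like the one below, which reports whether any ALLOW line in a given ldmd.conf has a hostname pattern matching a given name. It assumes entries of the form "ALLOW <feedtype> <hostname-ERE> ..." and ignores the feedtype and any optional trailing patterns; the helper name `allowed_by_conf` is hypothetical, and the location of ldmd.conf depends on your installation. Presumably the upstream LDM checks the name it obtains for the downstream host's address, so test with that name rather than the one the downstream host calls itself.

```sh
#!/bin/sh
# Sketch: does any ALLOW entry in an ldmd.conf match a given hostname?
# ASSUMPTIONS: entries look like "ALLOW <feedtype> <hostname-ERE> ...";
# only the hostname ERE is tested (feedtypes and optional trailing
# patterns are ignored), and the match itself is case-sensitive.

# allowed_by_conf CONF HOST   (hypothetical helper)
# Exit status 0 if some ALLOW entry's hostname ERE matches HOST, else 1.
allowed_by_conf() {
    awk -v h="$2" '
        # The keyword is compared case-insensitively; comment lines
        # start with "#" and therefore never match.
        tolower($1) == "allow" && $3 != "" && h ~ $3 { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}
```

Usage, with a hypothetical downstream name: "allowed_by_conf /path/to/ldmd.conf downstream.example.edu && echo allowed".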
What does the following command print when executed on the downstream
host?
/usr/sbin/rpcinfo -n 388 -t eldmf1.fsl.noaa.gov 300029 6
This command completely bypasses the LDM on the downstream host and
should indicate whether the LDM on the upstream host is available, e.g.,
$ /usr/sbin/rpcinfo -n 388 -t oliver.unidata.ucar.edu 300029 6
program 300029 version 6 ready and waiting
If it fails, is there anything in the upstream LDM's logfile that
corresponds to the connection attempt?
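The rpcinfo check can be wrapped in a small sketch for repeated testing. "rpcinfo -t" calls procedure 0 (the RPC null procedure) of the given program over TCP, which is the same kind of NULLPROC call that timed out in the log above. The helper names `rpc_reply_ok` and `check_upstream` are hypothetical, and the script assumes the rpcinfo path, port, program, and version used in the command above.

```sh
#!/bin/sh
# Sketch: probe an upstream LDM-6 from the downstream host with rpcinfo.
# ASSUMPTIONS: rpcinfo is at /usr/sbin/rpcinfo, and the upstream LDM
# listens on port 388 as program 300029, version 6.

# rpc_reply_ok   (hypothetical helper)
# Succeeds if the rpcinfo output on standard input contains the
# "ready and waiting" reply.
rpc_reply_ok() {
    grep -q 'ready and waiting'
}

# check_upstream HOST   (hypothetical helper)
# Runs the probe against HOST and says what to do next.
check_upstream() {
    if /usr/sbin/rpcinfo -n 388 -t "$1" 300029 6 2>&1 | rpc_reply_ok; then
        echo "$1: upstream LDM-6 answered the null procedure"
    else
        echo "$1: no reply; look for the attempt in the upstream LDM's logfile"
    fi
}
```

Usage on the downstream host: "check_upstream eldmf1.fsl.noaa.gov".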
> I have been testing for weeks now on many OSes and different kernels,
> and eventually the LDM always crashes with the above error.
What version of the LDM?
> I have attached the ldmd.log.
I didn't see any evidence of the LDM crashing in the logfile you sent.
> I have no idea what else to test. Hope you have
> insights, as I'm all out of troubleshooting ideas. Thanks. Sarah
...
Regards,
Steve Emmerson