[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: LDM mystery
- Subject: Re: LDM mystery
- Date: Wed, 31 Aug 2005 11:18:41 -0600
Bob,
The top-level LDM server appears to be hung, currently. In order to
discover where, I've asked Sarah to have root use the strace(1) utility.
If that fails, then I think we'll need to install a version of the LDM
that has debugging enabled and execute it in an environment that
supports core dumps from programs that are owned by root and setuid.
Regards,
Steve Emmerson
------Original Message
>Date: Wed, 31 Aug 2005 17:01:11 +0000
>From: Bob Lipschutz <address@hidden>
>Organization: NOAA/NWS/FSL
>To: Steve Emmerson <address@hidden>
>Subject: LDM mystery
Thanks for working with us on our ldm problem. Unfortunately,
we seem to be running out of things to try, so we're very
open to any suggestions.
To date, we've tried installing various versions of ldm with
various kernels, with no change in the result. We see this
same behavior with several different hosts, although others
have been working for us without any problems.
Here's what we are seeing as of 8/31 16:36Z.... output from
netstat, ps, ldmping, rpcinfo, and ldm server and client logs.
Note that the one zombie process did not get reaped from yesterday...
this suggests to me that the parent process is not getting back to
service its children. Also, when we see any connections in SYN_RECV
state, the server no longer accepts new connections (although
existing connections remain up and funtioning). The condition
appears in both the server and client logs, with a nullproc_6
timeout on the server and "No IS_ALIVE reply" timeout on the
client.
Thanks again!
Bob Lipschutz
303-497-6636
eldmf1:/usr/local/ldm/etc> netstat
tcp 0 0 eldmf1.fsl.noaa.gov:388 kelpie.fsl.noaa.gov:52277
SYN_RECV
tcp 0 0 eldmf1.fsl.noaa.gov:388 kelpie.fsl.noaa.gov:52281
SYN_RECV
tcp 0 0 eldmf1.fsl.noaa.gov:388 kelpie.fsl.noaa.gov:52268
SYN_RECV
tcp 0 0 eldmf1.fsl.noaa.gov:388 kelpie.fsl.noaa.gov:52258
SYN_RECV
tcp 0 0 eldmf1.fsl.noaa.gov:388 kelpie.fsl.noaa.gov:52267
SYN_RECV
tcp 0 0 eldmf1.fsl.noaa.gov:388 kelpie.fsl.noaa.gov:52257
SYN_RECV
tcp 0 0 eldmf1.fsl.noaa.gov:388 kelpie.fsl.noaa.gov:52300
SYN_RECV
tcp 0 0 eldmf1.fsl.noaa.gov:388 kelpie.fsl.noaa.gov:52270
SYN_RECV
tcp 0 0 eldmf1.fsl.noaa.gov:388 yampa.fsl.noaa.gov:43990
SYN_RECV
tcp 0 0 eldmf1.fsl.noaa.gov:388 kelpie.fsl.noaa.gov:52298
SYN_RECV
tcp 0 0 eldmf1.fsl.noaa.gov:388 kelpie.fsl.noaa.gov:52379
SYN_RECV
tcp 0 0 eldmf1.fsl.noaa.gov:388 yampa.fsl.noaa.gov:43991
SYN_RECV
tcp 0 0 eldmf1.fsl.noaa.gov:388 kelpie.fsl.noaa.gov:52259
SYN_RECV
tcp 0 0 eldmf1.fsl.noaa.gov:388 kelpie.fsl.noaa.gov:52389
SYN_RECV
tcp 0 0 eldmf1.fsl.noaa.gov:388 kelpie.fsl.noaa.gov:52279
SYN_RECV
tcp 0 0 eldmf1.fsl.noaa.gov:388 kelpie.fsl.noaa.gov:52382
SYN_RECV
tcp 0 0 eldmf1.fsl.noaa.gov:388 kelpie.fsl.noaa.gov:52278
SYN_RECV
tcp 0 0 eldmf1.fsl.noaa.gov:388 kelpie.fsl.noaa.gov:52269
SYN_RECV
tcp 0 0 eldmf1.fsl.noaa.gov:388 kelpie.fsl.noaa.gov:52381
SYN_RECV
tcp 0 0 eldmf1.fsl.noaa.gov:388 kelpie.fsl.noaa.gov:52405
SYN_RECV
tcp 0 0 eldmf1.fsl.noaa.gov:388 yampa.fsl.noaa.gov:43992
SYN_RECV
tcp 0 0 eldmf1.fsl.noaa.gov:388 kelpie.fsl.noaa.gov:52280
SYN_RECV
tcp 0 0 eldmf1.fsl.noaa.gov:388 kelpie.fsl.noaa.gov:52390
SYN_RECV
tcp 0 0 eldmf1.fsl.noaa.gov:388 kelpie.fsl.noaa.gov:52397
SYN_RECV
tcp 0 0 eldmf1.fsl.noaa.gov:388 kelpie.fsl.noaa.gov:52260
SYN_RECV
tcp 0 0 eldmf1.fsl.noaa.gov:388 kelpie.fsl.noaa.gov:52380
SYN_RECV
tcp 0 0 eldmf1.fsl.noaa.gov:388 yampa.fsl.noaa.gov:43989
SYN_RECV
tcp 0 0 eldmf1.fsl.noaa.gov:388 yampa.fsl.noaa.gov:40063
ESTABLISHED
tcp 0 28 eldmf1.fsl.noaa.gov:38204 prism.fsl.noaa.gov:388
ESTABLISHED
tcp 0 0 eldmf1.fsl.noaa.gov:388 yampa.fsl.noaa.gov:40066
ESTABLISHED
tcp 0 0 eldmf1.fsl.noaa.gov:388 yampa.fsl.noaa.gov:40065
ESTABLISHED
tcp 0 0 eldmf1.fsl.noaa.gov:388 yampa.fsl.noaa.gov:40064
ESTABLISHED
<snip>
eldmf1:/usr/local/ldm/etc> ps -efl | grep rpc.ldm
5 S rtoper 23317 1 0 75 0 - 763 322475 Aug30 ? 00:00:00
rpc.ldmd -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
1 S rtoper 23319 23317 0 75 0 - 4749 - Aug30 ? 00:02:03
rpc.ldmd -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
1 S rtoper 23325 23317 0 76 0 - 927 rt_sig Aug30 ? 00:00:05
rpc.ldmd -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
1 S rtoper 23326 23317 0 76 0 - 927 rt_sig Aug30 ? 00:00:11
rpc.ldmd -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
1 S rtoper 23327 23317 0 76 0 - 927 rt_sig Aug30 ? 00:00:05
rpc.ldmd -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
1 S rtoper 23328 23317 0 76 0 - 927 rt_sig Aug30 ? 00:00:04
rpc.ldmd -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
1 S rtoper 23331 23317 0 76 0 - 927 rt_sig Aug30 ? 00:00:03
rpc.ldmd -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
1 S rtoper 23332 23317 0 76 0 - 927 rt_sig Aug30 ? 00:00:12
rpc.ldmd -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
1 S rtoper 23333 23317 0 76 0 - 927 rt_sig Aug30 ? 00:00:03
rpc.ldmd -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
1 S rtoper 23334 23317 0 76 0 - 927 rt_sig Aug30 ? 00:00:02
rpc.ldmd -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
1 S rtoper 23335 23317 0 76 0 - 927 rt_sig Aug30 ? 00:00:03
rpc.ldmd -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
1 S rtoper 23336 23317 0 76 0 - 927 rt_sig Aug30 ? 00:00:02
rpc.ldmd -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
1 S rtoper 23337 23317 0 75 0 - 927 rt_sig Aug30 ? 00:00:02
rpc.ldmd -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
1 S rtoper 23338 23317 0 75 0 - 927 rt_sig Aug30 ? 00:00:02
rpc.ldmd -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
1 S rtoper 23339 23317 0 75 0 - 927 rt_sig Aug30 ? 00:00:02
rpc.ldmd -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
1 S rtoper 23340 23317 0 76 0 - 927 rt_sig Aug30 ? 00:00:03
rpc.ldmd -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
1 S rtoper 23341 23317 0 76 0 - 927 rt_sig Aug30 ? 00:00:02
rpc.ldmd -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
1 S rtoper 23342 23317 0 76 0 - 927 rt_sig Aug30 ? 00:00:02
rpc.ldmd -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
1 S rtoper 23343 23317 0 76 0 - 927 rt_sig Aug30 ? 00:00:02
rpc.ldmd -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
1 S rtoper 23344 23317 0 76 0 - 927 rt_sig Aug30 ? 00:00:02
rpc.ldmd -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
1 S rtoper 23345 23317 0 76 0 - 927 rt_sig Aug30 ? 00:00:02
rpc.ldmd -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
1 S rtoper 23346 23317 0 76 0 - 927 rt_sig Aug30 ? 00:00:03
rpc.ldmd -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
1 S rtoper 23347 23317 0 76 0 - 927 rt_sig Aug30 ? 00:00:03
rpc.ldmd -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
1 Z rtoper 24609 23317 0 75 0 - 0 exit Aug30 ? 00:00:00
[rpc.ldmd] <defunct>
eldmf1:/usr/local/ldm/logs> ldmping -i 3 localhost
Aug 31 16:44:21 State Elapsed Port Remote_Host rpc_stat
Aug 31 16:44:31 H_CLNTED 9.998874 388 localhost select: RPC: Timed out
eldmf1:/usr/local/ldm/logs> /usr/sbin/rpcinfo -n 388 -t eldmf1.fsl.noaa.gov
300029 6
rpcinfo: RPC: Timed out
program 300029 version 6 is not available
>From ldmd.log on server (eldmf1):
Aug 30 17:37:09 eldmf1 yampa(feed)[23326]: up6.c:287: nullproc_6() failure to
yampa.fsl.noaa.gov: RPC: Timed out
Aug 30 17:37:09 eldmf1 yampa(feed)[23326]: Delay: 59.9919 sec
Aug 30 17:37:09 eldmf1 yampa(feed)[23326]: pq_sequence():
time(insert)-time(create): 0.0149 s
Aug 30 17:37:09 eldmf1 yampa(feed)[23326]: Delay: 59.4061 sec
Aug 30 17:37:09 eldmf1 yampa(feed)[23327]: up6.c:287: nullproc_6() failure to
yampa.fsl.noaa.gov: RPC: Timed out
Aug 30 17:37:09 eldmf1 yampa(feed)[23327]: Delay: 59.9922 sec
Aug 30 17:37:09 eldmf1 yampa(feed)[23327]: pq_sequence():
time(insert)-time(create): 0.0149 s
Aug 30 17:37:09 eldmf1 yampa(feed)[23327]: Delay: 59.4064 sec
Aug 30 17:37:09 eldmf1 yampa(feed)[23327]: pq_sequence():
time(insert)-time(create): 0.0102 s
Aug 30 17:37:09 eldmf1 yampa(feed)[23327]: Delay: 56.3724 sec
Aug 30 17:37:09 eldmf1 yampa(feed)[23327]: pq_sequence():
time(insert)-time(create): 0.0154 s
Aug 30 17:37:09 eldmf1 yampa(feed)[23327]: Delay: 55.3660 sec
Aug 30 17:37:09 eldmf1 yampa(feed)[23327]: pq_sequence():
time(insert)-time(create): 0.0111 s
Aug 30 17:37:09 eldmf1 kelpie(feed)[23340]: up6.c:287: nullproc_6() failure to
kelpie.fsl.noaa.gov: RPC: Timed out
Aug 30 17:37:09 eldmf1 kelpie(feed)[23343]: up6.c:287: nullproc_6() failure to
kelpie.fsl.noaa.gov: RPC: Timed out
>From ldmd.log on client (kelpie):
Aug 30 17:36:35 kelpie eldmf1[29755]: INFO: requester6.c:105: No IS_ALIVE reply
from upstream LDM on eldmf1.fsl.noaa.gov; assuming sending LDM is alive;
requester6.c:105: RPC: Timed out
Aug 30 17:36:35 kelpie eldmf1[29755]: Upstream LDM is alive. Waiting...
Aug 30 17:36:37 kelpie eldmf1[29761]: INFO: requester6.c:105: No IS_ALIVE reply
from upstream LDM on eldmf1.fsl.noaa.gov; assuming sending LDM is alive;
requester6.c:105: RPC: Timed out
Aug 30 17:36:37 kelpie eldmf1[29761]: Upstream LDM is alive. Waiting...
Aug 30 17:36:37 kelpie eldmf1[29760]: INFO: requester6.c:105: No IS_ALIVE reply
from upstream LDM on eldmf1.fsl.noaa.gov; assuming sending LDM is alive;
requester6.c:105: RPC: Timed out
Aug 30 17:36:37 kelpie eldmf1[29760]: Upstream LDM is alive. Waiting...
Aug 30 17:36:37 kelpie eldmf1[29769]: INFO: requester6.c:105: No IS_ALIVE reply
from upstream LDM on eldmf1.fsl.noaa.gov; assuming sending LDM is alive;
requester6.c:105: RPC: Timed out
------End of Original Message