[LDM #LXP-916564]: ldmping
- Date: Fri, 17 Apr 2009 10:01:44 -0600
Michael,
> We’re still having problems with connections timing out on LDM. I can bring
> you up to date on the steps we've taken.
>
> The problem is we're seeing some data coming into LDM but we're seeing many
> disconnects and connections denied in the log.
>
> We have tried both the latest Red Hat Linux kernel and a previous kernel under
> which LDM ran for a couple months without problems. The problem persists on
> both kernels.
>
> Problem appears only on two servers, LDM-11 and LDM-12. We have other LDM
> installations that work fine.
>
> On the servers with the problem, LDMPING LOCALHOST rarely resolves the IP and
> connects. Most often we get this:
>
> [ldm@ldm-12 ldm]$ ldmping localhost
> Apr 17 14:23:13 INFO: State Elapsed Port Remote_Host rpc_stat
> Apr 17 14:23:13 INFO: Resolving localhost to 127.0.0.1 took 0.000302 seconds
> Apr 17 14:23:23 ERROR: H_CLNTED 10.000029 388 localhost select: RPC: Timed out
>
> If we change the port from 388 to 389 or 532 or 3885, LDMPING LOCALHOST works
> fine every time, never a failure. If we change the port back to 388 the
> problem returns.
>
> We’ve kick-started the server, rebuilt LDM according to the directions on the
> "LDM INSTALL" web page. We used "make install_setuids" to set the proper
> owner and permissions on all LDM files and directories.
>
> We created a very basic ldmd.conf file with nothing in it other than this:
>
> #
> #
> #
> # CRH ldm.crh.noaa.gov
> #
> #
> #exec "pqexpire"
> exec "pqbinstats"
If you're not using the output from "pqbinstats" then you should remove this
entry. It shouldn't affect your problem, however.
> exec "rtstats -h rtstats.unidata.ucar.edu"
> #
> ##############################################################################
> # Begin Access control
> ###############################################################################
> #
> ###############################################################################
> # ALLOW: Who we are willing to feed
> allow ANY ^((localhost|loopback)|(127\.0\.0\.1\.?$))
> #
> allow ANY .noaa.gov
You should change the above to "allow ANY \.noaa\.gov$".
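To illustrate the difference between the two patterns, here is a minimal sketch in Python. It assumes that Python's "re" module behaves like the LDM's host-name matching for these two simple patterns (the LDM itself uses POSIX extended regular expressions), and the host names are made up for the example.

```python
import re

# Patterns from the ldmd.conf entries discussed above.
LOOSE = r".noaa.gov"       # unescaped dots match any character; no end anchor
STRICT = r"\.noaa\.gov$"   # literal dots; must match at the end of the name

# Hypothetical host names, for illustration only.
HOSTS = [
    "ldm.crh.noaa.gov",          # a genuine noaa.gov host
    "ldm.noaa.gov.example.com",  # noaa.gov appears, but not at the end
    "xnoaaxgov.example.com",     # no literal dots around "noaa"
]


def allowed(pattern, host):
    """Return True if the pattern matches anywhere in the host name."""
    return re.search(pattern, host) is not None


for host in HOSTS:
    print(f"{host:28} loose={allowed(LOOSE, host)} strict={allowed(STRICT, host)}")
```

The loose pattern accepts all three names, whereas the strict pattern accepts only the first one.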
> ###############################################################################
> # ACCEPT: Who can feed us, currently this action is only needed for WSI data
> #
> # accept <feedset> <pattern> <hostname pattern>
> ###############################################################################
> # accept anything from yourself
> #
> accept ANY ".*" ^((localhost|loopback)|(127\.0\.0\.1\.?$))
> #
> #
>
> We restarted LDM and tried LDMPING LOCALHOST with the same connection problem
> resulting.
>
> We can't find anything else addressing port 388.
I'm not sure what you mean by the last line above. Please explain.
> Just received your latest note suggesting "netstat -n -a -t | grep 388".
> Here's the results:
>
> [root@ldm-12 ~]# netstat -n -a -t | grep 388
> tcp 0 0 204.227.126.195:388 140.90.64.100:57649 TIME_WAIT
> tcp 0 0 204.227.126.195:388 198.200.151.151:48070 TIME_WAIT
> tcp 0 0 204.227.126.195:388 204.228.186.180:35368 TIME_WAIT
> tcp 0 0 204.227.126.195:388 161.55.224.192:50416 TIME_WAIT
It looks like you have 4 other LDM systems that requested data from the local
LDM and that just disconnected.
I don't see the LDM server listening on port 388 in the above output (the "-a"
option should have caused it to be listed). Where is it?
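One way to check is to filter the "netstat -n -a -t" output for sockets in the LISTEN state on port 388. The following Python sketch does that for text pasted from netstat. The function name and the sample LISTEN line are hypothetical, and the sketch assumes the usual Linux column layout (protocol, Recv-Q, Send-Q, local address, foreign address, state).

```python
def find_listeners(netstat_output, port=388):
    """Return netstat lines showing a TCP socket listening on the given port."""
    listeners = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expect: proto recv-q send-q local-address foreign-address state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_address, state = fields[3], fields[5]
        # The port is whatever follows the last colon (works for IPv6 too).
        local_port = local_address.rsplit(":", 1)[-1]
        if state == "LISTEN" and local_port == str(port):
            listeners.append(line)
    return listeners


# Lines of the kind reported above: only TIME_WAIT entries, so no listener.
reported = """\
tcp 0 0 204.227.126.195:388 140.90.64.100:57649 TIME_WAIT
tcp 0 0 204.227.126.195:388 198.200.151.151:48070 TIME_WAIT
"""
print(find_listeners(reported))

# Hypothetical line of the kind a running LDM server would be expected to add.
healthy = reported + "tcp 0 0 0.0.0.0:388 0.0.0.0:* LISTEN\n"
print(find_listeners(healthy))
```

An empty result for the real output would mean that nothing is accepting new connections on port 388, which would be consistent with the ldmping time-outs.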
> That's all I can think of. We seem to have eliminated network infrastructure
> such as routers, switches, etc., with the problem showing up on a ping of
> localhost. Other LDM installs on the same branch of the network work fine.
> LDM configuration, permissions, and ownership seem to be okay. All we can see
> is disconnects and connections denied in the log.
I don't know what's going on either. May we log onto one of the computers in
question as the LDM user? At this point, I'm afraid that will be necessary in
order to diagnose the problem in a timely manner.
> Confused and scratching my head,
> Michael
Regards,
Steve Emmerson
Ticket Details
===================
Ticket ID: LXP-916564
Department: Support LDM
Priority: Normal
Status: Closed