[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[LDM #CSZ-262645]: errant LDM REQUESTs on sirocco.srcc.lsu.edu
- Subject: [LDM #CSZ-262645]: errant LDM REQUESTs on sirocco.srcc.lsu.edu
- Date: Mon, 28 Jan 2013 13:30:11 -0700
Hi David,
re:
> I noticed one of our email servers was down. Sirocco's ldm uses that email
> server to send us logs. I restarted the email server. Let me know if that
> fixed the persistent 'REQUEST's.
We don't think that this would/could have anything to do with the repeated
REQUESTs from sirocco. One of the primary reasons for this is that we are
now seeing the same behavior from pavan.srcc.lsu.edu -- when I looked a
couple of minutes ago on the node of idd.unidata.ucar.edu that is servicing
REQUESTs from pavan, I saw 43 connections. When I looked 12 minutes ago, there
were 30 REQUESTs. From what I see, pavan should only have one REQUEST line
in its ~ldm/etc/ldmd.conf file, and that is for everything (".*") for
DIFAX|FSL2|UNIWISC.
The 'netstat' output for the connections on our end are telling; here is
a dump that I just did:
/opt/ldm% netstat -a | grep ESTAB | grep pavan
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:36027
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38455
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:36010
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38488
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38893
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38450
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38907
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38501
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:50282
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38491
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38900
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38968
ESTABLISHED
tcp 0 260 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:40691
ESTABLISHED
tcp 0 1772352 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:50250
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:50296
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38453
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38460
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:50219
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38486
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:50208
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38515
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38494
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38495
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38483
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38465
ESTABLISHED
tcp 0 1772352 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:50268
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:50242
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38485
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:50231
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38451
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38482
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38456
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:35995
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38932
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38952
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38470
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38508
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38476
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38447
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38914
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:50276
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:50226
ESTABLISHED
tcp 0 1676784 idd.unidata.uca:unidata-ldm pavan.srcc.lsu.edu:38454
ESTABLISHED
The third column shows the number of bytes in the transmit queue for the machine
listed in the 5th column (pavan). This listing suggests that there is some
"packet shaping" being done on the LSU and/or SRCC side that is limiting/slowing
down the transfer of data from the idd.unidata.ucar.edu cluster node to
SRCC machines. This limit apparently also exists in feed REQUESTs being
serviced by idd.ssec.wisc.edu:
<on idd.ssec.wisc.edu>
unidata2-new.ldm sirocco.srcc.lsu.edu.53969 19584 13679 492480 0
ESTABLISHED
unidata2-new.ldm sirocco.srcc.lsu.edu.53970 19584 13679 492480 0
ESTABLISHED
unidata2-new.ldm sirocco.srcc.lsu.edu.53971 19584 13679 492480 0
ESTABLISHED
unidata2-new.ldm sirocco.srcc.lsu.edu.53972 19584 13679 492480 0
ESTABLISHED
unidata2-new.ldm sirocco.srcc.lsu.edu.53974 19584 13679 492480 0
ESTABLISHED
unidata2-new.ldm sirocco.srcc.lsu.edu.53975 27776 17783 492480 0
ESTABLISHED
unidata2-new.ldm sirocco.srcc.lsu.edu.53976 19584 13679 492480 0
ESTABLISHED
unidata2-new.ldm sirocco.srcc.lsu.edu.53977 19584 13679 492480 0
ESTABLISHED
unidata2-new.ldm sirocco.srcc.lsu.edu.53978 22272 15047 492480 0
ESTABLISHED
unidata2-new.ldm sirocco.srcc.lsu.edu.53979 19584 13679 492480 0
ESTABLISHED
unidata2-new.ldm sirocco.srcc.lsu.edu.53980 19584 13679 492480 0
ESTABLISHED
unidata2-new.ldm sirocco.srcc.lsu.edu.53981 19584 13679 492480 0
ESTABLISHED
unidata2-new.ldm sirocco.srcc.lsu.edu.53982 19584 13679 492480 0
ESTABLISHED
unidata2-new.ldm sirocco.srcc.lsu.edu.53983 19584 13679 492480 0
ESTABLISHED
unidata2-new.ldm sirocco.srcc.lsu.edu.53984 19584 13679 492480 0
ESTABLISHED
unidata2-new.ldm sirocco.srcc.lsu.edu.53985 19584 13679 492480 0
ESTABLISHED
unidata2-new.ldm sirocco.srcc.lsu.edu.53986 19584 13679 492480 0
ESTABLISHED
unidata2-new.ldm sirocco.srcc.lsu.edu.53987 19584 13679 492480 0
ESTABLISHED
unidata2-new.ldm sirocco.srcc.lsu.edu.53988 19584 13679 492480 0
ESTABLISHED
unidata2-new.ldm sirocco.srcc.lsu.edu.53989 19584 13679 492480 0
ESTABLISHED
unidata2-new.ldm sirocco.srcc.lsu.edu.53990 19584 13679 492480 0
ESTABLISHED
unidata2-new.ldm pavan.srcc.lsu.edu.38566 443264 0 492480 0
ESTABLISHED
So, the question is what, if anything, has been done to the networking
at LSU and/or SRCC in the past week? I say the past week because there
are indications from the LDM log file (~ldm/logs/ldmd.conf.save) that this
behavior started back in earnest at 14:05:50 on January 22:
Jan 23 14:05:50 sirocco idd.ssec.wisc.edu[16342] ERROR: readtcp(): select()
timeout on socket 4
Jan 23 14:05:50 sirocco idd.ssec.wisc.edu[16342] ERROR: one_svc_run(): RPC
layer closed connection
Jan 23 14:05:50 sirocco idd.ssec.wisc.edu[16342] ERROR: Disconnecting due to
LDM failure; Connection to upstream LDM closed
Jan 23 14:05:50 sirocco idd.ssec.wisc.edu[16342] NOTE: LDM-6 desired
product-class: 20130123130550.236 TS_ENDT {{IDS|DDPLUS, ".*"},{NONE,
"SIG=3d104d7efb0c4abf7b6561887892b21d"}}
Jan 23 14:05:50 sirocco idd.ssec.wisc.edu[16342] NOTE: Upstream LDM-6 on
idd.ssec.wisc.edu is willing to be a primary feeder
Jan 23 14:06:26 sirocco idd.ssec.wisc.edu[16342] ERROR: readtcp(): select()
timeout on socket 4
Jan 23 14:06:26 sirocco idd.ssec.wisc.edu[16342] ERROR: one_svc_run(): RPC
layer closed connection
Jan 23 14:06:26 sirocco idd.ssec.wisc.edu[16342] ERROR: Disconnecting due to
LDM failure; Connection to upstream LDM closed
Jan 23 14:06:26 sirocco idd.ssec.wisc.edu[16342] NOTE: LDM-6 desired
product-class: 20130123130626.548 TS_ENDT {{IDS|DDPLUS, ".*"},{NONE,
"SIG=decdf0a7a61b93a9a0d66997bd1420bf"}}
Jan 23 14:06:26 sirocco idd.ssec.wisc.edu[16342] NOTE: Upstream LDM-6 on
idd.ssec.wisc.edu is willing to be a primary feeder
Jan 23 14:07:02 sirocco idd.ssec.wisc.edu[16342] ERROR: readtcp(): select()
timeout on socket 4
Jan 23 14:07:02 sirocco idd.ssec.wisc.edu[16342] ERROR: one_svc_run(): RPC
layer closed connection
Jan 23 14:07:02 sirocco idd.ssec.wisc.edu[16342] ERROR: Disconnecting due to
LDM failure; Connection to upstream LDM closed
Jan 23 14:07:02 sirocco idd.ssec.wisc.edu[16342] NOTE: LDM-6 desired
product-class: 20130123130702.849 TS_ENDT {{IDS|DDPLUS, ".*"},{NONE,
"SIG=3ea2f1cbdaf12fd9fc60255f536d680e"}}
Jan 23 14:07:02 sirocco idd.ssec.wisc.edu[16342] NOTE: Upstream LDM-6 on
idd.ssec.wisc.edu is willing to be a primary feeder
Jan 23 14:07:39 sirocco idd.ssec.wisc.edu[16342] ERROR: readtcp(): select()
timeout on socket 4
...
The kinds of log messages related to idd.unidata.ucar.edu started in earnest
at 22:46:36 on January 23:
Jan 23 22:46:36 sirocco idd.unidata.ucar.edu[16344] ERROR: readtcp(): select()
timeout on socket 4
Jan 23 22:46:36 sirocco idd.unidata.ucar.edu[16344] ERROR: one_svc_run(): RPC
layer closed connection
Jan 23 22:46:36 sirocco idd.unidata.ucar.edu[16344] ERROR: Disconnecting due to
LDM failure; Connection to upstream LDM closed
Jan 23 22:46:36 sirocco idd.unidata.ucar.edu[16344] NOTE: LDM-6 desired
product-class: 20130123214636.808 TS_ENDT {{IDS|DDS, ".*"},{NONE,
"SIG=60bbe6ac30126de9b05fe9063d0817f7"}}
Jan 23 22:46:46 sirocco idd.unidata.ucar.edu[16344] WARN: Couldn't resolve
"idd.unidata.ucar.edu" to an Internet address in 10.0013 seconds
Jan 23 22:46:46 sirocco idd.unidata.ucar.edu[16344] ERROR: Disconnecting due to
LDM failure; Couldn't get IP address of host idd.unidata.ucar.edu; try again
later
Jan 23 22:46:47 sirocco idd.unidata.ucar.edu[16344] NOTE: LDM-6 desired
product-class: 20130123214647.811 TS_ENDT {{IDS|DDS, ".*"},{NONE,
"SIG=60bbe6ac30126de9b05fe9063d0817f7"}}
Jan 23 22:47:11 sirocco idd.unidata.ucar.edu[16344] NOTE: Upstream LDM-6 on
idd.unidata.ucar.edu is willing to be a primary feeder
Jan 23 22:47:13 sirocco idd.ssec.wisc.edu[16342] ERROR: readtcp(): select()
timeout on socket 4
Jan 23 22:47:13 sirocco idd.ssec.wisc.edu[16342] ERROR: one_svc_run(): RPC
layer closed connection
Jan 23 22:47:13 sirocco idd.ssec.wisc.edu[16342] ERROR: Disconnecting due to
LDM failure; Connection to upstream LDM closed
Jan 23 22:47:21 sirocco idd.unidata.ucar.edu[16344] ERROR: readtcp(): EOF on
socket 4
Jan 23 22:47:21 sirocco idd.unidata.ucar.edu[16344] ERROR: one_svc_run(): RPC
layer closed connection
Jan 23 22:47:21 sirocco idd.unidata.ucar.edu[16344] ERROR: Disconnecting due to
LDM failure; Connection to upstream LDM closed
Jan 23 22:47:23 sirocco idd.unidata.ucar.edu[16344] NOTE: LDM-6 desired
product-class: 20130123214723.495 TS_ENDT {{IDS|DDS, ".*"},{NONE,
"SIG=60bbe6ac30126de9b05fe9063d0817f7"}}
Jan 23 22:47:26 sirocco idd.ssec.wisc.edu[16342] NOTE: LDM-6 desired
product-class: 20130123214726.837 TS_ENDT {{IDS|DDPLUS, ".*"},{NONE,
"SIG=72da2c1a94bd67d40b89e566fcf00d32"}}
Jan 23 22:47:27 sirocco idd.unidata.ucar.edu[16344] NOTE: Upstream LDM-6 on
idd.unidata.ucar.edu is willing to be a primary feeder
Jan 23 22:47:30 sirocco idd.unidata.ucar.edu[16344] ERROR: Broken pipe
Jan 23 22:47:30 sirocco idd.unidata.ucar.edu[16344] ERROR: writetcp(): write()
error on socket 4
Jan 23 22:47:30 sirocco idd.unidata.ucar.edu[16344] ERROR: one_svc_run(): RPC
layer closed connection
Jan 23 22:47:30 sirocco idd.unidata.ucar.edu[16344] ERROR: Disconnecting due to
LDM failure; Connection to upstream LDM closed
Jan 23 22:47:30 sirocco idd.unidata.ucar.edu[16344] NOTE: LDM-6 desired
product-class: 20130123214730.496 TS_ENDT {{IDS|DDS, ".*"},{NONE,
"SIG=13138cb3ac4dfc050edc8b5135a17d2d"}}
Jan 23 22:47:31 sirocco idd.ssec.wisc.edu[16342] NOTE: Upstream LDM-6 on
idd.ssec.wisc.edu is willing to be a primary feeder
Jan 23 22:47:51 sirocco idd.ssec.wisc.edu[16342] ERROR: Broken pipe
Jan 23 22:47:51 sirocco idd.ssec.wisc.edu[16342] ERROR: writetcp(): write()
error on socket 4
Jan 23 22:47:51 sirocco idd.ssec.wisc.edu[16342] ERROR: one_svc_run(): RPC
layer closed connection
Jan 23 22:47:51 sirocco idd.ssec.wisc.edu[16342] ERROR: Disconnecting due to
LDM failure; Connection to upstream LDM closed
Jan 23 22:47:53 sirocco idd.unidata.ucar.edu[16341] ERROR: readtcp(): select()
timeout on socket 4
Jan 23 22:47:53 sirocco idd.unidata.ucar.edu[16341] ERROR: one_svc_run(): RPC
layer closed connection
Jan 23 22:47:53 sirocco idd.unidata.ucar.edu[16341] ERROR: Disconnecting due to
LDM failure; Connection to upstream LDM closed
Jan 23 22:47:53 sirocco idd.unidata.ucar.edu[16341] NOTE: LDM-6 desired
product-class: 20130123214753.275 TS_ENDT {{WMO, ".*"},{NONE,
"SIG=840ee3c0a8c5a9e69d571521dc9a3f39"}}
Jan 23 22:48:06 sirocco idd.unidata.ucar.edu[16344] NOTE: Upstream LDM-6 on
idd.unidata.ucar.edu is willing to be a primary feeder
Jan 23 22:48:10 sirocco idd.unidata.ucar.edu[16341] NOTE: Upstream LDM-6 on
idd.unidata.ucar.edu is willing to be a primary feeder
...
Have you heard back from LSU and/or SRCC network admins about anything
that may have been done in this timeframe?
Cheers,
Tom
--
****************************************************************************
Unidata User Support UCAR Unidata Program
(303) 497-8642 P.O. Box 3000
address@hidden Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage http://www.unidata.ucar.edu
****************************************************************************
Ticket Details
===================
Ticket ID: CSZ-262645
Department: Support LDM
Priority: Normal
Status: Open