[LDM #NQY-760567]: Connection Error Message in ldmd.log
- Subject: [LDM #NQY-760567]: Connection Error Message in ldmd.log
- Date: Tue, 02 Oct 2012 09:36:03 -0600
Hiro,
> Full Name: Hiro Gosden
> Email Address: address@hidden
> Organization: CIRA
> Package Version: 6.6.5
> Operating System: RHEL 4
> Hardware: Workstation
> Description of problem: I'm getting a lot of status 6 & 7 error messages that
> indicate "Broken Pipe" and "Couldn't flush connection," then "connection reset
> by peer." Sometimes, it seems to tie up the network port and grinds the
> system to a halt. The system crash doesn't happen often any more, but a
> couple of weeks ago, it crashed almost every day. Would you know what may be
> causing this? Thanks,
The log file contained messages like this:
Sep 11 00:00:20 awips rhesrv18.spc.noaa.gov(feed)[11208] NOTE: feed or notify
failure; Error sending BLKDATA: RPC: Unable to send; errno = Broken pipe
Sep 11 00:00:20 awips rpc.ldmd[17742] NOTE: child 11208 exited with status 7
Sep 11 00:00:21 awips rhesrv18.spc.noaa.gov(feed)[11803] NOTE: Starting
Up(6.6.5/6): 20120910230019.884 TS_ENDT {{EXP, ".*"}},
SIG=2d74e065bf62deb5a2aa439f04d393d6, Primary
Sep 11 00:00:21 awips rhesrv18.spc.noaa.gov(feed)[11803] NOTE: topo:
rhesrv18.spc.noaa.gov {{EXP, (.*)}}
Sep 11 00:05:50 awips rhesrv18.spc.noaa.gov(feed)[13004] NOTE: Starting
Up(6.6.5/6): 20120910230549.042 TS_ENDT {{EXP, ".*"}},
SIG=e7b7e7b35d9b954bcb2b09f0e5cd9ee3, Alternate
Sep 11 00:05:50 awips rhesrv18.spc.noaa.gov(feed)[13004] NOTE: topo:
rhesrv18.spc.noaa.gov {{EXP, (.*)}}
Sep 11 00:06:19 awips rhesrv18.spc.noaa.gov(feed)[11803] ERROR: Couldn't flush
connection; nullproc_6() failure to rhesrv18.spc.noaa.gov: RPC: Unable to
receive; errno = Connection reset by peer
Sep 11 00:06:19 awips rpc.ldmd[17742] NOTE: child 11803 exited with status 6
Sep 11 00:15:10 awips rhesrv18.spc.noaa.gov(feed)[8289] NOTE: feed or notify
failure; HEREIS: RPC: Unable to send; errno = Broken pipe
Sep 11 00:15:10 awips rpc.ldmd[17742] NOTE: child 8289 exited with status 7
Sep 11 00:15:43 awips rhesrv18.spc.noaa.gov(feed)[14907] NOTE: Starting
Up(6.6.5/6): 20120910231542.575 TS_ENDT {{EXP, ".*"}},
SIG=0b19cf4b14f15e3a8814ec1ba8608b22, Primary
Sep 11 00:15:43 awips rhesrv18.spc.noaa.gov(feed)[14907] NOTE: topo:
rhesrv18.spc.noaa.gov {{EXP, (.*)}}
Sep 11 00:16:12 awips rhesrv18.spc.noaa.gov(feed)[13004] ERROR: Couldn't flush
connection; nullproc_6() failure to rhesrv18.spc.noaa.gov: RPC: Unable to
receive; errno = Connection reset by peer
Sep 11 00:16:12 awips rpc.ldmd[17742] NOTE: child 13004 exited with status 6
These messages are caused by the downstream LDM processes switching between
PRIMARY and ALTERNATE modes, and they may safely be ignored. When a downstream
process switches modes, it reconnects, so the upstream process that was serving
the old connection sees the failure and exits. Indeed, the "Couldn't flush"
message has been demoted from ERROR to NOTICE in the current LDM release.
This is how the LDM is designed to work. Unfortunately, it results in many
log messages.
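If you want to see how often the upstream processes are exiting rather than
reading each message, a small script along the following lines could tally the
exit statuses. This is only a sketch: the function name is made up for
illustration, and the pattern assumes the "child N exited with status S" wording
shown in the excerpt above.

```python
import re
from collections import Counter

# Matches the parent-process lines from the excerpt above, e.g.
# "rpc.ldmd[17742] NOTE: child 11208 exited with status 7".
# The wording is taken from that excerpt and may differ in other releases.
EXIT_RE = re.compile(r"child (\d+) exited with status (\d+)")


def count_exit_statuses(lines):
    """Return a Counter mapping exit status -> number of child exits.

    `lines` is any iterable of log lines (an open file works).
    Hypothetical helper, not part of the LDM package.
    """
    counts = Counter()
    for line in lines:
        match = EXIT_RE.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts
```

Passing an open log file to it, for example
count_exit_statuses(open("ldmd.log")), gives the number of exits per status;
the log path here is an assumption and depends on your installation.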
While individual upstream LDM processes might terminate, the LDM system as a
whole shouldn't crash, tie up, or lock up. Please send me any evidence of this
happening.
> Hiro
Regards,
Steve Emmerson
Ticket Details
===================
Ticket ID: NQY-760567
Department: Support LDM
Priority: Normal
Status: Closed