Re: hung downstream LDM
- Subject: Re: hung downstream LDM
- Date: Mon, 14 Nov 2005 09:07:46 -0700
Justin,
>Date: Mon, 14 Nov 2005 10:01:49 -0500
>From: Justin Cooke <address@hidden>
>Organization: NOAA
>To: Steve Emmerson <address@hidden>
>Subject: Re: hung downstream LDM
The above message contained the following:
> Well we are now at 12 days (288 hours) without any stoppage of the
> NEXRAD2 feed :).
Excellent!
> But we are seeing a few odd things in our ldmd.log: we are getting
> several "Broken pipe" errors, each followed by a write error. I passed
> this along to our main decoder developer (Jeff Ator) and here is his
> response:
> ---
> Hmm, this is interesting, and it looks like it's something peculiar to
> this new LDM 6.4.3.0 that went in on 11/2/05, because these same
> "Broken_pipe" messages aren't showing up in the LDM log files prior to
> that date (i.e. ldmd.log.15, ldmd.log.16, ..., ldmd.log.20). What's
> really interesting is that these messages correspond exactly to when a
> decoder starts up, i.e. when a bulletin comes in for a particular
> decoder that isn't already running, and therefore pqact needs to start
> it up by forking a child process. In other words, the first bulletin
> that causes the new decoder process to start up is also the same one
> that is generating the "Broken_pipe" message in the logs, and it's
> happening for all of the decoders! Now this isn't a problem per se,
> because these same bulletins are actually getting into the respective
> decoders, as I could confirm within the actual decoder logs themselves
> (perhaps this is due to the "pipe_prodput: trying again" completing
> successfully, as you pointed out below(?)). Either way, it's a bit
> unnerving, not to mention misleading, to suddenly be seeing all of these
> "Broken_pipe" messages, especially since they weren't occurring prior to
> the installation of the new LDM 6.4.3.0 build.
> ----
>
> Here are a few of the log file entries:
> ---
> Nov 14 05:03:15 b2n1 pqact[434350] INFO: pipe:
> bin/decod_dcrast -v 2 -t 600 -d
> /dcomdev/us007003/decoder_logs/decod_dcrast.log
> /dcomdev/us007003/bufrtab.FSL_RAST /dcomdev/us007003/bufrtab.002
> Nov 14 05:03:15 b2n1 pqact[434350] ERROR: pbuf_flush (27) write: Broken pipe
> Nov 14 05:03:15 b2n1 pqact[434350] ERROR: pipe_put:
> bin/decod_dcrast-v2-t600-d/dcomdev/us007003/decoder_logs/decod_dcrast.log/dcomdev/us007003/bufrtab.FSL_RAST/dcomdev/us007003/bufrtab.002
>
> write error
> Nov 14 05:03:15 b2n1 pqact[434350] ERROR: pipe_prodput: trying again
> ...
> Nov 14 05:13:42 b2n1 pqact[434350] INFO: pipe:
> bin/decod_dccgrd -v 2 -t 300 -d
> /dcomdev/us007003/decoder_logs/decod_dccgrd.log
> /dcomdev/us007003/bufrtab.001 tables/stns/cg.tbl
> Nov 14 05:13:42 b2n1 pqact[434350] ERROR: pbuf_flush (36) write: Broken pipe
> Nov 14 05:13:42 b2n1 pqact[434350] ERROR: pipe_put:
> bin/decod_dccgrd-v2-t300-d/dcomdev/us007003/decoder_logs/decod_dccgrd.log/dcomdev/us007003/bufrtab.001tables/stns/cg.tbl
>
> write error
> Nov 14 05:13:42 b2n1 pqact[434350] ERROR: pipe_prodput: trying again
> ---
>
> I just stumbled on this when doing a grep for "error" in ldmd.log even
> though we've had no reported problems.
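Jeff's observation that every "Broken pipe" coincides with a decoder start-up can be checked mechanically instead of by eye. What follows is a minimal Python sketch, not an LDM tool. It assumes each ldmd.log entry occupies a single line in the layout shown in the excerpt above (the email has wrapped them), and the function name `broken_pipes_at_startup` is hypothetical. It pairs each "Broken pipe" error with the most recent "pipe:" start-up line from the same pqact process ID and reports whether the two share a timestamp.

```python
import re

# Assumed single-line layout, modelled on the ldmd.log excerpt above:
# "Nov 14 05:03:15 b2n1 pqact[434350] ERROR: pbuf_flush (27) write: Broken pipe"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"pqact\[(?P<pid>\d+)\] (?P<level>\w+): (?P<msg>.*)$"
)


def broken_pipes_at_startup(lines):
    """Return (timestamp, coincides_with_startup) for each "Broken pipe" error.

    coincides_with_startup is True when the same pqact process logged an
    INFO "pipe:" line (a decoder being started) with the same timestamp.
    """
    last_start = {}  # pqact pid -> timestamp of its latest "pipe:" line
    results = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m is None:
            continue  # not a pqact entry in the assumed layout
        ts, pid, level, msg = m["ts"], m["pid"], m["level"], m["msg"]
        if level == "INFO" and msg.startswith("pipe:"):
            last_start[pid] = ts
        elif level == "ERROR" and msg.endswith("Broken pipe"):
            results.append((ts, last_start.get(pid) == ts))
    return results
```

It can be run over a log with, for example, `broken_pipes_at_startup(open("ldmd.log"))`. Syslog timestamps have one-second resolution, so a same-second match is suggestive, not proof that the start-up caused the error.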
>
> Any ideas?
Not yet. I'll investigate.
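The logged sequence (pbuf_flush "write: Broken pipe", then a pipe_put write error, then "pipe_prodput: trying again") is consistent with a generic POSIX pattern, sketched below in Python. This is not pqact's implementation, and `put_with_retry` and `make_opener` are hypothetical names. A write to a pipe whose reader has closed its end fails with EPIPE, which strerror() reports as "Broken pipe", and a second attempt on a freshly opened pipe can then succeed. The sketch does not explain why the reader was gone in pqact's case; that remains the open question.

```python
import os


def write_all(fd, data):
    """Write every byte of data to fd, looping over partial writes."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def put_with_retry(open_pipe, data):
    """Write data down a pipe; on EPIPE ("Broken pipe"), reopen once and retry.

    open_pipe is a hypothetical callback returning (read_fd, write_fd); it
    stands in for starting a decoder. Returns (attempt_number, read_fd).
    """
    for attempt in (1, 2):
        read_fd, write_fd = open_pipe()
        try:
            write_all(write_fd, data)
            return attempt, read_fd
        except BrokenPipeError:
            # Python ignores SIGPIPE, so a write with no reader raises
            # BrokenPipeError (errno EPIPE) instead of killing the process.
            if attempt == 2:
                raise
        finally:
            os.close(write_fd)  # closing the write end lets the reader see EOF
    raise AssertionError("unreachable")


def make_opener():
    """Hypothetical opener: the first pipe's reader has already gone away;
    later pipes have a live reader (the caller keeps the read end)."""
    calls = 0

    def open_pipe():
        nonlocal calls
        calls += 1
        read_fd, write_fd = os.pipe()
        if calls == 1:
            os.close(read_fd)  # simulate a decoder that closed its stdin
            read_fd = -1
        return read_fd, write_fd

    return open_pipe


if __name__ == "__main__":
    attempts, read_fd = put_with_retry(make_opener(), b"bulletin")
    print(attempts, os.read(read_fd, 64))
    os.close(read_fd)
```

Closing the write end in the `finally` clause matters: a reader only sees end-of-file once every copy of the write descriptor is closed.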
> Thanks again for all the attention you have given this,
>
> Justin
Regards,
Steve Emmerson