
[LDM #WDX-973084]: Exit status 1 of what? :)



Hi Daryl,

> Annoying me again.  Previously, I bugged you about slow pipes not
> reporting what process it was:
> 
> http://www.unidata.ucar.edu/support/help/MailArchives/ldm/msg04879.html
> 
> Thanks for implementing this, hopefully others found it useful.
> 
> Now, I am trying to figure out which of my buggy decoders is exiting
> badly.  As my logs are filling with this:
> 
> Feb 04 16:21:36 mesonet pqact[1938] NOTE: child 2155 exited with status 1
> Feb 04 16:26:16 mesonet pqact[1938] NOTE: child 8102 exited with status 1
> Feb 04 16:35:39 mesonet pqact[1938] NOTE: child 18758 exited with status 1
> Feb 04 16:36:58 mesonet pqact[1938] NOTE: child 20265 exited with status 1
> 
> So I do the -USR2 to pqact, but the logs I get are not intuitive as to
> which product going to which processor is actually erroring out.  The
> child PIDs are not included in the logs, unless I am missing something?
> For example:
> 
> Feb 04 14:57:41 mesonet pqact[32073] INFO:      115 20080204145112.042
> IDS|DDPLUS 119265941  SPCN46 CWAO 041446
> Feb 04 14:57:41 mesonet pqact[32073] INFO:                pipe: dcmetr
> -b  9 -m 72 -s /mesonet/TABLES/awos.stns  -d logs/dcmetr_awos.log -a 0
> /mesonet/data/gempak/awos/YYMMDD_awos.gem
> Feb 04 14:57:41 mesonet pqact[32073] INFO:                pipe: dcmetr
> -b 9 -m 72 -s /mesonet/TABLES/mesonet4.stns      -d logs/dcmetr_meso1.log
> -a 0        /mesonet/data/gempak/meso/YYMMDD_meso.gem
> Feb 04 14:57:41 mesonet pqact[32073] INFO:                pipe: dcmetr -b
> 9 -m 72 -s /mesonet/TABLES/asos.stns  -d logs/dcmetr_asos.log -a 0
> /mesonet/data/gempak/asos/YYMMDD_asos.gem
> Feb 04 14:57:41 mesonet pqact[32073] NOTE: child 27014 exited with status
> 1
> 
> 
> Looking at the source (at least trying to), I see a case where child
> exiting with some status may not print out the process name.  I tried to
> diagnose how this happens, but only confused myself.
> 
> Any comments on this?

Because "pqact" printed no command line for it, the child process was
started either by an EXEC entry in the "pqact" configuration file or by
a PIPE entry whose pipe "pqact" had already closed.  "pqact" closes the
pipe that has gone longest without being written to when it needs a
file descriptor for a new process.  Closing a pipe removes the
associated entry from an internal list, and the command line is lost
with it.
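
To illustrate why the command line can be missing, here is a toy model in
Python of a bounded table of open pipes.  It is only a sketch of the
behaviour described above, not the "pqact" source: the class name, the
method names, and the format of the returned note are all made up for
the example.

```python
from collections import OrderedDict


class PipeTable:
    """Toy model of a bounded table of open pipes (hypothetical, not pqact code)."""

    def __init__(self, max_open):
        self.max_open = max_open
        self.entries = OrderedDict()  # pid -> command line, least recently used first

    def write(self, pid, command):
        """Record a write to the pipe for `pid`, opening the pipe if needed."""
        if pid in self.entries:
            self.entries.move_to_end(pid)  # now the most recently used
            return
        if len(self.entries) >= self.max_open:
            # Close the pipe that has been idle longest; its command line
            # leaves the table along with it.
            self.entries.popitem(last=False)
        self.entries[pid] = command

    def describe_exit(self, pid, status):
        """Build an exit note; the command appears only if it is still known."""
        command = self.entries.pop(pid, None)
        note = "child %d exited with status %d" % (pid, status)
        return note if command is None else "%s: %s" % (note, command)
```

With room for two pipes, opening a third evicts the oldest entry, so a
later exit of that first child can only be reported by PID.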

Can you have your decoders write a "Starting up" message to the LDM
log file?  This would allow you to match up the PIDs.
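
If changing the decoders themselves is inconvenient, one possible
approach is a small wrapper that logs its own PID and then replaces
itself with the decoder, so the decoder keeps the PID that was logged.
The following is a minimal sketch in Python.  The wrapper, its
"decoder-wrapper" log tag, and the message format are hypothetical, and
it assumes that your LDM logs through the syslog facility local0;
adjust the facility if your setup differs.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: log a start-up line with our PID, then exec the decoder."""
import os
import sys
import syslog


def startup_message(argv, pid):
    """Build the log line; `pid` is the PID pqact will report on exit."""
    return "Starting up: pid=%d cmd=%s" % (pid, " ".join(argv))


def main(argv):
    # Assumption: the LDM log is fed by syslog facility local0.
    syslog.openlog("decoder-wrapper", syslog.LOG_PID, syslog.LOG_LOCAL0)
    syslog.syslog(syslog.LOG_NOTICE, startup_message(argv, os.getpid()))
    syslog.closelog()
    # exec replaces this process image, so the decoder runs under the PID
    # just logged and inherits stdin (the pipe from pqact).
    os.execvp(argv[0], argv)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

To try it, put the wrapper in front of the decoder command in a PIPE
entry, for example "wrapper.py dcmetr -b 9" followed by the rest of the
decoder's arguments ("wrapper.py" being whatever you name the script).
The PID in the "Starting up" line should then match the PID in the
"child ... exited with status" note.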

> thanks!
> daryl

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: WDX-973084
Department: Support LDM
Priority: Normal
Status: On Hold