[LDM #WDX-973084]: Exit status 1 of what? :)
- Date: Mon, 04 Feb 2008 10:12:32 -0700
Hi Daryl,
> Annoying me again. Previously, I bugged you about slow pipes not
> reporting what process it was:
>
> http://www.unidata.ucar.edu/support/help/MailArchives/ldm/msg04879.html
>
> Thanks for implementing this, hopefully others found it useful.
>
> Now, I am trying to figure out which of my buggy decoders is exiting
> badly. As my logs are filling with this:
>
> Feb 04 16:21:36 mesonet pqact[1938] NOTE: child 2155 exited with status 1
> Feb 04 16:26:16 mesonet pqact[1938] NOTE: child 8102 exited with status 1
> Feb 04 16:35:39 mesonet pqact[1938] NOTE: child 18758 exited with status 1
> Feb 04 16:36:58 mesonet pqact[1938] NOTE: child 20265 exited with status 1
>
> So I do the -USR2 to pqact, but the logs I get are not intuitive as to
> which product going to which processor is actually erroring out. The
> child PIDs are not included in the logs, unless I am missing something?
> For example:
>
> Feb 04 14:57:41 mesonet pqact[32073] INFO: 115 20080204145112.042
> IDS|DDPLUS 119265941 SPCN46 CWAO 041446
> Feb 04 14:57:41 mesonet pqact[32073] INFO: pipe: dcmetr
> -b 9 -m 72 -s /mesonet/TABLES/awos.stns -d logs/dcmetr_awos.log -a 0
> /mesonet/data/gempak/awos/YYMMDD_awos.gem
> Feb 04 14:57:41 mesonet pqact[32073] INFO: pipe: dcmetr
> -b 9 -m 72 -s /mesonet/TABLES/mesonet4.stns -d logs/dcmetr_meso1.log
> -a 0 /mesonet/data/gempak/meso/YYMMDD_meso.gem
> Feb 04 14:57:41 mesonet pqact[32073] INFO: pipe: dcmetr -b
> 9 -m 72 -s /mesonet/TABLES/asos.stns -d logs/dcmetr_asos.log -a 0
> /mesonet/data/gempak/asos/YYMMDD_asos.gem
> Feb 04 14:57:41 mesonet pqact[32073] NOTE: child 27014 exited with status 1
>
>
> Looking at the source (at least trying to), I see a case where child
> exiting with some status may not print out the process name. I tried to
> diagnose how this happens, but only confused myself.
>
> Any comments on this?
Because "pqact" printed no command-line for the child, the child process
was started either by an EXEC entry in the "pqact" configuration-file or
by a PIPE entry whose pipe "pqact" had already closed. "pqact" closes a
pipe when it needs a file-descriptor for a new process, choosing the pipe
that has gone the longest without being written to. Closing a pipe
removes the associated entry from an internal list, so the command-line
is lost.
Can you have your decoders write a "Starting up" message to the LDM
log file? This would allow you to match up the PIDs.
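
As a rough illustration of that suggestion, here is a minimal Python sketch. The message format and the names startup_message and match_exits are hypothetical, not part of the LDM; the only thing taken from the logs above is the "child N exited with status S" wording. How a given decoder gets its message into the LDM log file is site-specific and is not shown.

```python
import os
import re

# Hypothetical message format; the LDM does not prescribe one.
def startup_message(argv, pid=None):
    """Build a one-line "Starting up" message carrying the PID and command line."""
    pid = os.getpid() if pid is None else pid
    return "Starting up: pid=%d cmd=%s" % (pid, " ".join(argv))

# Matches the pqact NOTE lines quoted earlier in this thread.
EXIT_RE = re.compile(r"child (\d+) exited with status (\d+)")
# Matches the hypothetical startup message defined above.
START_RE = re.compile(r"Starting up: pid=(\d+) cmd=(.*)")

def match_exits(log_lines):
    """Pair each exit report with the command logged at startup by the same PID.

    Returns a list of (pid, status, command) tuples; command is "unknown"
    when no startup message was seen for that PID.
    """
    commands = {}
    results = []
    for line in log_lines:
        started = START_RE.search(line)
        if started:
            commands[int(started.group(1))] = started.group(2)
            continue
        exited = EXIT_RE.search(line)
        if exited:
            pid = int(exited.group(1))
            status = int(exited.group(2))
            results.append((pid, status, commands.get(pid, "unknown")))
    return results
```

A decoder would emit startup_message(sys.argv) once at launch, and match_exits could then be run over the lines of the log file. PIDs are eventually reused by the operating system, so this sketch simply keeps the most recent startup message for each PID.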
> thanks!
> daryl
Regards,
Steve Emmerson
Ticket Details
===================
Ticket ID: WDX-973084
Department: Support LDM
Priority: Normal
Status: On Hold