20040920: Possible pqact issue in LDM?
- Subject: 20040920: Possible pqact issue in LDM?
- Date: Mon, 20 Sep 2004 10:31:28 -0600
Steven,
>Date: Mon, 20 Sep 2004 10:12:07 -0500
>From: "Steven Danz" <address@hidden>
>Organization: Aviation Weather Center
>To: Steve Emmerson <address@hidden>
>Subject: Re: 20040918: Possible pqact issue in LDM?
>Keywords: 200409091803.i89I3pnJ023109
The above message contained the following:
> Sure... the story goes something like this.
>
> AWC has a Northrop Grumman NOAAPort receiver system, which is pretty
> much just a stripped down AWIPS CP. On this system, we have some
> software from FSL that can talk to the AWIPS CP software and for each
> product received on the NOAAPort, insert it into the LDM queue. So,
> we also have LDM running on this system, configured as a pure data
> source (no 'request' lines in ldmd.conf) to feed the NOAAPort data to
> other systems in the center. Now, to make a record of the time that
> each product reaches the center on NOAAPort, the LDM on the receiver
> has a small pqact.conf that, for each AWC product, EXEC's a script to
> put a one-line product in the queue that contains the current wall
> clock time, the server name, product name, etc. to give us a record of
> the time that the product arrived from NOAAPort.
>
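As an illustration of the one-line receive notice described above, here is a minimal sketch. The field order, the separator, and the name `format_receive_notice` are assumptions; the message only says the notice carries the wall-clock time, the server name, the product name, "etc."

```python
from datetime import datetime, timezone


def format_receive_notice(server, product_id, when=None):
    """Build a one-line receive notice: wall-clock time, server, product."""
    # Hypothetical layout: the actual notice format is not given in the message.
    if when is None:
        when = datetime.now(timezone.utc)
    stamp = when.strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{stamp} {server} {product_id}"
```

The resulting line would then be inserted into the queue as its own small product by whatever script pqact EXECs.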
> Now, downstream from the NOAAPort receiver, there is an LDM client
> with a pqact configured that stores all these 'receive notifications'
> into a file by product, by day. We also keep a similar log of every
> transmit of every product from the center. Then, we have a script
> that takes the send log entries and matches them up with the receive
> log entries to determine delay and to monitor if the NWSTG drops a
> product. Whenever there is a missing receive entry that is 'too old',
> an alarm goes up on our monitoring software (Nagios is the package we
> are using). So, when there is an alarm on Nagios (and I catch it in
> time before things are flushed from the queue) I quickly log into the
> NOAAPort receiver to check:
> 1) is the product in the queue
> 2) is the receive notice in the queue
> 3) is there a log entry from the receive notice script that it attempted
> to put a notice in the queue
> 4) when I was running pqact -v, was there an entry showing that pqact
> saw the product go by
>
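The send/receive matching described above could be sketched roughly as follows. This is not the AWC script: the function name, the dictionary inputs, and the `max_age` threshold are all assumptions, and parsing of the actual log files is left out because their format is not shown in the message.

```python
def find_missing_receipts(sent, received, now, max_age):
    """Return (delays, overdue).

    sent:     dict mapping product ID -> send time (seconds since epoch)
    received: dict mapping product ID -> receive time (seconds since epoch)
    delays:   dict mapping product ID -> receive time minus send time
    overdue:  sorted list of product IDs sent more than max_age seconds
              ago that still have no receive entry
    """
    delays = {}
    overdue = []
    for product_id, sent_at in sent.items():
        if product_id in received:
            delays[product_id] = received[product_id] - sent_at
        elif now - sent_at > max_age:
            # No receive notice and the send is "too old": raise an alarm.
            overdue.append(product_id)
    return delays, sorted(overdue)
```

Products sent recently and not yet received are deliberately left out of both results, so that ordinary transit delay does not trigger an alarm.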
> So far, each time a problem has been reported, 1) has been fine (the
> product was in the queue), but 2) was not, and there was no entry
> in 3) indicating that the script had attempted to run. When I was
> running 'pqact -v' over the weekend I noticed that there were 'chunks'
> of headers missing when comparing the list of headers to what 'pqcat'
> displayed in the queue. For example, looking over about 40 minutes
> of the queue, there were about 255 products in 13 'chunks' that pqcat
> listed in the queue, that the 'pqact -v' didn't report seeing.
>
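The "chunks" comparison described above can be made repeatable with a small helper. This is a sketch under assumptions: `queue_ids` is taken to be the list of product identifiers in queue order as extracted from the pqcat listing, and `seen_ids` the identifiers extracted from the 'pqact -v' log. Extracting those identifiers is not shown, since the exact log layout is not given here.

```python
def missing_chunks(queue_ids, seen_ids):
    """Group products that are in the queue listing but absent from the
    pqact log into runs of consecutive queue entries."""
    seen = set(seen_ids)
    chunks = []
    current = []
    for product_id in queue_ids:
        if product_id in seen:
            # A product pqact did see closes any run of missed products.
            if current:
                chunks.append(current)
                current = []
        else:
            current.append(product_id)
    if current:
        chunks.append(current)
    return chunks
```

The number of chunks is `len(missing_chunks(...))` and the number of missed products is the sum of the chunk lengths, which would correspond to figures like the 13 chunks and roughly 255 products quoted above.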
> Probably too much detail :-)
Not at all.

Are you checking the product-queue too soon after being notified? Is
the missed data-product later acted upon by pqact(1), indicating that it
was merely delayed?

Do you have a saved product-queue that pqcat(1) indicates contains
data-products that pqact(1) missed?

If so, and you manually execute pqact(1) on this product-queue, does it
find the "missed" data-products? For example:

    echo '<<feedtype>> (<<pattern>>) EXEC -wait echo \1' >conf
    pqact -vl- -o <<time>> -q <<pq>> conf

where

    <<feedtype>>    Is the feedtype of a data-product that pqact(1)
                    missed.
    <<pattern>>     Is the pattern of a data-product that pqact(1)
                    missed.
    <<time>>        Is the age of the oldest data-product in the
                    product-queue in seconds (use pqmon(1) to
                    determine this).
    <<pq>>          Is the pathname of the saved product-queue.

Are there non-printing characters in the product-identifiers of the
"missed" data-products that cause them not to be matched? You can check
the product-identifiers with

    pqcat -vl- -f <<feedtype>> -p <<pattern>> -q <<pq>> -i 0 | od -c

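If the product-identifiers have already been extracted into strings, the same check for non-printing characters can be scripted. The helper below is only a sketch; its name is hypothetical, and it treats anything outside printable ASCII (space through tilde) as suspect.

```python
def non_printing(product_id):
    """Return (index, repr) pairs for characters outside printable ASCII."""
    return [(i, repr(ch)) for i, ch in enumerate(product_id)
            if not (" " <= ch <= "~")]
```

An empty list means the identifier contains only printable ASCII; any entry gives the position and an escaped form of the offending character.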
Regards,
Steve Emmerson