[LDM #PAU-308840]: ldm exiting
- Subject: [LDM #PAU-308840]: ldm exiting
- Date: Mon, 12 Nov 2012 08:51:36 -0700
Hi Heather,
re:
> Tom, thank you very much for returning my email.
No worries.
re:
> I went in stopped the ldm, deleted and made a new queue. I
> will make sure to do this next time something happens with the queue.
Very good.
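For reference, the stop/delete/remake/start sequence could be wrapped in a
small shell function along these lines. This is only a sketch: the function
name remake_ldm_queue and the LDMADMIN variable are made up for illustration,
and it assumes it is run as the 'ldm' user with ldmadmin on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: remake the LDM product-queue in one go.
# LDMADMIN is an illustrative override; it defaults to the real 'ldmadmin'.
LDMADMIN="${LDMADMIN:-ldmadmin}"

remake_ldm_queue() {
    # Assumption: 'stop' may report an error when the LDM is already down
    # (as after a crash), so a failure here is tolerated.
    "$LDMADMIN" stop || echo "note: '$LDMADMIN stop' returned non-zero; continuing" >&2

    # The remaining steps must each succeed before the next one runs.
    for step in delqueue mkqueue start; do
        if ! "$LDMADMIN" "$step"; then
            echo "error: '$LDMADMIN $step' failed; stopping here" >&2
            return 1
        fi
    done
}
```

After sourcing the file, running remake_ldm_queue performs the four steps in
order and stops at the first of delqueue/mkqueue/start that fails, so the LDM
is never started against a queue that was not remade.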
re:
> Is this at all preventable?
It shouldn't have happened in the first place. We have been ingesting
NOAAPort on a number of machines for a LONG time (several years), and
we have experienced this problem only once or twice (and never on some
machines).
What caused your NOAAPort ingest process to segfault (the 'signal 11'
in your log) is a mystery; perhaps you got a slug of bad data in the
broadcast that the process simply couldn't handle? Again, this is a
rare occurrence.
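If you want to check whether this has happened before, something like the
following would pull child-termination notices out of an LDM log. It is a
rough sketch: the function name find_signal_exits is made up, the log file
location depends on your installation and is not assumed here, and the
pattern is modeled only on the 'terminated by signal' line in your log
excerpt.

```shell
#!/bin/sh
# Hypothetical helper: list ldmd notices about children killed by a signal.
# $1 is the path to an LDM log file (location depends on the installation).
find_signal_exits() {
    # Pattern follows the excerpt:
    #   ldmd[3280] NOTE: child 3284 terminated by signal 11: ...
    grep -E 'ldmd\[[0-9]+\].*child [0-9]+ terminated by signal [0-9]+' "$1"
}
```

Calling find_signal_exits with the path to your log prints each matching
line; grep's exit status is non-zero when there are no matches.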
> Thanks!
>
> Heather Kiley
> ________________________________________
> From: Unidata LDM Support [address@hidden]
> Sent: Monday, November 12, 2012 10:26 AM
> To: Kiley, Heather L (IS)
> Cc: address@hidden
> Subject: EXT :[LDM #PAU-308840]: ldm exiting
>
> Hi Heather,
>
> re:
> > The ldm stopped unexpectedly on my noaap ingestor yesterday. When I
> > tried to restart it using "ldmadmin start" I got this message:
> >
> > The writer-counter of the product-queue isn't zero. Either a process
> > has the product-queue open for writing or the queue might be corrupt.
> > Terminate the process and recheck or use
> >
> > pqcat -l- -s -q /usr/local/ldm/var/queues/ldm.pq && pqcheck -F -q
> > /usr/local/ldm/var/queues/ldm.pq
> >
> > to validate the queue and set the writer-counter to zero.
> > LDM not started
>
> This indicates that the LDM queue got damaged somehow. The suggested
> action to take is, in fact, one of two alternatives. The second
> alternative is the best one for a NOAAPort ingest machine:
> delete and remake the LDM queue:
>
> <as 'ldm' on the machine having problems>
> ldmadmin stop
> ldmadmin delqueue
> ldmadmin mkqueue
> ldmadmin start
>
> re:
> > I rebooted my machine in an attempt to clean up the queue, but I got
> > the same message again when I tried to restart the ldm.
>
> Once the queue is damaged, reboots will have no effect; it will stay
> damaged until fixed or remade.
>
> re:
> > I issued the command given in the error message:
> >
> > pqcat -l- -s -q /usr/local/ldm/var/queues/ldm.pq && pqcheck -F -q
> > /usr/local/ldm/var/queues/ldm.pq
> >
> > And then I was able to restart the ldm.
>
> OK. For future reference: on NOAAPort ingest machines, I would simply
> delete and remake the queue as per the info I included above. It is
> simpler, probably quicker and more foolproof.
>
> re:
> > Do you have any idea what may
> > have happened to cause the ldm to stop?
> >
> > Here is the error message in my log before the ldm stopped:
> > Nov 11 08:09:37 noaapnew noaaportIngester[3282] ERROR: [GB 1]
> > Nov 11 08:09:37 noaapnew noaaportIngester[3282] ERROR: [GB 1]
> > Nov 11 08:09:44 noaapnew noaaportIngester[3284] ERROR: [GB 1]
> > Nov 11 08:09:44 noaapnew noaaportIngester[3284] ERROR: [GB 1]
> > Nov 11 08:10:05 noaapnew noaaportIngester[3282] ERROR: [GB 1]
> > Nov 11 08:10:05 noaapnew noaaportIngester[3282] ERROR: [GB 1]
> > Nov 11 08:10:30 noaapnew noaaportIngester[3284] ERROR: [GB 1]
> > Nov 11 08:10:31 noaapnew noaaportIngester[3284] ERROR: [GB 1]
> > Nov 11 16:28:35 noaapnew ldmd[3280] NOTE: child 3284 terminated by signal
> > 11: noaaportIngester -m 224.0.1.3
> > Nov 11 16:28:35 noaapnew ldmd[3280] NOTE: Killing (SIGTERM) process group
> > Nov 11 16:28:35 noaapnew noaapxcd(feed)[3298] NOTE: Exiting
> > Nov 11 16:28:35 noaapnew ldmd[3280] NOTE: Exiting
> > Nov 11 16:28:35 noaapnew ldmd[3280] NOTE: Terminating process group
>
> 'signal 11' indicates a segmentation violation. Why this happened is
> not readily apparent.
>
> re:
> > I would appreciate any advice.
>
> I think that the expedient thing to do is/was delete and remake the LDM
> queue.
>
> Cheers,
>
> Tom
> --
> ****************************************************************************
> Unidata User Support UCAR Unidata Program
> (303) 497-8642 P.O. Box 3000
> address@hidden Boulder, CO 80307
> ----------------------------------------------------------------------------
> Unidata HomePage http://www.unidata.ucar.edu
> ****************************************************************************
>
>
> Ticket Details
> ===================
> Ticket ID: PAU-308840
> Department: Support LDM
> Priority: Normal
> Status: Closed
>
>
>
Cheers,
Tom