Re: pq messages in log at UCSD
- Subject: Re: pq messages in log at UCSD
- Date: Tue, 08 Jan 2002 12:19:50 -0700
Russ Rew wrote:
>
> Anne,
>
> > Today I'm looking at the logs and I see some unusual messages:
> >
> > Jan 08 12:19:32 aeolus motherlode[744]: FEEDME(motherlode.ucar.edu): OK
> > Jan 08 12:19:33 aeolus motherlode[744]: pq_del_oldest: signature
> > 54c2cdcf6960020caf2ebdec373910cb: Not Found
> > Jan 08 12:19:33 aeolus motherlode[744]: hereis: pq_insert failed:
> > Invalid argument: b84f91c2b3adff27f7cd2f9943e2f18f 4432
> > 20020108121902.472 NNEXRAD 61479154 SDUS20 PHFO 081216 /pN2SHWA
> >
> > This occurred twice in the past 5 hours. He's running 5.1.4, which has
> > your changes regarding the pq_del_oldest conflict. I guess these
> > messages originated from the rpc.ldmd that's receiving from motherlode,
> > but the messages must apply to the local queue. And, here's what pqmon
> > is reporting:
> >
> > aeolus.ucsd.edu> pqmon
> > Jan 08 17:38:43 pqmon: Starting Up (2014)
> > Jan 08 17:38:43 pqmon: nprods nfree nempty nbytes maxprods maxfree minempty maxext age
> > Jan 08 17:38:43 pqmon: 100275 1 82829 750001128 161069 2 22035 1048 10105
> > Jan 08 17:38:43 pqmon: Exiting
> >
> > Otherwise things seem ok. Do you have any ideas about what might have
> > occurred?
>
> No, I'm not sure. I haven't seen this before, but it might be a
> symptom of a corrupted product queue. The "pq_insert failed:" message
> is just a consequence of pq_del_oldest failing.
>
> Whenever a product is inserted in the queue, its MD5 signature is
> inserted into a hash table for quickly checking on duplicate
> products. Later when it's time to delete the product to make room for
> a new product, the signature must be deleted from the hash table. In
> this case, the signature that was supposedly added to the hash table
> earlier is not found, so it can't be deleted. This should never
> happen, so it indicates either a bug, a corrupted queue, or a disk or
> memory error. Once there is one of these errors, there are likely to
> be more, if the hash table data structures are hosed. It is dropping
> a product every time it encounters this problem.
>
> Sounds like it might be time to restart the LDM with a new queue and
> see if it happens again. I'd also be interested if this error message
> has ever occurred in LDM logs on motherlode or other machines ...
>
> --Russ
Russ,
The problem only happened twice 7 hours ago (and they were within 30
seconds of each other). Otherwise things seem fine, as best I can
tell. I think I'll just let it go for a while.
I've never seen these messages before. I just scanned the logs on
motherlode and none have occurred in the past four days. I'll let you
know if I see it again.
Anne
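
For readers following the thread, the signature bookkeeping Russ describes can be sketched roughly as below. This is a toy model in Python, not LDM code: the class ToyQueue, its methods, and the return strings are all hypothetical, and a plain set stands in for the real signature hash table. How the real queue recovers after the error is not stated in the thread; the toy simply discards the oldest entry and drops the incoming product.

```python
import hashlib
from collections import OrderedDict


class ToyQueue:
    """Toy model of a product queue with a signature index (not LDM code)."""

    def __init__(self, capacity):
        self.capacity = capacity        # maximum number of products held
        self.products = OrderedDict()   # signature -> data, oldest first
        self.signatures = set()         # stand-in for the signature hash table

    def delete_oldest(self):
        # Remove the oldest product, then remove its signature from the index.
        sig, _ = self.products.popitem(last=False)
        if sig not in self.signatures:
            # The case from the log: the signature should be present but is not.
            raise KeyError(f"signature {sig}: Not Found")
        self.signatures.remove(sig)

    def insert(self, data):
        sig = hashlib.md5(data).hexdigest()
        if sig in self.signatures:
            return "duplicate"          # already seen, nothing stored
        while len(self.products) >= self.capacity:
            try:
                self.delete_oldest()    # make room for the new product
            except KeyError:
                return "failed"         # the incoming product is dropped
        self.products[sig] = data
        self.signatures.add(sig)
        return "inserted"


q = ToyQueue(capacity=2)
print(q.insert(b"a"))   # first copy is stored
print(q.insert(b"a"))   # second copy is caught by the signature check
print(q.insert(b"b"))
# Simulate a damaged index: the signature of b"a" goes missing.
q.signatures.discard(hashlib.md5(b"a").hexdigest())
print(q.insert(b"c"))   # making room fails, so b"c" is dropped
```

In this model, a single missing signature costs one incoming product, which is in line with Russ's remark that a product is dropped each time the problem is encountered.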