This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
>To: <address@hidden>
>From: "Arthur A. Person" <address@hidden>
>Subject: Unable to maintain connects from wsi
>Organization: UCAR/Unidata
>Keywords: 200108221513.f7MFDR125002

Art,

> ... I took your suggestion and remade the ldm queue
> and that fixed the connection problem to wsi. This appears to me to be a
> bug somewhere... in the ldm queue management? Or perhaps RedHat? It
> appears that something caused the queue to become corrupt in some fashion
> during normal operation of the ldm such that the ldm didn't notice much
> and didn't prevent most of its operation. However, wsi would never
> connect and I'm waiting to see if our NMC2 reception improves as that's
> been flaky as well. Any ideas on what would cause the queue to corrupt?
> I'm concerned this may happen again. I still have the old queue if someone
> wants to look at it.

The ldm queue management library is filled with assertion checks that
are intended to catch queue corruption or queue data structure
inconsistency at the beginning of every operation on the queue and
often at the conclusion of a queue operation as well. If the queue
had been corrupted somehow, it seems more likely that one of the many
(fatal) assertion violation messages would have appeared in the log
files just before the ldm process exited, rather than the problem
causing a slowdown. At least that was my experience during testing
and debugging of the pq library.

But if you still have the old queue available, could you possibly do
me a favor by sending me the output of a couple of additional checks
for queue corruption?
Assuming the old product queue is in a file named "old.pq", the first
test is getting the output from pqmon:

  $ pqmon -q old.pq
  Aug 24 17:32:08 pqmon: Starting Up (13705)
  Aug 24 17:32:08 pqmon: nprods nfree nempty nbytes maxprods maxfree minempty maxext age
  Aug 24 17:32:08 pqmon: 3314 1 21099 49698472 3314 1 21099 50309464 20560600
  Aug 24 17:32:08 pqmon: Exiting

As above, expect 4 lines of output, the 2nd and 3rd of which are long.
Please send these; they're explained in the pqmon man page, if you're
interested.

The second thing I'd like to see is pqcat's idea of how many products
are in the queue, and how long it takes to go through all these. You
get this by running something like

  $ pqcat -q old.pq > /dev/null
  Aug 24 18:43:39 pqcat: Starting Up (13752)
  Aug 24 18:43:40 pqcat: Exiting
  Aug 24 18:43:40 pqcat: Number of products 3314

If either of these programs dies or emits an error message, then the
queue really was corrupt, and the problem requires further
investigation.

We've never seen a corrupted queue with versions 5.1.2, 5.1.3, or
5.1.4 on motherlode, which has been feeding dozens of sites millions
of products for months, so if there is a problem, it may be
Linux-specific ...

--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden                          http://www.unidata.ucar.edu
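
For anyone who wants to compare the two outputs mechanically rather than by eye, here is a minimal Python sketch. It assumes the log text has exactly the layout shown in the pqmon and pqcat examples above. The function names (parse_pqmon, pqcat_product_count) are hypothetical helpers, not part of the LDM. Treating a match between pqmon's nprods and pqcat's product count as a sanity check is my own assumption, not something the pqmon man page is quoted as promising.

```python
import re

# Sample output copied from the pqmon and pqcat runs shown in this message.
PQMON_SAMPLE = """\
Aug 24 17:32:08 pqmon: Starting Up (13705)
Aug 24 17:32:08 pqmon: nprods nfree nempty nbytes maxprods maxfree minempty maxext age
Aug 24 17:32:08 pqmon: 3314 1 21099 49698472 3314 1 21099 50309464 20560600
Aug 24 17:32:08 pqmon: Exiting
"""

PQCAT_SAMPLE = """\
Aug 24 18:43:39 pqcat: Starting Up (13752)
Aug 24 18:43:40 pqcat: Exiting
Aug 24 18:43:40 pqcat: Number of products 3314
"""


def parse_pqmon(log_text):
    """Pair pqmon's column-name line with its value line (hypothetical helper)."""
    payloads = []
    for line in log_text.splitlines():
        # Keep only the text that follows the "pqmon:" tag on each log line.
        _, tag, payload = line.partition("pqmon:")
        if tag:
            payloads.append(payload.split())
    # The header is the line whose first column is "nprods".
    header = next(p for p in payloads if p and p[0] == "nprods")
    # The values line has the same number of fields, all of them digits.
    values = next(
        p for p in payloads
        if len(p) == len(header) and all(tok.isdigit() for tok in p)
    )
    return dict(zip(header, map(int, values)))


def pqcat_product_count(log_text):
    """Return the count from pqcat's 'Number of products' line, or None."""
    match = re.search(r"pqcat: Number of products (\d+)", log_text)
    return int(match.group(1)) if match else None


stats = parse_pqmon(PQMON_SAMPLE)
count = pqcat_product_count(PQCAT_SAMPLE)
# Assumption: the two tools should agree on how many products the queue holds.
print("counts agree" if stats["nprods"] == count else "counts differ")
```

To use it on a real queue, save the output of each command to a file and pass the file contents to the two functions in place of the sample strings. A missing header line, a missing values line, or a None count would itself be worth reporting.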