This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Justin, >Date: Thu, 06 Oct 2005 09:23:50 -0400 >From: Justin Cooke <address@hidden> >Organization: NOAA >To: Steve Emmerson <address@hidden> >Subject: Re: "pbuf_flush: time elapsed" problem The above message contained the following: > We completed the installation of LDM 6.4.2.4 yesterday (~15Z) and > things were running well until 01:08Z today, that's when we again > stopped receiving NEXRAD2 data. I did a restart of LDM once I got in > this morning and NEXRAD2 is coming in. > > The TOC Level II radar cluster feeds us the NEXRAD2 data (IP: > 140.90.85.102). > > I have attached the log files from that system; they only seem to have > the standard logging enabled and an LDM restart is in their cron for > twice a day (this has to do with their cluster availability). > > I didn't see anything around the 01:08Z (10/06/05) time that would > indicate why we stopped receiving NEXRAD2 data. Did the outage correspond to a restart of the LDM on 140.90.85.102? If you send USR2 signals to the downstream LDM process that's receiving the NEXRAD2 data-products from 140.90.85.102, then you can put the process into debug mode and see where it's spending its time. Just don't leave it in that mode! :-) > Here is a snippet from our ldmd.log at the time we stopped receiving data: > --- > Oct 06 01:08:25 b2n1 140.90.85.102[1437928] INFO: 10539 > 20051006010822.446 NEXRAD2 386060 L2-BZIP2/KMPX/20051006010419/386/60 > Oct 06 01:08:25 b2n1 pqact[815190] INFO: 10539 20051006010822.446 > NEXRAD2 386060 L2-BZIP2/KMPX/20051006010419/386/60 ASIDE: The above entries indicate that a data-product that has the time 2005-10-06 01:04:19 encoded in it's product-identifier was created at 2005-10-06 01:08:22.446 -- a difference of over 4 minutes. I hope this is OK. > Oct 06 01:08:25 b2n1 pqact[815190] INFO: file: -close > /dcomdev/us007003/ldmdata/obs/upperair/nexrad_level2/KMPX/KMPX_20051006_010419.bz2 > Oct 06 01:08:25 b2n1 pqact[889074] INFO: 3666 20051006010825.900 > FNEXRAD 555 UJXX82 KWBC 060104 > Oct 06 01:08:25 b2n1 pqact[889074] INFO: file: > /dcomdev/us007003/ldmdata/obs/upperair/sonde/20051006.sonde > Oct 06 01:08:25 b2n1 pqact[889074] INFO: pipe: > bin/decod_dcusnd -v 2 -t 600 -d > /dcomdev/us007003/decoder_logs/decod_dcusnd.log > tables/stns/sonde.land.tbl tables/stns/sonde.ship.tbl > /dcomdev/us007003/bufrtab.002 > Oct 06 01:08:25 b2n1 pqact[889074] INFO: 84 20051006010825.900 > FNEXRAD 556 UANT01 KWBC 060105 > Oct 06 01:08:25 b2n1 pqact[889074] INFO: file: > /dcomdev/us007003/ldmdata/obs/upperair/aircraft/20051006.aircraft > Oct 06 01:08:25 b2n1 pqact[889074] INFO: pipe: > bin/decod_dcacft -v 2 -t 300 -d > /dcomdev/us007003/decoder_logs/decod_dcacft.log > tables/stns/pirep.tbl tables/stns/airep.tbl > /dcomdev/us007003/bufrtab.004 > Oct 06 01:08:25 b2n1 pqact[889074] INFO: 194 20051006010825.900 > FNEXRAD 557 UPPK01 OPKC 060000 > Oct 06 01:08:25 b2n1 pqact[889074] INFO: file: > /dcomdev/us007003/ldmdata/obs/upperair/sonde/20051006.sonde > Oct 06 01:08:25 b2n1 pqact[889074] INFO: pipe: > bin/decod_dcusnd -v 2 -t 600 -d > /dcomdev/us007003/decoder_logs/decod_dcusnd.log > tables/stns/sonde.land.tbl tables/stns/sonde.ship.tbl > /dcomdev/us007003/bufrtab.002 > Oct 06 01:08:25 b2n1 pqact[889074] INFO: 144 20051006010825.900 > FNEXRAD 558 SAGR33 LGAT 060100 > Oct 06 01:08:25 b2n1 140.90.85.102[1437928] INFO: 22130 > 20051006010822.509 NEXRAD2 559026 L2-BZIP2/KLWX/20051006005943/559/26 ASIDE: The last entry indicates that a data-product that has the time 2005-10-06 00:59:43 encoded in it's product-identifier was created at 2005-10-06 01:08:22.509 -- a difference of almost 9 minutes. I hope this is OK. > Oct 06 01:08:25 b2n1 pqact[889074] INFO: file: > /dcomdev/us007003/ldmdata/obs/surface/metar/20051006.metar > Oct 06 01:08:25 b2n1 pqact[889074] INFO: pipe: > bin/decod_dcmetr -v 2 -t 300 -d > /dcomdev/us007003/decoder_logs/decod_dcmetr.log > /dcomdev/us007003/bufrtab.000 tables/stns/metar.tbl > Oct 06 01:08:25 b2n1 pqing[208898] INFO: 556 20051006010825.964 > FNEXRAD 579 SAUS70 KWBC 060100 RRE > Oct 06 01:08:25 b2n1 pqing[208898] INFO: 136 20051006010825.964 > FNEXRAD 580 SAUS80 KWBC 060100 RRE > Oct 06 01:08:25 b2n1 pqing[208898] INFO: 140 20051006010825.968 > FNEXRAD 581 SPUS80 KWBC 060104 > Oct 06 01:08:25 b2n1 pqing[208898] INFO: 149 20051006010825.969 > FNEXRAD 582 SASD31 OEJD 060100 RRA > Oct 06 01:08:25 b2n1 pqing[208898] INFO: 72 20051006010825.971 > FNEXRAD 583 ULCM01 FKKD 060000 > Oct 06 01:08:25 b2n1 pqing[208898] INFO: 82 20051006010825.972 > FNEXRAD 584 SAAG00 SACO 060100 > Oct 06 01:08:25 b2n1 pqing[208898] INFO: 1177 20051006010825.974 > FNEXRAD 585 SAIY40 LIIB 060100 > ---- > > The last entry related to NEXRAD2 is this entry into the queue: "Oct 06 > 01:08:25 b2n1 140.90.85.102[1437928] INFO: 22130 20051006010822.509 > NEXRAD2 559026 L2-BZIP2/KLWX/20051006005943/559/26", pqact never acts > on the entry... > > I was poking around the email archives and saw where you suggested to a > user that 4 instances of pqact is sometimes a good idea when working > with the CRAFT data, > > pqact -f CRAFT -p BZIP2/K[A-D] etc/pqact.craft > pqact -f CRAFT -p BZIP2/K[E-K] etc/pqact.craft > pqact -f CRAFT -p BZIP2/K[L-R] etc/pqact.craft > pqact -f CRAFT -p BZIP2/K[S-Z] etc/pqact.craft > > Do you think adding these entries to our ldmd.conf would make any > difference? Processing data-products using multiple, disjoint pqact(1) processes can improve throughput over a single pqact(1) process if the single process has reached the limit on the number of open file-descriptors and has to constantly close file-descriptors in order to open a new ones. You can determine the limit on the number of open file descriptors via the commands "ulimit -a" or "getconf OPEN_MAX". Also, be sure that the pqact(1) configuration-file contains only relevant entries because the pqact(1) program tests every data-product against every entry. Using a faster computer can also help -- as will having sufficient memory so that the relevant processes remain in memory rather than being swapped to disk (the top(1) utility can help determine this). > Thanks again for the help, > > Justin [snip] Regards, Steve Emmerson