This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
On Thu, 6 Apr 2000, Gregory Grosshans wrote:

> Robb,
>
> Yes, the ldmd.log file has been showing multiple dcmetr processes
> starting/running.
>
> Here is some additional information about the K220, the K370, and the K220
> feeding the NOAAPORT data to the first two machines.
>
> The K220 ingesting the NOAAPORT data and feeding the other K220 (the one
> with the dcmetr problem) and the K370 also runs dcmetr with no problems.
> This machine is ingesting the NWSTG and GOES-E channels into the LDM with a
> modified version of the FSL acqserver (which makes the NOAAPORT data look
> like a FOS feed).
>
> With a short awk script I totaled the number of products ingested and the
> total bytes for the three K machines and each feedtype:
>
> The K220 ingesting the NWSTG and GOES-E channels into the LDM (as FSL5) has
> the following for a 24-hour period:
>
>     FSL5                    ALL FEEDS (FSL5,DDS,PPS,AFOS)
>     325216 products         418782 products
>     5.685 GB of data        5.762 GB of data
>
> The K370 also has PCWS:
>
>     FSL5                    ALL FEEDS (FSL5,DDS,PPS,AFOS,PCWS)
>     252001 products         345711 products
>     4.676 GB of data        4.819 GB of data
>
> The K220 (with the DCMETR problem):
>
>     FSL5                    ALL FEEDS (FSL5,DDS,PPS,AFOS)
>     44536 products          136896 products
>     14 MB of data           93 MB of data
>
> The DDS feed has approx. 45000 products/day and 31 MB/day.
> The PPS feed has approx. 8100 products/day and 20 MB/day.
> The AFOS feed has approx. 39000 products/day and 26 MB/day.
> The PCWS feed has approx. 144 products/day and 65 MB/day.
>
> Two items are noteworthy. The K370 is using the request line
>
>     request FSL5 (^[A-FK-X]) data.spc.noaa.gov
>
> With this request line it receives 252001 products/day, compared to DDS/PPS
> at 53000 products/day. I was under the impression the request line resulted
> in something similar to the DDS/PPS feed. However, it must be ingesting many
> other products I don't know about (e.g. GRIB, etc.).
>
> Second, I was surprised at how small the total LDM data ingest was. I
> mentioned to the sys admin that you think the K220 is too slow. This machine
> is the server that 8 operational workstations point to for NAWIPS data. He
> thinks it may be that the disks can't keep up. Right now the particular
> filesystem has mincache set to tmpcache (performance instead of integrity);
> a while ago he changed this setting to dsync to make sure the OS writes the
> data to disk. When that filesystem change was made, the load on the system
> jumped way up (into the 20s), and it was changed back to tmpcache. So the
> disks on this system are constantly busy.

Greg,

Maybe my statement that the K220 computer is too slow was too general. It is
the disk I/O that is too slow to support the LDM activity plus the support of
the 8 operational workstations. That's what the pbuf_flush entries in the
ldmd.log file are referring to. The transition point between a computing
environment working correctly and not working correctly can sometimes be
narrow. I don't know your exact configuration, but I think if you rearranged
the configuration to lessen the disk I/O, the dcmetr decoder would quit
exiting. If the dcmetr decoder keeps producing the high number of pbuf_flush
entries, you will continue to have the problem.

Robb...

> With the above information do you still suspect the machine is too slow to
> handle the additional data, even though the total LDM ingest of products is
> considerably smaller?
>
> Thanks,
> Gregg
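The short awk script mentioned above is not included in the thread. A minimal
sketch of that kind of per-feedtype tally might look like the following; it
assumes an input file whose first field is the feedtype and whose second field
is the product size in bytes, which is an assumption -- the actual input
format used at SPC is not shown here:

    #!/bin/sh
    # Sketch only: count products and total bytes per feedtype.
    # Assumed input: one line per product, "FEEDTYPE SIZE_IN_BYTES ...".
    awk '
    {
        count[$1]++                  # products per feedtype
        bytes[$1] += $2              # bytes per feedtype
        tot_count++; tot_bytes += $2
    }
    END {
        for (f in count)
            printf "%-6s %10d products %10.3f GB\n", f, count[f], bytes[f] / (1024^3)
        printf "%-6s %10d products %10.3f GB\n", "ALL", tot_count, tot_bytes / (1024^3)
    }' products.txt

A similar tally keyed on the first character of the product identifier, rather
than the feedtype, would show which header categories account for the
unexpectedly large FSL5 product count noted above.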
> Robb Kambic wrote:
>
> > Greg,
> >
> > After much discussion, I believe the problem is that the K220 computer is
> > too slow to support the dcmetr decoder with the extra FSL5 feed. That's
> > why you are seeing all the pbuf_flush entries in the log files. What we
> > believe is happening is that the dcmetr decoder PIPE is getting filled, so
> > the LDM is spawning another dcmetr decoder. Since both decoders are
> > writing to the same file, the decoder gets a signal 11 and produces a
> > corrupt output file. You should check the GEMPAK log file for multiple
> > starts of the dcmetr decoder without shutdown messages. Also, if you are
> > using just the NOAAPORT stream, the dcmetr works OK like you said, i.e.
> > it can keep up.
> >
> > Robb...
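The check Robb suggests can be rough-scripted. The sketch below works from the
DCMETR entries that appear in ldmd.log, in the form shown in the excerpt near
the end of this thread (e.g. "Mar 30 15:07:24 spock DCMETR[10991]: Starting
Up"); whether the GEMPAK decoder log itself uses the same wording is an
assumption. A decoder killed by signal 11 never logs a normal termination, so
the running count stays high after a crash.

    #!/bin/sh
    # Sketch only: flag times when more than one dcmetr appears to be running,
    # by pairing "Starting Up" and "Normal Termination" messages in ldmd.log.
    grep 'DCMETR\[' ldmd.log | awk '
    /Starting Up/        { running++ }
    /Normal Termination/ { running-- }
    running > 1 { printf "%s %s %s: %d dcmetr processes appear active\n", $1, $2, $3, running }'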
> > On Thu, 6 Apr 2000, Gregory Grosshans wrote:
> >
> > > I recently added a NOAAPORT feed similar to the DDS/PPS feed (i.e.
> > > request FSL5 (^[A-FK-X]) ) to a machine running LDM (5.0.9) that was
> > > already receiving the following feedtypes: AFOS, DDS, PPS. Once the
> > > NOAAPORT feed started coming in, the metar decoder (i.e. dcmetr) began
> > > terminating frequently from signal 11, resulting in a corrupted GEMPAK
> > > surface observation file. This particular machine is a K220 (2 100 MHz
> > > CPUs with 1 GB RAM).
> > >
> > > On a K370 (3 faster 200 MHz CPUs and 1.2 GB RAM) the same LDM (5.0.8)
> > > setup exists, the resulting surface file is fine, and dcmetr doesn't
> > > terminate.
> > >
> > > Do you have any ideas on why a problem exists on one machine but not on
> > > the other? Have you seen anything like this before? I've provided
> > > additional information about the two configurations and the
> > > troubleshooting steps I've already taken below. The NOAAPORT ingest runs
> > > on a second K210 server with LDM (5.0.8) running and the FSL acqserver
> > > code (modified by SPC/AWC to make the data look like it's from the FOS).
> > > This second K210 then feeds NOAAPORT data (via the FSL5 feedtype) to the
> > > K210 and the K370.
> > >
> > > Thanks,
> > > Gregg
> > >
> > > Some specifics follow:
> > >
> > > On the K370 (where everything works fine) the configuration is:
> > >
> > > ---cut from ldmd.conf---
> > > exec "pqexpire -a .5"
> > > request AFOS .* hostname-removed
> > > request DDS .* hostname-removed
> > > request PPS .* hostname-removed
> > > request FSL5 (^[A-FK-X]) hostname-removed     (this is the NOAAPORT stream received in-house)
> > > request PCWS "^FSL\.NetCDF\.ACARS\.QC\..*" hostname-removed
> > > ---
> > > $pq_size = 750000000;
> > >
> > > The K370 uses the dcuair and dcmetr decoders, and the FEEDTYPE for each
> > > decoder is FSL5|DDS. This allows the receipt of data to be ASAP as well
> > > as redundant (in case the FSL5 feed goes down).
> > >
> > > On the K210 (where everything except the metar GEMPAK file works fine)
> > > the configuration is:
> > >
> > > ---cut from the ldmd.conf---
> > > exec "pqexpire -a .23333"
> > > request AFOS .* hostname-removed
> > > request DDS .* hostname-removed
> > > request PPS .* hostname-removed
> > > #request FSL5 (^[A-FK-X]) hostname-removed
> > > # UA data and AVN MOS
> > > #request FSL5 (^U[EGHKLPQSZ]|^FE) hostname-removed
> > > # SFC/UA data and AVN MOS
> > > request FSL5 (^U[EGHKLPQSZ]|^FE|^S[AP]) hostname-removed
> > > ---
> > > #$pq_size = 750000000;
> > > $pq_size = 200000000;
> > >
> > > The K210 uses the dcuair and dcmetr decoders. The dcuair output is fine.
> > > The dcmetr output becomes corrupted.
> > >
> > > Before adding the entire NOAAPORT feed to the K210, it was receiving the
> > > UA and AVN MOS data from NOAAPORT just fine. The LDM at that time was
> > > 5.0.8. After adding the entire NOAAPORT stream (request ^[A-FK-X]), the
> > > dcmetr output became corrupted. I then copied the dcmetr decoder from
> > > the K370 to the K210, with the GEMPAK surface file still becoming
> > > corrupted. Next, I upgraded to LDM 5.0.9 on the K210, but still no
> > > success. Then I decreased the request statement to be just upper-air,
> > > metar and AVN MOS data. However, the surface file was still becoming
> > > corrupted. The most recent attempt was to lower the queue size from
> > > 750 MB to 200 MB (since I'm not ingesting all the NOAAPORT data), but
> > > the surface file still becomes corrupted.
> > >
> > > PBUF messages are still received sporadically, usually soon after
> > > starting the LDM up:
> > >
> > > ldmspock@spock> ll ldmd.*
> > > -rw-r--r--   1 ldmspock   data    8522013 Apr  6 13:56 ldmd.log
> > > -rw-r--r--   1 ldmspock   data    5645263 Apr  6 00:00 ldmd.log.1
> > > -rw-r--r--   1 ldmspock   data    1262077 Apr  5 15:19 ldmd.log.2
> > > -rw-r--r--   1 ldmspock   data    8324510 Apr  5 13:19 ldmd.log.3
> > > -rw-r--r--   1 ldmspock   data    2973459 Apr  5 00:00 ldmd.log.4
> > > -rw-r--r--   1 ldmspock   data    5841177 Apr  4 19:03 ldmd.log.5
> > > -rw-r--r--   1 ldmspock   data       3835 Apr  4 16:25 ldmd.log.6
> > > -rw-r--r--   1 ldmspock   data   26420210 Apr  4 16:23 ldmd.log.7
> > >
> > > ldmspock@spock> grep pbuf ldmd.log ldmd.log.1
> > > ldmd.log:Apr 06 06:10:12 spock pqact[17920]: pbuf_flush 35: time elapsed 3.339170
> > > ldmd.log:Apr 06 06:10:23 spock pqact[17920]: pbuf_flush 35: time elapsed 6.926433
> > > ldmd.log:Apr 06 06:10:25 spock pqact[17920]: pbuf_flush 35: time elapsed 2.288729
> > > ldmd.log.1:Apr 05 15:25:05 spock pqact[17920]: pbuf_flush 35: time elapsed 4.764068
> > > ldmd.log.1:Apr 05 15:25:14 spock pqact[17920]: pbuf_flush 35: time elapsed 5.619390
> > > ldmd.log.1:Apr 05 15:25:20 spock pqact[17920]: pbuf_flush 35: time elapsed 3.642364
> > > ldmd.log.1:Apr 05 15:25:35 spock pqact[17920]: pbuf_flush 35: time elapsed 11.463704
> > > ldmd.log.1:Apr 05 15:25:46 spock pqact[17920]: pbuf_flush 35: time elapsed 7.433896
> > > ldmd.log.1:Apr 05 15:25:53 spock pqact[17920]: pbuf_flush 35: time elapsed 5.602335
> > > ldmd.log.1:Apr 05 15:26:12 spock pqact[17920]: pbuf_flush 35: time elapsed 6.995401
> > > ldmd.log.1:Apr 05 15:26:17 spock pqact[17920]: pbuf_flush 35: time elapsed 2.613388
> > > ldmd.log.1:Apr 05 15:26:22 spock pqact[17920]: pbuf_flush 35: time elapsed 3.780494
> > > ldmd.log.1:Apr 05 15:26:34 spock pqact[17920]: pbuf_flush 35: time elapsed 2.924638
> > > ldmd.log.1:Apr 05 15:26:44 spock pqact[17920]: pbuf_flush 35: time elapsed 2.035412
> > > ldmd.log.1:Apr 05 15:30:16 spock pqact[17920]: pbuf_flush 35: time elapsed 3.280656
> > > ldmd.log.1:Apr 05 21:10:08 spock pqact[17920]: pbuf_flush 35: time elapsed 5.369202
> > > ldmd.log.1:Apr 05 23:10:09 spock pqact[17920]: pbuf_flush 35: time elapsed 2.265447
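The pbuf_flush entries above are easier to read as an hourly summary. A
sketch, assuming the log format shown above (with the elapsed time as the
last field on each line):

    #!/bin/sh
    # Sketch only: per-hour count, maximum, and average of pbuf_flush delays.
    grep 'pbuf_flush' ldmd.log ldmd.log.1 | awk '
    {
        for (i = 1; i <= NF; i++)                      # locate the HH:MM:SS field
            if ($i ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/) { hour = substr($i, 1, 2); break }
        n[hour]++
        sum[hour] += $NF                               # elapsed time is the last field
        if ($NF + 0 > max[hour]) max[hour] = $NF + 0
    }
    END {
        for (h in n)
            printf "hour %s: %3d pbuf_flush entries, max %6.2f s, avg %5.2f s\n", h, n[h], max[h], sum[h] / n[h]
    }' | sort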
> > > The DCMETR processes that were killed follow. The processes killed
> > > during 15Z in ldmd.log.1 were another decoder receiving test products.
> > > As you can see there haven't been many child processes killed today,
> > > but the file is corrupted. Performing an sflist on the surface obs file
> > > resulted in a segmentation fault while the 1Z output was displaying.
> > >
> > > grep child ldmd.log ldmd.log.1
> > > ldmd.log:Apr 06 04:07:27 spock pqact[17920]: child 5137 terminated by signal 11
> > > ldmd.log.1:Apr 05 15:28:49 spock pqact[17920]: child 20923 terminated by signal 11
> > > ldmd.log.1:Apr 05 15:28:50 spock pqact[17920]: child 20938 terminated by signal 11
> > > ldmd.log.1:Apr 05 15:28:57 spock pqact[17920]: child 21095 terminated by signal 11
> > > ldmd.log.1:Apr 05 15:31:52 spock pqact[17920]: child 24263 terminated by signal 11
> > > ldmd.log.1:Apr 05 15:31:54 spock pqact[17920]: child 24292 terminated by signal 11
> > > ldmd.log.1:Apr 05 15:33:17 spock pqact[17920]: child 25397 terminated by signal 11
> > > ldmd.log.1:Apr 05 16:07:30 spock pqact[17920]: child 5674 terminated by signal 11
> > > ldmd.log.1:Apr 05 16:13:49 spock pqact[17920]: child 15384 terminated by signal 11
> > > ldmd.log.1:Apr 05 16:31:30 spock pqact[17920]: child 24463 terminated by signal 11
> > > ldmd.log.1:Apr 05 16:37:15 spock pqact[17920]: child 27998 terminated by signal 11
> > > ldmd.log.1:Apr 05 16:57:56 spock pqact[17920]: child 9436 terminated by signal 11
> > > ldmd.log.1:Apr 05 17:02:03 spock pqact[17920]: child 10868 terminated by signal 11
> > > ldmd.log.1:Apr 05 17:17:29 spock pqact[17920]: child 20707 terminated by signal 11
> > > ldmd.log.1:Apr 05 17:20:54 spock pqact[17920]: child 20898 terminated by signal 11
> > > ldmd.log.1:Apr 05 17:21:53 spock pqact[17920]: child 22583 terminated by signal 11
> > > ldmd.log.1:Apr 05 17:23:35 spock pqact[17920]: child 22924 terminated by signal 11
> > > ldmd.log.1:Apr 05 17:27:07 spock pqact[17920]: child 23765 terminated by signal 11
> > > ldmd.log.1:Apr 05 17:30:04 spock pqact[17920]: child 25231 terminated by signal 11
> > > ldmd.log.1:Apr 05 17:41:46 spock pqact[17920]: child 741 terminated by signal 11
> > > ldmd.log.1:Apr 05 17:46:38 spock pqact[17920]: child 1216 terminated by signal 11
> > > ldmd.log.1:Apr 05 17:50:08 spock pqact[17920]: child 3376 terminated by signal 11
> > > ldmd.log.1:Apr 05 18:01:35 spock pqact[17920]: child 8924 terminated by signal 11
> > > ldmd.log.1:Apr 05 18:01:55 spock pqact[17920]: child 9569 terminated by signal 11
> > > ldmd.log.1:Apr 05 18:02:08 spock pqact[17920]: child 9736 terminated by signal 11
> > > ldmd.log.1:Apr 05 18:04:20 spock pqact[17920]: child 9786 terminated by signal 11
> > > ldmd.log.1:Apr 05 18:05:06 spock pqact[17920]: child 10241 terminated by signal 11
> > > ldmd.log.1:Apr 05 18:08:47 spock pqact[17920]: child 15683 terminated by signal 11
> > > ldmd.log.1:Apr 05 18:14:30 spock pqact[17920]: child 16273 terminated by signal 11
> > > ldmd.log.1:Apr 05 18:15:40 spock pqact[17920]: child 17962 terminated by signal 11
> > > ldmd.log.1:Apr 05 18:15:44 spock pqact[17920]: child 18700 terminated by signal 11
> > > ldmd.log.1:Apr 05 18:16:07 spock pqact[17920]: child 18714 terminated by signal 11
> > > ldmd.log.1:Apr 05 18:16:08 spock pqact[17920]: child 18899 terminated by signal 11
> > > ldmd.log.1:Apr 05 18:26:21 spock pqact[17920]: child 21652 terminated by signal 11
> > > ldmd.log.1:Apr 05 18:26:41 spock pqact[17920]: child 23263 terminated by signal 11
> > > ldmd.log.1:Apr 05 18:46:58 spock pqact[17920]: child 28329 terminated by signal 11
> > > ldmd.log.1:Apr 05 18:57:50 spock pqact[17920]: child 6435 terminated by signal 11
> > > ldmd.log.1:Apr 05 18:58:56 spock pqact[17920]: child 6554 terminated by signal 11
> > > ldmd.log.1:Apr 05 19:02:36 spock pqact[17920]: child 7837 terminated by signal 11
> > > ldmd.log.1:Apr 05 19:03:07 spock pqact[17920]: child 8165 terminated by signal 11
> > > ldmd.log.1:Apr 05 19:03:50 spock pqact[17920]: child 8377 terminated by signal 11
> > > ldmd.log.1:Apr 05 19:08:03 spock pqact[17920]: child 13935 terminated by signal 11
> > > ldmd.log.1:Apr 05 19:09:06 spock pqact[17920]: child 14026 terminated by signal 11
> > >
> > > Robb Kambic wrote:
> > >
> > > > On Thu, 30 Mar 2000, Gregory Grosshans wrote:
> > > >
> > > > > Can someone explain what a 'pbuf_flush 15' entry in the ldmd.log
> > > > > file means?
> > > > >
> > > > > This LDM ingests the Gateway channel from NOAAPORT (as FSL5) plus
> > > > > PPS and DDS (note the request pattern below). The LDM queue is
> > > > > 750 MB.
> > > >
> > > > Greg,
> > > >
> > > > The pbuf_flush 15 entry means that the ACTION (i.e. FILE, STDIO,
> > > > PIPE, etc.) could not write to its destination because the resource
> > > > was not available at that time or was blocked. The log entry usually
> > > > occurs when disk activity is at its peak for some reason. It could be
> > > > the LDM doing many disk writes or another process tying up the disk.
> > > >
> > > >     time elapsed 5.690810
> > > >
> > > > means that it took about 5.7 seconds to complete the activity.
> > > >
> > > > If this occurs a bunch of times, one could start to lose data. Then
> > > > it's time to determine the reason for the pbuf_flush entries.
> > > >
> > > > Robb...
> > > >
> > > > > Thanks,
> > > > > Gregg Grosshans
> > > > > SPC
> > > > >
> > > > > ---information from ldmd.conf---------
> > > > > pqexpire -a .23333
> > > > > request FSL5 (^[A-FK-X]) (hostname was here)
> > > > >
> > > > > ---information from ldmadmin---------
> > > > > $pq_size = 750000000;
> > > > >
> > > > > ------portion of ldmd.log-------------
> > > > > Mar 30 15:07:24 spock pqact[14333]: pbuf_flush 15: time elapsed 5.690810
> > > > > Mar 30 15:07:24 spock DCMETR[10991]: Normal Termination
> > > > > Mar 30 15:07:24 spock DCMETR[11773]: Starting Up
> > > > > Mar 30 15:07:31 spock pqact[14333]: pbuf_flush 15: time elapsed 6.166595
> > > >
> > > > ===============================================================================
> > > > Robb Kambic                        Unidata Program Center
> > > > Software Engineer III              Univ. Corp for Atmospheric Research
> > > > address@hidden                     WWW: http://www.unidata.ucar.edu/
> > > > ===============================================================================
> >
> > ===============================================================================
> > Robb Kambic                        Unidata Program Center
> > Software Engineer III              Univ. Corp for Atmospheric Research
> > address@hidden                     WWW: http://www.unidata.ucar.edu/
> > ===============================================================================
===============================================================================
Robb Kambic                        Unidata Program Center
Software Engineer III              Univ. Corp for Atmospheric Research
address@hidden                     WWW: http://www.unidata.ucar.edu/
===============================================================================
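For readers following the pbuf_flush discussion above: the PIPE "destination"
Robb refers to is the dcmetr decoder started from an entry in pqact.conf. The
actual SPC pqact.conf entry is not shown in this thread; a hypothetical entry
of the general form being discussed might look roughly like the following,
with the dcmetr arguments only illustrative (note that the fields in a real
pqact.conf entry must be separated by tab characters, shown here as spaces):

    DDS|FSL5    ^S[AP]
        PIPE    decoders/dcmetr
                -d data/gempak/logs/dcmetr.log
                data/gempak/surface/YYYYMMDD_sao.gem

When dcmetr cannot drain this pipe as fast as pqact writes to it, pqact logs
the "pbuf_flush ... time elapsed" delays seen above, and the failure mode Robb
describes (a second dcmetr started while the first is still writing the same
YYYYMMDD_sao.gem file) can follow.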