This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
>To: address@hidden
>From: Gregory Grosshans <address@hidden>
>Subject: Characterization of data load on LDM and predicting impacts on LDM queue load
>Organization: NOAA/SPC
>Keywords: 200111022252.fA2MqI102912 LDM performance

Hi Gregg,

> Can you tell me if there has been any work done on trying to predict the
> load characterization on LDM, in particular the queue, in relation to
> the type of workstation, disk system or systems being written to, and
> the volume of products being received into the queue?  If so, I'd
> appreciate gleaning any of this information.

Well, we've done some testing as part of developing the product queue
algorithms, to make sure all the product insertion, deletion, and region
management algorithms run in O(log(n)) time, where n is the number of
products currently in the queue.  And we've tested queue insertion rates in
the steady state (with a full product queue) with small products and with a
realistic mix of product sizes.  The results of this testing are in the IIPS
paper from last year's AMS meeting, which you can see in PDF or HTML form at

  http://www.unidata.ucar.edu/staff/russ/papers/ldm51.fm5.pdf
  http://www.unidata.ucar.edu/staff/russ/papers/ldm51.html

We ran these tests over a fast local network on relatively slow Sun
workstations (300 MHz SPARCs), with the product queue memory-mapped to a
file on local disk.  We haven't done extensive load testing on other
platforms, but the IDD community is running LDM 5.1.2 and later versions on
a wide variety of platforms, and we haven't heard complaints about product
queue performance.

> Briefly, the SPC has two HP J5000 workstations.  Workstation 'A' has an
> HP fiber channel disk system, with all LDM data including the LDM queue
> written to the fiber channel disks.  Workstation 'B' has two LVD SCSI
> drives, mirrored.  The LDM queue is written to the SCSI drives, while all
> data handled via 'FILE' and 'EXEC' in the pqact ends up being written to
> the NetApp NFS filer.  Both systems have 3 GB of RAM, and the local file
> systems use lots of cache and are configured for high performance at the
> sacrifice of data integrity.  On both systems the LDM queue is 750 MB.
>
> Both systems receive a NOAAPORT feed consisting of GOES-EAST, GOES-WEST
> and the NWSTG channel.  The feedtype for this stream is WMO, and NNEXRAD
> for the radar products.  In addition, a local Unisys WeatherMAX radar
> server is injecting several national mosaic radar products and about 5
> products from every radar site into the LDM queue as a feedtype of WSI.
> Also, as a backup to the NOAAPORT feed there is an X.25 feed into each
> workstation from an upstream host (i.e. checkov), as a feedtype of
> WMO|SPARE.  The X.25 feed is approximately 56 Kbps.  The PCWS and FSL2
> streams are ACARS and 6-minute profiler data.  The EXP feed is the
> MESOWEST.  The second WSI feed is a backup of the National Mosaic radar
> products from the AWC.
>
> When we transitioned to these systems about 18 months ago, everything
> worked fine.  However, over time I've noticed that workstation 'B', with
> the NetApp, has more and more pbuf_flush log messages (see below).  So
> far I haven't encountered any 'pipe broken' messages with the pbuf_flush
> log entries.
>
> Workstation 'B' also receives a significant amount of model data from
> NCEP Headquarters via two T-1s.  Most of the data is in GEMPAK format,
> but META files are created, and BUFR is converted to GEMPAK grids for
> each ETA and NGM cycle.  Thus, the machine is also writing a lot to the
> NetApp NFS server.
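As a rough illustration of the queue sizing described above, the Python
sketch below estimates how many minutes of data a 750 MB product queue
retains at a few assumed average ingest rates.  The 750 MB figure comes
from the message; the rates themselves are hypothetical placeholders, not
measurements from these systems.

    # Back-of-the-envelope queue residency: how long a full product queue
    # retains data at a given average ingest rate (oldest products are
    # deleted to make room for new ones).  The 750 MB size is from the
    # message above; the rates below are illustrative guesses only.

    QUEUE_SIZE_BYTES = 750 * 1024 * 1024  # 750 MB product queue

    def residency_minutes(ingest_bytes_per_sec: float) -> float:
        """Minutes of data a full queue holds at steady state."""
        return QUEUE_SIZE_BYTES / ingest_bytes_per_sec / 60.0

    # Hypothetical average ingest rates (bytes/second) for comparison:
    for label, rate in [
        ("text and graphics only", 50_000),
        ("NOAAPORT + NEXRAD + WSI mix", 250_000),
        ("with heavy model-data bursts", 1_000_000),
    ]:
        print(f"{label:30s} ~{residency_minutes(rate):6.1f} minutes in a 750 MB queue")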
> We will be transitioning to some different hardware over the next 1-2
> months (i.e. a J6000 and a faster NetApp NFS box).  Also, 11 more radar
> products from each NEXRAD site will begin flowing into NOAAPORT, and
> into the LDM queue, later this month.  At 11 products per site this
> equates to over 1500 products per 6 minutes, or 15,000+ products per
> hour.
>
> Can you tell me if there is any way to determine ahead of time what type
> of impact on the system and/or LDM one can expect with the addition of
> more data (e.g. 11 products from each site)?  Is there any way to
> characterize what type of load a given set of hardware (e.g.
> workstation, disks, etc.) and data flow into the LDM queue will have on
> a system?

It's difficult to develop an analytical model that's very accurate, because
the memory management of the list of free regions in the queue is based on
algorithms that perform well in practice but are not amenable to analytical
treatment.  I can tell you that decoding and filing products to a remote NFS
server may be more of a limitation than the product queue.  The addition of
11 products per site is about a 150% to 200% increase in the number of radar
products, but that may not be significant considering all the other kinds of
products you are handling (a rough version of this arithmetic is sketched
below).  Our tests showed that the LDM on a relatively slow Sun could handle
bursts of 300 products per second, even with concurrent garbage collection,
without adding significant latency, but occasionally there may be a pause of
several seconds while enough old products are deleted from the queue and
their memory regions coalesced to make space for storing a new large
product.

> Do any of the top IDD nodes (e.g. motherlode, I believe UofW, Unisys?)
> inject all three channels into the LDM queue?  Do they see similar
> pbuf_flush statements in ldmd.log?

Yes, we have several LDM systems handling all channels of NOAAPORT,
including motherlode.
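Here is a rough version of that product-count arithmetic.  The 11 products
per site, 6-minute scan interval, and 300 products/second burst figure come
from the discussion above; the NEXRAD site count is an approximation added
for illustration.

    # Estimate the extra NEXRAD load and compare it with the burst rate
    # observed in the LDM 5.1 queue tests.  The site count is approximate.

    NEXRAD_SITES = 140            # rough count of WSR-88D sites on NOAAPORT
    NEW_PRODUCTS_PER_SITE = 11    # additional products per volume scan
    SCAN_MINUTES = 6              # nominal volume-scan interval

    new_per_scan = NEXRAD_SITES * NEW_PRODUCTS_PER_SITE
    new_per_hour = new_per_scan * (60 / SCAN_MINUTES)
    avg_per_second = new_per_hour / 3600

    print(f"extra products per {SCAN_MINUTES}-minute scan: {new_per_scan}")   # ~1540
    print(f"extra products per hour: {new_per_hour:.0f}")                     # ~15,400
    print(f"average extra products per second: {avg_per_second:.2f}")         # ~4.3

    # The average is a small fraction of the ~300 products/second bursts the
    # queue handled in testing, which is why pqact decoding and filing to
    # the NFS server is the likelier bottleneck.
    TESTED_BURST_PER_SECOND = 300
    print(f"fraction of tested burst rate: {avg_per_second / TESTED_BURST_PER_SECOND:.1%}")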
motherlode is currently getting pbuf_flush messages occasionally:

  Nov 05 03:46:42 motherlode.ucar.edu pqact[2898]: pbuf_flush 20: time elapsed 2.339703
  Nov 05 04:00:08 motherlode.ucar.edu pqact[2898]: pbuf_flush 20: time elapsed 2.547173
  Nov 05 04:00:14 motherlode.ucar.edu pqact[2898]: pbuf_flush 20: time elapsed 2.593650
  Nov 05 04:00:22 motherlode.ucar.edu pqact[2898]: pbuf_flush 20: time elapsed 8.302111
  Nov 05 04:56:57 motherlode.ucar.edu pqact[2898]: pbuf_flush 20: time elapsed 2.012492
  Nov 05 09:42:22 motherlode.ucar.edu pqact[2898]: pbuf_flush 20: time elapsed 2.024029
  Nov 05 10:35:35 motherlode.ucar.edu pqact[2898]: pbuf_flush 20: time elapsed 2.325135
  Nov 05 10:59:58 motherlode.ucar.edu pqact[2898]: pbuf_flush 20: time elapsed 2.279881
  Nov 05 11:00:01 motherlode.ucar.edu pqact[2898]: pbuf_flush 20: time elapsed 2.435090
  Nov 05 11:00:09 motherlode.ucar.edu pqact[2898]: pbuf_flush 20: time elapsed 2.105940
  Nov 05 13:35:45 motherlode.ucar.edu pqact[2898]: pbuf_flush 20: time elapsed 2.265940
  Nov 05 15:58:37 motherlode.ucar.edu pqact[2898]: pbuf_flush 20: time elapsed 2.013965
  Nov 05 16:35:07 motherlode.ucar.edu pqact[2898]: pbuf_flush 20: time elapsed 2.171625
  Nov 05 16:35:37 motherlode.ucar.edu pqact[2898]: pbuf_flush 20: time elapsed 2.060463
  Nov 05 18:35:07 motherlode.ucar.edu pqact[2898]: pbuf_flush 20: time elapsed 2.135467
  Nov 05 22:38:17 motherlode.ucar.edu pqact[2898]: pbuf_flush 20: time elapsed 2.158068
  Nov 06 04:00:13 motherlode.ucar.edu pqact[2898]: pbuf_flush 20: time elapsed 2.359129
  Nov 06 04:00:28 motherlode.ucar.edu pqact[2898]: pbuf_flush 20: time elapsed 7.052843
  Nov 06 05:45:12 motherlode.ucar.edu pqact[4596]: pbuf_flush 8: time elapsed 2.150907
  Nov 06 07:35:43 motherlode.ucar.edu pqact[5765]: pbuf_flush 4: time elapsed 2.081593
  Nov 06 10:35:22 motherlode.ucar.edu pqact[5765]: pbuf_flush 4: time elapsed 2.801245
  Nov 06 13:35:31 motherlode.ucar.edu pqact[5765]: pbuf_flush 4: time elapsed 3.379242

> Any insight or comments into these areas, and perhaps how some of the
> top tier sites are handling all of the data, is appreciated.
>
> Thanks,
> Gregg Grosshans
> Storm Prediction Center

We're also wondering about the effect of the extra NEXRAD products.  The
only thing I can suggest is to develop a simulation of the extra load by
using something like "pqsend" to send a bunch of extra products to a test
LDM to see if that causes load problems.  Currently, we're just crossing our
fingers.  If motherlode has problems handling the increased load, we may
have to change its configuration to not feed so many sites, or set up a
separate LDM on a machine that doesn't do all the decoding and filing ...

I'm CC:ing Anne Wilson, in case she has any more insights.

--Russ

_____________________________________________________________________
Russ Rew                                         UCAR Unidata Program
address@hidden                     http://www.unidata.ucar.edu
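For reference, a minimal Python sketch that summarizes pbuf_flush delays
from an LDM log in the format quoted above; the log path is a hypothetical
placeholder and should be adjusted for the local installation.

    # Summarize pbuf_flush delays from an LDM log, using the message format
    # shown in the excerpt above ("pqact[PID]: pbuf_flush FD: time elapsed S").
    import re
    from collections import defaultdict

    LOG_PATH = "ldmd.log"  # placeholder; point at the real log file
    PATTERN = re.compile(r"pqact\[(\d+)\]: pbuf_flush (\d+): time elapsed ([\d.]+)")

    delays = defaultdict(list)  # (pid, fd) -> elapsed seconds of slow flushes
    with open(LOG_PATH) as log:
        for line in log:
            m = PATTERN.search(line)
            if m:
                pid, fd, elapsed = m.group(1), m.group(2), float(m.group(3))
                delays[(pid, fd)].append(elapsed)

    for (pid, fd), times in sorted(delays.items()):
        print(f"pqact[{pid}] fd {fd}: {len(times)} slow flushes, "
              f"max {max(times):.2f}s, mean {sum(times) / len(times):.2f}s")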