This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Hi Justin,

re:
> (Went ahead and dropped the EMC guys off this since this more on the dataflow
> side).

OK.  I removed the same folks from the CCs here.

re:
> comments are inline...

re: measurably smaller volumes from test CONDUIT datastream than from
"operational" one

> I think the issue was a hung process on our ldm2 system. It looks like about
> 30% of our grib inserts were having some errors (but not enough for the
> process to totally fail). I didn't see anything wrong with the number of
> gribinserts we were attempting, there were the same to all three ldm boxes,
> with ldm2 having a few more due to the parallel NAM. But when I did a 'pqmon'
> on ldm2 the max age was 1/3 of what it is on ldm0. It looks like we had a
> hung gribinsert from a few months ago consuming a lot of CPU. Once I killed
> the process the max age in the queue has been steadily increasing. I'll be
> watching the log to see if that took care of the failed inserts.

OK, this is good news from my perspective.

Questions:

- to be clear: ldm2 is the system on which the fireweather products are
  being inserted?

- what were the errors being reported by gribinsert?

- reference to the parallel NAM is a reference to the fireweather products?

re:
> We are running the 'ldmadmin addmetrics' action every minute via cron.

Excellent!

> We are also running rtstats, doesn't that send some data back to you?

Yes, but rtstats doesn't tell us the age of the oldest product in the
LDM queue, so it does not provide the information needed to judge
whether or not the queue is large enough.

re:
> Doesn't look like we are running gnuplot though, is there an easy way
> dump the stats you would be interested in?

'ldmadmin addmetrics' writes to the file ~ldm/logs/metrics.txt.  That file
has all of the information needed to evaluate the queue size (through
age of the oldest product) and system performance (through load average,
etc.).
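
As a rough sketch of how the current and rotated metrics files could be gathered into one archive for sending, something like the following might work. This is only an illustration: bundle_metrics and the default archive name are hypothetical (they are not part of the LDM), and the default directory assumes the function is run as the 'ldm' user so that $HOME/logs is ~ldm/logs.

```shell
#!/bin/sh
# Sketch only: bundle metrics.txt and its rotated copies into one tarball.
# Assumptions (not from the LDM itself): the helper name bundle_metrics,
# the default output name, and that $HOME/logs is the LDM log directory.

bundle_metrics() {
    logdir=${1:-"$HOME/logs"}
    out=${2:-"metrics-$(date -u +%Y%m%d).tar.gz"}

    # Stop early if there is nothing to send.
    ls "$logdir"/metrics.txt* >/dev/null 2>&1 || {
        echo "no metrics files found in $logdir" >&2
        return 1
    }

    # Archive with relative names so the tarball unpacks cleanly elsewhere;
    # print the archive name on success.
    (cd "$logdir" && tar -czf - metrics.txt*) > "$out" && echo "$out"
}
```

Calling bundle_metrics with no arguments would write the archive into the current directory; a different log directory and output file can be given as the first and second arguments.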
It would be useful if you could make that file (those files, since the file
will get rotated every so often) available to us so we can get a good picture
of how things are running.  You can do the same on some other machine on
which the LDM is installed and on which gnuplot is available.

re: merge of GRIB table entries later today
> Ok, Thanks!

No worries.

Given the hiccup you describe above, I will want to continue ingesting the
test CONDUIT datastream for the next few days.  This will tell us:

- if the "hung" gribinsert process was really the cause of lower volumes

- if the queue is too small or sufficient for what is being attempted

- how well your system is performing

In order to do the last item, it would be useful for us to get copies of
your ~ldm/logs/metrics.txt* files today and again on Monday morning.

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                          Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                                 http://www.unidata.ucar.edu
****************************************************************************

Ticket Details
===================
Ticket ID: SBB-325304
Department: Support CONDUIT
Priority: Normal
Status: Closed