Hi Daryl,

> I've been running into an issue with a downstream 6.7.1 client causing my
> server to start thrashing IO and eventually grind the data flow to a
> stop. For example, here is sysstat output from this morning:
>
> 08:20:01 AM  CPU  %user  %nice  %system  %iowait  %steal  %idle
> 08:30:02 AM  all   4.07   0.00     8.36     7.35    0.00   80.22
> 08:40:01 AM  all   4.33   0.00     9.33     7.08    0.00   79.26
> 08:50:02 AM  all   5.49   0.00    13.13    20.69    0.00   60.69
> 09:00:02 AM  all   6.05   0.00    23.20    52.89    0.00   17.86
> 09:10:01 AM  all   5.40   0.00    22.39    56.16    0.00   16.05
> 09:20:03 AM  all   2.39   0.00    19.45    63.24    0.00   14.92
> 09:30:03 AM  all   1.88   0.00    19.11    64.66    0.00   14.34
>
> At 8:50 Z, the train comes off the tracks. This was when a downstream
> 6.7.1 host connected. The system gets behind but doesn't log anything
> too interesting other than simple things like:
>
> Jun  5 11:16:27 metfs1 pqact[8693] WARN: Processed oldest product in
> queue: 6390.94 s

That message from pqact(1) indicates that the process is far behind: if it had sufficient resources, it would be working on recently-arrived products, not one that is almost two hours old.

> At this time, IO is pegged. My RAID array maxes out around 4,000 TPS. So
> I wake up and try to stop the LDM, and this is logged for all connected
> hosts.

Is the LDM product-queue in question on a RAID? We've had mixed results with that: sometimes it works and sometimes it doesn't. An easy thing to try would be to move the product-queue to local disk and see whether the situation improves (see the sketch below). Can you do that?

> Jun  5 12:00:23 metfs1 cumulus.dmes.fit.edu(feed)[5493] ERROR: fcntl
> F_RDLCK failed for rgn (0 SEEK_SET, 4096) 4: Interrupted system call
>
> I assume this is from the harsher shutdown needed to get the LDM to stop.

I haven't seen this particular error-message, but your analysis seems likely: error 4 is EINTR, meaning the attempt to read-lock that region of the queue was interrupted by a signal, which is consistent with a forcible shutdown.

> Anyway, I comment out the allow for the downstream 6.7.1 host and start
> the LDM back up, and there's no more IO thrashing.
>
> Any ideas about this? Is there some known issue with old LDM clients and
> 6.9 servers?

I'm not aware of any such issue, and there shouldn't be one: the LDM protocol and the handling of data-products didn't change between 6.7 and 6.9.

> Perhaps this is why Unidata still runs pre-6.9 LDM on most
> of its systems? :)

I'm not in charge of the IDD, so I couldn't say.

> daryl
>
> --
> /**
>  * Daryl Herzmann
>  * Assistant Scientist -- Iowa Environmental Mesonet
>  * http://mesonet.agron.iastate.edu
>  */

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: VWS-137049
Department: Support LDM
Priority: Normal
Status: Closed
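Below is a minimal sketch of one way to move the product-queue to local disk, assuming a registry-based LDM (6.7 or later). The path /var/local/ldm/ldm.pq is a hypothetical example; substitute a directory on a local (non-RAID) disk with enough room for the queue:

    ldmadmin stop                                 # stop all LDM processes
    ldmadmin delqueue                             # delete the queue at its current location
    regutil -s /var/local/ldm/ldm.pq /queue/path  # point the registry at the local disk (example path)
    ldmadmin mkqueue                              # create a new, empty queue at that path
    ldmadmin start                                # restart the LDM

If the %iowait numbers stay low with the queue on local disk, the RAID array was the bottleneck.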