- Subject: [LDM #VWS-137049]: IO thrashing 6.7.1 client against 6.9.7 server
- Date: Mon, 06 Jun 2011 09:30:28 -0600
Hi Daryl,
> I've been running into an issue with a downstream 6.7.1 client causing my
> server to just start thrashing IO and eventually grind the data flow to a
> halt. For example, here is the sysstat output from this morning:
>
> 08:20:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
> 08:30:02 AM     all      4.07      0.00      8.36      7.35      0.00     80.22
> 08:40:01 AM     all      4.33      0.00      9.33      7.08      0.00     79.26
> 08:50:02 AM     all      5.49      0.00     13.13     20.69      0.00     60.69
> 09:00:02 AM     all      6.05      0.00     23.20     52.89      0.00     17.86
> 09:10:01 AM     all      5.40      0.00     22.39     56.16      0.00     16.05
> 09:20:03 AM     all      2.39      0.00     19.45     63.24      0.00     14.92
> 09:30:03 AM     all      1.88      0.00     19.11     64.66      0.00     14.34
>
> At 8:50 Z, the train comes off the tracks. This was when a downstream
> 6.7.1 host connected. The system gets behind, but doesn't log anything
> too interesting other than simple things like:
>
> Jun 5 11:16:27 metfs1 pqact[8693] WARN: Processed oldest product in
> queue: 6390.94 s
The message from pqact(1) indicates that the process is way behind: if it had
sufficient resources, it would be working on recently-arrived products, not one
that's almost two hours old.
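If you want to quantify the backlog, the pqmon(1) utility prints product-queue
statistics (I believe its output includes the age of the oldest product), and
"ldmadmin watch" will show whether products are still being inserted. A minimal
check, with the queue path below only an example, would be:

    # run as the LDM user; the queue path is illustrative
    pqmon -q /home/ldm/var/queues/ldm.pq
    # confirm products are still arriving in the queue
    ldmadmin watch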
> At this time, IO is pegged. My RAID array maxes out around 4,000 TPS. So
> I wake up and try to stop LDM, and this gets logged for all connected hosts:
Is the LDM product-queue in question on a RAID? We've had mixed results doing
that: sometimes it works and sometimes it doesn't. An easy thing to try would
be to move the product-queue to local disk to see if the situation improves.
Can you do that?
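If so, a rough sketch of the switch (the paths are only illustrative; adjust
them to your layout) would be:

    ldmadmin stop
    # move the queue off the RAID and leave a symlink at the configured path
    mv /raid/ldm/ldm.pq /var/local/ldm/ldm.pq
    ln -s /var/local/ldm/ldm.pq /raid/ldm/ldm.pq
    ldmadmin start

You could also change the queue location in the LDM registry (regutil(1))
instead, but the symlink avoids touching the configuration.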
> Jun 5 12:00:23 metfs1 cumulus.dmes.fit.edu(feed)[5493] ERROR: fcntl
> F_RDLCK failed for rgn (0 SEEK_SET, 4096) 4: Interrupted system call
>
> I assume this is from a more forceful shutdown of LDM to get it to stop.
I haven't seen this particular error-message, but your analysis is likely correct.
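If the thrashing recurs, attaching strace(1) to the upstream feeder process for
that host (the PID below is the one from your log entry) would show whether it's
spending its time blocked in fcntl() region locks or in disk reads and writes:

    # trace lock- and I/O-related system calls of the feeder; Ctrl-C to stop
    strace -tt -e trace=fcntl,read,write -p 5493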
> Anyway, I comment out the ALLOW entry for the downstream 6.7.1 host and start
> LDM back up, and there's no more IO thrashing.
>
> Any ideas about this? Is there some known issue with old ldm clients and
> 6.9 servers?
I'm not aware of any such issue and there shouldn't be any such issue. The LDM
protocol and handling of data-products didn't change between 6.7 and 6.9.
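Disabling that downstream host, as you did, is a reasonable interim workaround;
for the record, that amounts to commenting out its ALLOW line in etc/ldmd.conf,
e.g. (the feedtype and hostname pattern below are only a guess based on your log):

    # etc/ldmd.conf -- temporarily disable the 6.7.1 downstream host
    #ALLOW  ANY  ^cumulus\.dmes\.fit\.edu$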
> Perhaps this is why Unidata still runs pre-6.9 LDM on most
> of its systems? :)
I'm not in charge of the IDD, so I couldn't say.
> daryl
>
> --
> /**
> * Daryl Herzmann
> * Assistant Scientist -- Iowa Environmental Mesonet
> * http://mesonet.agron.iastate.edu
> */
Regards,
Steve Emmerson
Ticket Details
===================
Ticket ID: VWS-137049
Department: Support LDM
Priority: Normal
Status: Closed