- Subject: [Support #VOV-751174]: Re: [ldm-users] Problems getting data from idd.aos.wisc.edu
- Date: Fri, 15 Feb 2013 14:57:11 -0700
Pete,
> Looks like updating to CentOS 6.3 and recompiling had no effect. I just
> restarted using 6.3 and again, the CPU usage on individual ldmd
> processes is very high (50-90%) and data is moving at a crawl.
>
> I do notice that in 6.11.3 the CPU utilization is almost entirely
> consumed by system, rather than user context.
That is very odd. We don't see that here at all: our LDM 6.11.3 server handles
about 88 downstream connections with a load average around 1 to 2.
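For what it's worth, a quick way to spot-check the same numbers on your end (assuming your LDM is serving on the default port 388) would be:

    # one-, five-, and fifteen-minute load averages
    uptime
    # rough count of established LDM (port 388) TCP connections
    netstat -tn | grep ':388 ' | grep -c ESTABLISHED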
> Here's what a 'top' looks like on 6.11.2 vs 6.11.3
>
> 2/15/2013
> 6.11.3
>
> top - 14:51:08 up 20 min, 2 users, load average: 21.46, 17.09, 11.79
> Tasks: 575 total, 17 running, 558 sleeping, 0 stopped, 0 zombie
> Cpu0 : 1.6%us, 95.1%sy, 0.0%ni, 3.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu1 : 1.7%us, 88.3%sy, 0.0%ni, 7.0%id, 0.0%wa, 0.0%hi, 3.0%si, 0.0%st
> Cpu2 : 1.7%us, 97.7%sy, 0.0%ni, 0.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu3 : 2.0%us, 96.7%sy, 0.0%ni, 1.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu4 : 1.7%us, 95.4%sy, 0.0%ni, 2.6%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
> Cpu5 : 1.7%us, 97.7%sy, 0.0%ni, 0.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu6 : 1.7%us, 95.3%sy, 0.0%ni, 3.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu7 : 1.7%us, 93.4%sy, 0.0%ni, 5.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu8 : 1.7%us, 97.0%sy, 0.0%ni, 1.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu9 : 1.7%us, 98.0%sy, 0.0%ni, 0.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu10 : 1.7%us, 97.7%sy, 0.0%ni, 0.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu11 : 1.0%us, 98.0%sy, 0.0%ni, 1.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu12 : 1.3%us, 94.1%sy, 0.0%ni, 4.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu13 : 1.0%us, 92.7%sy, 0.0%ni, 6.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu14 : 1.0%us, 98.4%sy, 0.0%ni, 0.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu15 : 1.3%us, 98.3%sy, 0.0%ni, 0.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem: 32874844k total, 19404024k used, 13470820k free, 39280k buffers
> Swap: 32767984k total, 0k used, 32767984k free, 17950924k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 28750 ldm 20 0 23.0g 872m 871m R 81.6 2.7 0:59.36 ldmd
> 28752 ldm 20 0 23.0g 927m 926m R 80.3 2.9 1:04.70 ldmd
> 28810 ldm 20 0 23.0g 154m 152m R 79.6 0.5 0:05.12 ldmd
> 28749 ldm 20 0 23.0g 847m 846m R 78.7 2.6 1:03.10 ldmd
> 28782 ldm 20 0 23.0g 443m 441m R 73.7 1.4 0:26.03 ldmd
> 28746 ldm 20 0 23.0g 865m 864m R 72.0 2.7 1:00.12 ldmd
> 28754 ldm 20 0 23.0g 867m 866m R 72.0 2.7 0:59.15 ldmd
> 28808 ldm 20 0 23.0g 187m 185m R 69.7 0.6 0:07.03 ldmd
> 28753 ldm 20 0 23.0g 884m 882m R 69.4 2.8 0:59.83 ldmd
> 28807 ldm 20 0 23.0g 170m 169m R 68.4 0.5 0:07.60 ldmd
> 28813 ldm 20 0 23.0g 86m 84m R 54.5 0.3 0:01.65 ldmd
> 28812 ldm 20 0 23.0g 81m 80m R 52.9 0.3 0:01.60 ldmd
> 28814 ldm 20 0 23.0g 75m 73m R 38.3 0.2 0:01.16 ldmd
> 28282 ldm 20 0 23.0g 2876 1720 S 35.7 0.0 0:56.41 ldmd
> 28281 ldm 20 0 23.0g 3388 2224 S 35.0 0.0 0:59.65 ldmd
> 28294 ldm 20 0 23.0g 14m 12m S 35.0 0.0 1:00.19 ldmd
> 28302 ldm 20 0 23.0g 3592 2404 S 35.0 0.0 0:59.76 ldmd
> 28278 ldm 20 0 23.0g 20m 18m S 34.7 0.1 1:00.39 ldmd
> 28300 ldm 20 0 23.0g 30m 28m S 34.7 0.1 1:01.31 ldmd
> 28301 ldm 20 0 23.0g 3720 2528 S 34.4 0.0 0:59.61 ldmd
> 28280 ldm 20 0 23.0g 32m 31m S 34.0 0.1 1:00.48 ldmd
> 28296 ldm 20 0 23.0g 13m 12m S 34.0 0.0 0:58.95 ldmd
> 28279 ldm 20 0 23.0g 3932 2776 S 33.7 0.0 1:00.65 ldmd
> 28288 ldm 20 0 23.0g 13m 12m S 33.7 0.0 0:59.04 ldmd
> 28290 ldm 20 0 23.0g 14m 12m S 33.7 0.0 0:59.94 ldmd
Very high system loads, indeed.
>
>
> 6.11.2
> top - 15:14:45 up 44 min, 3 users, load average: 1.21, 2.22, 4.88
> Tasks: 580 total, 1 running, 579 sleeping, 0 stopped, 0 zombie
> Cpu0 : 0.7%us, 15.0%sy, 0.0%ni, 84.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu1 : 1.0%us, 15.3%sy, 0.0%ni, 83.1%id, 0.0%wa, 0.0%hi, 0.7%si, 0.0%st
> Cpu2 : 1.0%us, 14.8%sy, 0.0%ni, 83.8%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
> Cpu3 : 1.3%us, 16.0%sy, 0.0%ni, 82.3%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
> Cpu4 : 1.0%us, 15.3%sy, 0.0%ni, 83.1%id, 0.0%wa, 0.0%hi, 0.7%si, 0.0%st
> Cpu5 : 1.0%us, 16.8%sy, 0.0%ni, 81.8%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
> Cpu6 : 1.0%us, 16.6%sy, 0.0%ni, 82.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu7 : 0.3%us, 15.7%sy, 0.0%ni, 83.7%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
> Cpu8 : 0.3%us, 15.6%sy, 0.0%ni, 84.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu9 : 0.7%us, 15.2%sy, 0.0%ni, 84.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu10 : 0.7%us, 15.2%sy, 0.0%ni, 84.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu11 : 0.7%us, 15.3%sy, 0.0%ni, 84.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu12 : 0.7%us, 15.2%sy, 0.0%ni, 84.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu13 : 0.7%us, 15.2%sy, 0.0%ni, 83.8%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
> Cpu14 : 0.7%us, 15.3%sy, 0.0%ni, 84.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu15 : 1.0%us, 14.6%sy, 0.0%ni, 84.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem: 32874844k total, 24002200k used, 8872644k free, 34976k buffers
> Swap: 32767984k total, 0k used, 32767984k free, 20543600k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 38385 ldm 20 0 23.0g 552m 551m S 2.3 1.7 0:52.59 ldmd
> 38405 ldm 20 0 23.0g 642m 640m S 2.3 2.0 0:52.01 ldmd
> 38424 ldm 20 0 23.0g 558m 556m S 2.3 1.7 0:54.13 ldmd
> 38433 ldm 20 0 23.0g 549m 547m S 2.3 1.7 0:53.95 ldmd
> 38497 ldm 20 0 23.0g 526m 525m S 2.3 1.6 0:48.50 ldmd
> 38507 ldm 20 0 23.0g 1.7g 1.7g S 2.3 5.5 2:31.76 ldmd
> 38510 ldm 20 0 23.0g 2.1g 2.1g S 2.3 6.7 2:54.38 ldmd
> 38872 ldm 20 0 23.0g 168m 167m S 2.3 0.5 0:11.16 ldmd
> 39034 ldm 20 0 23.0g 176m 174m S 2.3 0.5 0:03.17 ldmd
> 38386 ldm 20 0 23.0g 1.2g 1.2g S 2.0 3.8 1:44.74 ldmd
> 38387 ldm 20 0 23.0g 572m 570m S 2.0 1.8 0:53.63 ldmd
> 38388 ldm 20 0 23.0g 1.2g 1.2g S 2.0 3.8 1:45.35 ldmd
> 38391 ldm 20 0 23.0g 644m 642m S 2.0 2.0 0:52.13 ldmd
> 38394 ldm 20 0 23.0g 579m 578m S 2.0 1.8 0:52.05 ldmd
> 38400 ldm 20 0 23.0g 1.1g 1.1g S 2.0 3.6 1:28.46 ldmd
> 38401 ldm 20 0 23.0g 2.1g 2.1g S 2.0 6.7 2:57.24 ldmd
> 38403 ldm 20 0 23.0g 1.2g 1.2g S 2.0 3.8 0:52.49 ldmd
> 38404 ldm 20 0 23.0g 526m 524m S 2.0 1.6 0:52.45 ldmd
> 38408 ldm 20 0 23.0g 539m 537m S 2.0 1.7 0:51.08 ldmd
> 38414 ldm 20 0 23.0g 632m 630m S 2.0 2.0 0:48.43 ldmd
> 38415 ldm 20 0 23.0g 558m 556m S 2.0 1.7 0:53.70 ldmd
> 38419 ldm 20 0 23.0g 526m 525m S 2.0 1.6 0:51.41 ldmd
> 38422 ldm 20 0 23.0g 563m 561m S 2.0 1.8 0:52.63 ldmd
> 38423 ldm 20 0 23.0g 558m 556m S 2.0 1.7 0:53.92 ldmd
> 38426 ldm 20 0 23.0g 563m 561m S 2.0 1.8 0:52.66 ldmd
I can't imagine what could be causing LDM 6.11.3 to have much higher system
loads than LDM 6.11.2 on your system.
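One suggestion that might narrow it down (not something I've verified on your box): since the time is going to system rather than user context, attaching strace to one of the busy ldmd processes should show which system call is soaking up the time, e.g.

    # summarize system-call counts and times for one busy ldmd
    # (substitute a real PID from your top output, e.g. 28750)
    strace -c -p 28750
    # let it run for ~30 seconds, then hit Ctrl-C to print the summary table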
> My queue size is 24 GB, my system RAM is 32 GB, could that have anything
> to do with it?
That's cutting it close. Your product-queue should fit in memory: we like to have
about twice as much memory as the size of the product-queue, though we have run with less.
Our backend server, for example, has about 74 GB of memory and a 30 GB queue.
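By that rule of thumb a 32 GB machine pairs with a queue of roughly 16 GB; a 24 GB queue
leaves only about 8 GB for everything else, so the memory-mapped queue and the page cache
end up competing. If you want to experiment with a smaller queue, something along these
lines should do it (a sketch, assuming the stock LDM registry and ldmadmin utilities):

    # show the configured queue size (LDM registry)
    regutil /queue/size
    # after lowering the size in the registry, recreate the queue
    ldmadmin delqueue
    ldmadmin mkqueue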
> Steve, I can get you ldm access to idd.aos.wisc.edu if you want to poke
> around at all.
That would help.
> I also can send ldmd.log when running 6.11.3 vs 6.11.2.
If I can get onto the system, I can look at the log files directly. Would you
mind if I switched between the two LDM versions (or is the system operational)?
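If switching is OK, what I'd do is something like the following (a sketch, assuming the
usual ~ldm layout where a runtime symlink points at the currently installed version; your
setup may differ):

    ldmadmin stop
    cd ~ldm
    rm runtime && ln -s ldm-6.11.2 runtime    # point back at 6.11.2
    ldmadmin start
    # (and ln -s ldm-6.11.3 runtime to switch forward again)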
Regards,
Steve Emmerson
Ticket Details
===================
Ticket ID: VOV-751174
Department: Support LDM
Priority: Normal
Status: Closed