20020724: LDM / CPU Issue
- Subject: 20020724: LDM / CPU Issue
- Date: Wed, 24 Jul 2002 14:51:27 -0600
Patrick,
It looks like most of the resources are being taken up by decoders
(as you can see from the high iowait).
When you first start your LDM, you will have a backlog of data to digest
(generally one hour's worth, unless you have modified the defaults).
This usually means the LDM is hitting the disk pretty hard, filing
products and kicking off decoders.
If this CPU usage is temporary, e.g., it goes down after your LDM has
caught up (use "ldmadmin watch" to see when your arriving
products have caught up), then it probably can't be avoided.
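A minimal way to check this (a sketch only, assuming a standard LDM
installation with the ldmadmin and pqmon utilities on your PATH):

    # Watch products as they arrive in the queue; when the product times
    # approach the current clock time, the backlog has been digested.
    ldmadmin watch

    # Print product-queue statistics, including the age of the oldest
    # product in the queue.
    pqmon

Once "ldmadmin watch" shows products arriving in near real time, the
decoder load should taper off.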
If the condition persists, then you may need to look at the two
biggest CPU users, which are the McIDAS programs. Could
they be looking for something that you have to recreate now that your
file system has been fixed? Are you seeing your McIDAS surface data?
Those two decoders show the highest usage in your top output.
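One quick sanity check (just a sketch; the data directory below is a
site-specific assumption, so substitute your own McIDAS output path):

    # If the surface decoders (dmsfc.k, dmsyn.k) are decoding normally,
    # the files they write should have recent modification times.
    ls -lt ~mcidas/workdata | head

Stale modification times would suggest the decoders are stuck looking
for something that was lost when the root filesystem was repaired.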
I'll pass this on to Tom Yoksas to see if he has input on
your XCD decoders.
Steve Chiswell
Unidata User Support
>From: "Patrick O'Reilly" <address@hidden>
>Organization: UCAR/Unidata
>Keywords: 200207242009.g6OK9o901665
>
>Hi there,
>
>I had my ldm machine go down, and brought it back up after fixing the
>root filesystem. This happened while I was away (of course). After
>getting things up, I noticed the ldm using almost 100% of the CPU.
>Below are the results from top with the ldm running:
>
>load averages: 0.88, 1.24, 1.00    15:01:30
>62 processes: 61 sleeping, 1 on cpu
>CPU states: 0.0% idle, 27.3% user, 11.4% kernel, 61.3% iowait, 0.0% swap
>Memory: 256M real, 86M free, 211M swap in use, 487M swap free
>
> PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND
> 732 ldm 1 60 0 5824K 4704K sleep 3:30 28.02% dmsyn.k
> 730 ldm 1 60 0 3616K 2416K sleep 1:20 12.16% dmsfc.k
> 756 ldm 1 58 0 25M 5160K sleep 0:16 2.07% dcgrib2
> 699 ldm 1 59 0 245M 33M sleep 0:40 1.25% pqact
> 1267 ldm 1 58 0 2256K 1696K cpu 0:00 0.36% top
> 228 root 8 55 0 2992K 2256K sleep 0:00 0.23% nscd
> 728 ldm 1 59 0 2616K 1440K sleep 0:09 0.22% ingetext.k
> 701 ldm 1 59 0 244M 33M sleep 0:05 0.22% rpc.ldmd
> 1254 root 1 58 0 2840K 1856K sleep 0:00 0.21% sshd
> 765 ldm 1 59 0 2480K 1280K sleep 0:00 0.20% ingebin.k
> 698 ldm 1 59 0 244M 21M sleep 0:08 0.17% pqbinstats
> 1258 ldm 1 52 0 1448K 1256K sleep 0:00 0.12% csh
> 1256 ldm 1 52 2 1864K 1416K sleep 0:00 0.07% traceroute
> 1251 ldm 1 52 2 2696K 2328K sleep 0:00 0.06% netcheck
> 745 ldm 1 59 0 21M 2176K sleep 0:22 0.02% dcmetr
>
>And now with the ldm stopped:
>
>load averages: 0.04, 0.36, 0.65    15:08:44
>36 processes: 35 sleeping, 1 on cpu
>CPU states: 99.2% idle, 0.2% user, 0.6% kernel, 0.0% iowait, 0.0% swap
>Memory: 256M real, 112M free, 28M swap in use, 670M swap free
>
> PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND
> 1321 ldm 1 50 0 2232K 1672K cpu 0:00 0.40% top
> 1254 root 1 58 0 2840K 1856K sleep 0:00 0.04% sshd
> 363 root 12 58 0 3176K 2904K sleep 0:00 0.02% mibiisa
> 440 root 1 38 0 6592K 4888K sleep 0:01 0.00% Xvfb
> 177 root 1 55 0 2808K 1472K sleep 0:01 0.00% sshd
> 151 root 1 13 0 1808K 1176K sleep 0:00 0.00% inetd
> 265 root 1 31 0 968K 792K sleep 0:00 0.00% htt
> 185 daemon 4 33 0 2640K 1928K sleep 0:00 0.00% statd
> 186 root 1 33 0 2064K 1336K sleep 0:00 0.00% lockd
> 1208 root 1 39 0 2448K 1584K sleep 0:00 0.00% fbconsole
> 53 root 5 40 0 1344K 808K sleep 0:00 0.00% syseventconfd
> 281 root 1 42 0 320K 320K sleep 0:00 0.00% rc3
> 437 root 1 42 0 320K 312K sleep 0:00 0.00% sh
> 259 root 1 46 0 3168K 1384K sleep 0:00 0.00% sendmail
> 288 root 4 48 0 5176K 2520K sleep 0:00 0.00% dtlogin
>
>Any ideas about why it would be hogging the resources like this?
>
>Thanks ....
>
>Patrick
>
>_______________________________________
>Patrick O'Reilly
>Meteorological Decision Support Scientist
>The STORM Project - University of Northern Iowa
>address@hidden ~ ph: 319-273-3789
>