Re: troubles stopping ldm with ldmadmin on linux (fwd)
- Subject: Re: troubles stopping ldm with ldmadmin on linux (fwd)
- Date: Thu, 18 May 2000 10:05:19 -0600 (MDT)
===============================================================================
Robb Kambic Unidata Program Center
Software Engineer III Univ. Corp for Atmospheric Research
address@hidden WWW: http://www.unidata.ucar.edu/
===============================================================================
---------- Forwarded message ----------
Date: Thu, 18 May 2000 09:28:02 -0400
From: James D. Marco <address@hidden>
To: Jim Koermer <address@hidden>
Subject: Re: troubles stopping ldm with ldmadmin on linux
Hi All,
Yes, I agree, and it is not restricted to LDM. This is correct
behavior from a computer science standpoint. Several data-logger
daemons I wrote do the same thing on HP, Sun, SGI, and Linux. These
processes keep files open for a long time while monitoring and
collecting 'bursty' data, but they are not continuously active; the
data arrives at intervals that are a 'long time' in terms of CPU
utilization (more than one second). I always assumed this was caused
by a combination of:
  - Large files (as you mention)
  - The process-owned buffered file I/O (streams)
  - The operating system's disk buffering
  - OS swapping: a process is swapped to disk unless it is locked in memory
  - Hardware caching: CPU cache, disk cache, controller cache
The entire network of dependencies for all this is quite large.
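To illustrate just one layer of that list, the process-owned stream buffer, here is a minimal Python sketch. It is not taken from LDM or from the daemons mentioned above; the function name and the 1 MiB buffer size are assumptions chosen for illustration. It shows that bytes written to a buffered stream are not visible in the file until the process flushes them, which is part of the work a long-running writer still has to do when it is asked to stop.

```python
import os
import tempfile


def sizes_before_and_after_flush(payload: bytes, buffer_size: int = 1 << 20):
    """Return the file size seen by the file system before and after a flush.

    Hypothetical helper for illustration only. `buffer_size` is the size of
    the process-owned buffer; the payload should be smaller than it.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb", buffering=buffer_size) as stream:
            stream.write(payload)            # held in the process's own buffer
            before = os.path.getsize(path)   # what the file system sees so far
            stream.flush()                   # hand the bytes to the OS cache
            after = os.path.getsize(path)
        return before, after
    finally:
        os.remove(path)


if __name__ == "__main__":
    before, after = sizes_before_and_after_flush(b"x" * 1000)
    print("bytes in file before flush:", before)
    print("bytes in file after flush: ", after)
```

Note that the flush only moves the data into the operating system's disk buffers; getting it onto the physical disk is a further step (os.fsync in Python), which is the next layer in the list.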
One other item occurs to me: the 'garbage collection' mechanisms in
modern OSes. Large processes and their buffers leave large holes in
real memory when killed. Once memory fragmentation passes some
threshold, the OS will probably initiate a cleanup, which can be
expensive in terms of CPU time.
Usually, all this happens within 10-15 seconds, but 60-120 seconds is
probably not unusual for a well-tuned system.
If delays are much longer, I would look elsewhere. My first guess is
that the hard drive is badly fragmented, overloaded, or in need of
reformatting (low-level and file system). File system organization can
also be a problem: locate swap space and large LDM queues/decoder
outputs on separate drives, not just separate partitions on the same
physical drive. Increase the amount of RAM. More....
jdm
At 09:47 PM 5/17/00 -0400, you wrote:
>Doug,
>
>Unless this is something unique to linux, I'm not sure that this
>behavior is all that unusual. I've noticed it for quite some time on
>FreeBSD and AIX systems running LDM. Usually this occurs during the
>ingestion of a large McIDAS area file that may take some time to
>download completely. I assume that this could also happen with some of
>the larger grib files. You can check this by looking at the file sizes
>after doing the "ldmadmin stop". After the file in question completely
>downloads, the associated rpc.ldmd process will end. I've noticed that
>if a large (~25MB) file just started downloading after the stop, it can
>take several minutes for it to complete.
>
>--
>James P. Koermer E-Mail: address@hidden
>Professor of Meteorology Office Phone: (603)535-2574
>Natural Science Department Office Fax: (603)535-2723
>Plymouth State College WWW: http://vortex.plymouth.edu/
>Plymouth, NH 03264
>
>
>Doug Hunt wrote:
>>
>> Hi all: I have recently been having troubles stopping LDM via 'ldmadmin
>> stop' on linux. The ldmadmin script seems to not check correctly if all
>> LDM kids are killed off. The result is that after an 'ldmadmin stop',
>> one must wait for a minute or so for all rpc.ldmd children to die. If
>> one tries 'ldmadmin start' during this time, it hangs...
>>
>> I have made a small patch to 'ldmadmin' which seems to clean up this
>> problem. Instead of just killing off the rpc.ldmd process group leader,
>> it kills off all the kids too.
>>
>> Attached is the new ldmadmin script.
>>
>> Regards,
>>
>> Doug Hunt
>>
>> --
>> address@hidden
>> Software Engineer III
>> UCAR - COSMIC
>> Tel. (303) 497-2611
>>
>
James D. Marco, address@hidden, address@hidden
Programmer/Analyst, System/Network Administration,
Computer Support, Et Al.
Office: 1020 Bradfield Hall, Cornell University
Home: 302 Mary Lane, Varna (607)273-9132
Computer Lab: 1125 Bradfield (607)255-5589
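
The original report in this thread describes two things: signalling every member of the rpc.ldmd process group instead of only the group leader, and waiting until all of them are really gone before starting again. The attached ldmadmin patch is not reproduced in this archive, so what follows is only a rough Python sketch of that idea on a POSIX system, not the patch itself. The names group_exists, stop_group, and spawn_group are hypothetical, and the sleeping child processes merely stand in for rpc.ldmd.

```python
import os
import signal
import subprocess
import sys
import time


def group_exists(pgid: int) -> bool:
    """Return True if any process (including an unreaped zombie) is in the group."""
    try:
        os.killpg(pgid, 0)          # signal 0 checks existence, sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True                 # the group exists but is not ours to signal
    return True


def stop_group(pgid: int, children=(), timeout: float = 10.0, poll: float = 0.1) -> bool:
    """Send SIGTERM to the whole group, then wait until the group is empty.

    `children` are subprocess.Popen objects started by this process; they are
    polled so that exited children are reaped and stop counting as alive.
    Returns True if the group disappeared within `timeout` seconds.
    """
    try:
        os.killpg(pgid, signal.SIGTERM)
    except ProcessLookupError:
        return True                 # nothing left to stop
    deadline = time.monotonic() + timeout
    while True:
        for child in children:
            child.poll()            # reap any child that has already exited
        if not group_exists(pgid):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


def spawn_group(members: int = 3):
    """Start `members` sleeping processes that share one new process group."""
    sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
    leader = subprocess.Popen(sleeper, preexec_fn=os.setpgrp)
    procs = [leader]
    for _ in range(members - 1):
        procs.append(
            subprocess.Popen(sleeper, preexec_fn=lambda: os.setpgid(0, leader.pid))
        )
    return leader.pid, procs


if __name__ == "__main__":
    pgid, procs = spawn_group()
    stopped = stop_group(pgid, procs)
    print("group stopped:", stopped)
```

The design point is the return value: a start command that refuses to proceed until stop_group reports True would avoid the hang described above. As noted earlier in the thread, a member that is still finishing a large transfer may legitimately need longer than a short timeout, so the timeout here is a tuning choice and not a recommendation.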