Re: 20010123: Strange LDM freezes
- Date: Thu, 08 Feb 2001 15:06:20 -0700
Pete Stamus wrote:
>
> Hi Anne...had another freeze this morning (I gave you a call but
> you wisely were out of the office :)
>
> [snip]
> >
> > Try sending both rpc.ldmd and pqing a simple 'kill' command, which will
> > send a SIGTERM, the normal, non-brutal termination signal that will
> > allow processes to die gracefully (if they can, indeed, die). If that
> > doesn't work try 'kill -9'. When you use 'kill -9' on rpc.ldmd you run
> > the risk of corrupting your queue, as the rpc.ldmd may not be able to
> > finish writing a product and die gracefully when it receives that
> > signal.
> >
>
> I tried a plain "kill" on both processes, and that didn't do anything.
> The "kill -9" was the only way to get rid of them. I did let the
> 'ldmadmin check' go until it returned...it didn't say there were any
> problems. It did say that the LDM had not been restarted for
> 9970 hours (415 days or so), which isn't right. How does it come
> up with that number?? Nothing jumped out of the other numbers:
> 94% idle, load 0.07, 0.06, 0.06. Did an 'ldmadmin queuecheck',
> which returned without comment.
>
> I'm trying to figure out this fifo/named pipe stuff, and if I can
> figure it out will try having pqing read directly from the fifo(s)
> instead of the sockets.
>
> ps
> -------------------------------------------------------------------------
> Pete Stamus | Phone: (303) 415-9701 x224
> Colorado Research Associates (CoRA)* | Fax: (303) 415-9702
> 3380 Mitchell Lane | email: address@hidden
> Boulder, Colorado 80301 USA | *( CoRA is a division of NWRA )
> -------------------------------------------------------------------------
> You can't trust your eyes when your imagination is out of focus.
> -- Mark Twain
> -------------------------------------------------------------------------
Hi Pete,
After talking to a few people here I'm afraid I can't help you that
much. I had thought that there was someone here who understood the SSEC
ingest system well, but that's not the case. But, someone did say that
if you bought the system from SSEC you should be able to get support
from them. Have you tried that?
Also, just for your information, here's what we're running on our ingest
machine that uses the same system:
desi.unidata.ucar.edu.ldm> ps -ef
     UID   PID  PPID  C    STIME TTY        TIME CMD
    root     0     0  0   Jan 01 ?          0:07 sched
    root     1     0  0   Jan 01 ?         17:16 /etc/init -
    root     2     0  0   Jan 01 ?          0:00 pageout
    root     3     0  3   Jan 01 ?       1610:48 fsflush
    root 23549 23535  0 14:56:19 pts/1      0:00 ps -eaf -ef
    root   140     1  0   Jan 01 ?          0:00 /usr/sbin/keyserv
    root   327     1  0   Jan 01 ?          0:02 /usr/lib/saf/sac -t 300
    root    70     1  0   Jan 01 ?          0:00 /usr/lib/devfsadm/devfseventd
    root   138     1  0   Jan 01 ?          0:00 /usr/sbin/rpcbind
    root    72     1  0   Jan 01 ?          0:00 /usr/lib/devfsadm/devfsadmd
    root  6648  6643  0   Jan 24 pts/0      0:00 csh
    root   222     1  0   Jan 01 ?          2:26 /usr/lib/inet/xntpd
    root   176     1  0   Jan 01 ?          0:00 /usr/lib/nfs/lockd
    root   185     1  0   Jan 01 ?          0:02 /usr/lib/autofs/automountd
    root   172     1  0   Jan 01 ?          0:02 /usr/sbin/inetd -s
    root  6643  6641  0   Jan 24 pts/0      0:00 -sh
  daemon   181     1  0   Jan 01 ?          0:00 /usr/lib/nfs/statd
    root   194     1  0   Jan 01 ?          1:50 /usr/sbin/syslogd
    root   242     1  0   Jan 01 ?          0:00 /usr/sbin/vold
    root   203     1  0   Jan 01 ?          0:04 /usr/sbin/cron
    root   217     1  0   Jan 01 ?          0:15 /usr/sbin/nscd
    root   260     1  9   Jan 01 ?       5975:19 /opt/nport/bin/inge
    root   262     1  0   Jan 01 ?          0:02 /usr/lib/sendmail -q15m
    root   244     1  0   Jan 01 ?          0:10 /usr/lib/utmpd
     ldm 20513 26095  0   Feb 06 ?          7:08 rpc.ldmd -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
    root   328     1  0   Jan 01 console    0:00 /usr/lib/saf/ttymon -g -h -p desi.unidata.ucar.edu console login: -T AT386 -d
     ldm 27151 26095  1   Feb 03 ?         29:31 rpc.ldmd -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
    root   330   327  0   Jan 01 ?          0:02 /usr/lib/saf/ttymon
    root   282   260  0                     0:00 <defunct>
    root   286   260  0   Jan 01 ?          0:00 /opt/nport/bin/inge
     ldm 26096 26095  1   Jan 04 ?        220:21 pqbinstats
    root  6641   172  0   Jan 24 ?          0:00 in.rlogind
    root 23533   172  0 14:56:13 ?          0:00 in.rlogind
     ldm 26098 26095  1   Jan 04 ?        458:27 pqing -f HRS /tmp/jmb.fifo.2
     ldm 26095     1  0   Jan 04 ?          0:01 rpc.ldmd -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
     ldm 26097 26095  0   Jan 04 ?         72:40 pqing -f IDS|DDPLUS /tmp/jmb.fifo.1
     ldm 23535 23533  1 14:56:13 pts/1      0:00 -csh
    root 23536 23532  0 14:56:14 ?          0:00 /opt/nport/bin/inge
    root 23531     1  0 14:56:03 ?          0:00 /opt/nport/bin/inge
    root 23532 23531  0 14:56:08 ?          0:00 /opt/nport/bin/inge
Also, in /etc/init.d there is a script called ingcntl that may be used
in configuring inge.
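
Note that in the listing above both pqing processes read from named
pipes (/tmp/jmb.fifo.1 and /tmp/jmb.fifo.2) rather than from sockets.
In case it helps with the fifo question, here is a rough Python sketch
of how a named pipe behaves in general. It is not taken from the LDM or
the ingest software; the function name fifo_round_trip and the path
example.fifo are made up for illustration, and it assumes a Unix-like
system where os.mkfifo is available.

```python
import os
import tempfile
import threading


def fifo_round_trip(payload: bytes) -> bytes:
    """Write payload into a named pipe from one thread and read it back."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # Hypothetical path; the listing uses /tmp/jmb.fifo.1 and /tmp/jmb.fifo.2.
        fifo_path = os.path.join(tmpdir, "example.fifo")
        os.mkfifo(fifo_path)

        def writer() -> None:
            # Opening for write blocks until a reader has the FIFO open too.
            with open(fifo_path, "wb") as out:
                out.write(payload)

        thread = threading.Thread(target=writer)
        thread.start()
        # The reader sees end-of-file once the writer closes its end.
        with open(fifo_path, "rb") as src:
            data = src.read()
        thread.join()
        return data


if __name__ == "__main__":
    print(fifo_round_trip(b"TEST BULLETIN\r\r\n"))
```

The point to take away is that a FIFO looks like an ordinary file path
to the reader, which is presumably why pqing can simply be handed the
path as its last argument, as in the listing.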
I still suspect that the problem is that pqing is getting a binary
character when it's expecting text.
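
One way to test that suspicion would be to capture some of the raw feed
to a file and look for bytes that don't belong in a text product. The
following is only a sketch in Python: the name find_non_text_bytes is
made up, and the set of control characters treated as acceptable (SOH,
ETX, CR, LF) is my assumption about how the text bulletins are framed,
not something taken from the pqing source.

```python
# Assumption: control bytes considered legitimate in a text product.
# CR and LF are safe guesses; SOH (0x01) and ETX (0x03) are included
# because WMO-style bulletins are commonly framed with them.
ALLOWED_CONTROL = {0x01, 0x03, 0x0A, 0x0D}


def find_non_text_bytes(data: bytes, limit: int = 20):
    """Return up to `limit` (offset, byte value) pairs for suspicious bytes."""
    hits = []
    for offset, value in enumerate(data):
        printable = 0x20 <= value <= 0x7E
        if not printable and value not in ALLOWED_CONTROL:
            hits.append((offset, value))
            if len(hits) >= limit:
                break
    return hits


if __name__ == "__main__":
    sample = b"\x01\r\r\nSAUS\xff70\x00\r\r\n\x03"
    for offset, value in find_non_text_bytes(sample):
        print(f"offset {offset}: 0x{value:02x}")
```

If a capture taken around the time of a freeze shows unexpected bytes
and a capture from a quiet period doesn't, that would at least support
the theory.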
Since you're at a .com, I'm assuming you're not a registered
participant, and thus not officially entitled to support. Please let me
know if this is not the case. Usually I try to support people anyway,
but if it gets too time consuming, I have to stop.
The good news is that our system has not experienced the trouble that
you've had, so there must be a way. I hope my small efforts have been
helpful. I will still help with "small" questions, if I can.
Good luck on this one.
Anne
--
***************************************************
Anne Wilson UCAR Unidata Program
address@hidden P.O. Box 3000
Boulder, CO 80307
----------------------------------------------------
Unidata WWW server http://www.unidata.ucar.edu/
****************************************************