20040526: bigbird status (cont.)
- Subject: 20040526: bigbird status (cont.)
- Date: Wed, 26 May 2004 18:02:45 -0600
>From: Gerry Creager N5JXS <address@hidden>
>Organization: Texas A&M University -- AATLT
>Keywords: 200405262241.i4QMfhtK005798 LDM RAID JFS
Hi Gerry,
>IM==Instant Messaging... Sometimes convenient for on-line communications
>while remote troubleshooting...
I should have known...
>I've seen the RAID failure on reboot several times.
Interesting... Did you run fsck (or a variant) to get things patched
up before remounting the RAID filesystem?
>I really want to
>get rid of this card and get into a 3Ware card. New 'Net find today
>suggests that, as suspected, Promise's proprietary RAID is less than
>advertised. They said nicer things 'bout HighPoint, but the only "real
>RAID" comments were reserved for Adaptec and 3Ware... noting Adaptec
>followed 3Ware's lead.
I believe that Pete Pokrandt of U Wisc/AOS is using a 3Ware card in
his Linux PC.
>effectively, when rebooting, the system times out while flushing the
>product queue now.
Product queue? If you mean LDM product queue, that is on a different
file system. Also, I did not see a startup script for the LDM
in /etc/init.d, so I added one:
/etc/init.d/ldmd
Since this wasn't there on reboot, the LDM queue would not have
been checked by it.
>That apparently is tied to the RAID corruption in
>some manner. If we really saw a RAID corruption while running today,
>that's a first for me on this system. Further, there are spares. It
>should have alarmed and fixed itself.
I agree, but the load average did go up to 400...
>I'll keep looking. By doing the reboot, we did salvage the messages
>logs, and there might be some clues.
OK.
>Thanks for spotting the problem. I was working on bigfoot and didn't
>even look at bigbird today, save to place it on a KVM. Hmmm. It's
>possible that caused a hiccup, but it shouldn't have. It's been in idle
>state WRT the monitor, keyboard, mouse for weeks. The reboot for the
>box is serendipitous. I wasn't planning to reboot 'til needed anyway,
>so I'd not have had console access (at least for X) 'til I did.
>Keyboard and video worked as expected...
OK.
>Later, gerry
fsck.jfs is still running on /dev/md0, and it will
take time to finish. I will try to look in on bigbird later tonight
or early tomorrow morning. As soon as fsck.jfs finishes, I will try
to mount /data and crank up the LDM.
Time to head home...
Tom
From: Tom Yoksas <address@hidden>
To: Gerry Creager N5JXS <address@hidden>
The RAID was not mounted on bootup. Is there supposed to be an entry
in /etc/fstab to mount it? I couldn't find the entry I thought would
be there.
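
For reference, a minimal sketch of checking whether fstab-style text
contains an entry for a given mount point. The function name and the
sample lines are hypothetical; bigbird's actual /etc/fstab is not shown
in this thread, and the jfs/defaults fields are only an assumption of
what such an entry might look like.

```python
def find_fstab_entry(fstab_text, mount_point):
    """Return the whitespace-split fields of the first entry whose
    second field (the mount point) matches, or None if there is none."""
    for raw in fstab_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mount_point:
            return fields
    return None


# Hypothetical sample, NOT the real file from bigbird.
SAMPLE_FSTAB = """\
# device   mount  type  options   dump pass
/dev/hda1  /      ext3  defaults  1 1
/dev/md0   /data  jfs   defaults  1 2
"""
```

On a live system one could pass in the contents of /etc/fstab and
look for "/data"; a result of None would match what we saw here.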
We tried mounting /dev/md0 as /data, but got a bad super block
message. We are now running /sbin/fsck.jfs to check for problems.
A RAID failure that left the filesystem unavailable would explain the
load average ramping up to 400: every process would be blocked waiting
to write to a resource that no longer existed.
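
The reasoning above rests on the fact that the Linux load average
counts processes in uninterruptible sleep (state "D", usually waiting
on I/O) as well as runnable ones. A rough sketch of tallying process
states from /proc follows; the function names are made up for
illustration, and this was not run on bigbird.

```python
import os
from collections import Counter


def state_from_stat(stat_line):
    """Extract the one-letter state from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split after the LAST ')'.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_states(proc_root="/proc"):
    """Count processes by state letter (R, S, D, Z, ...)."""
    counts = Counter()
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, name, "stat")) as f:
                counts[state_from_stat(f.read())] += 1
        except (OSError, ValueError, IndexError):
            continue  # process exited or the line was unreadable
    return counts
```

Calling count_states() on a Linux box and looking at the "D" count
would show roughly how many processes are stuck waiting on I/O; a
count in the hundreds would be consistent with a load average near 400.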
More later as we discover stuff...
Tom
--
+-----------------------------------------------------------------------------+
* Tom Yoksas UCAR Unidata Program *
* (303) 497-8642 (last resort) P.O. Box 3000 *
* address@hidden Boulder, CO 80307 *
* Unidata WWW Service http://www.unidata.ucar.edu/*
+-----------------------------------------------------------------------------+