20040614: Quick update on weather3.admin.niu.edu... (cont.)
- Date: Tue, 15 Jun 2004 09:15:34 -0600
>From: Gilbert Sebenste <address@hidden>
>Organization: NIU
>Keywords: 200406150354.i5F3sOtK011267 LDM Fedora Core Linux
Hi Gilbert,
>Weather3 is back up and running, and feeding weather2 at this time.
>
>At this point, I think I have a good idea of what is happening, as well as
>NOT knowing what is happening.
I just logged onto weather3 and noticed that you are running the
Fedora Core 1 2.4.22-1.2190.nptlsmp kernel:
uname -a
Linux weather3.admin.niu.edu 2.4.22-1.2190.nptlsmp #1 SMP Wed May 26 13:46:20
EDT 2004 i686 i686 i386 GNU/Linux
and that the modification dates on the kernel files in /boot are May 26
(which is in agreement with the uname listing):
ls -alt /boot/vmlinux*.2190*
lrwxrwxrwx 1 root root 44 Jun 10 16:20 /boot/vmlinux-2.4.22-1.2190.nptlsmp -> ../lib/modules/2.4.22-1.2190.nptlsmp/vmlinux*
lrwxrwxrwx 1 root root 41 Jun 10 16:19 /boot/vmlinux-2.4.22-1.2190.nptl -> ../lib/modules/2.4.22-1.2190.nptl/vmlinux*
Since we are intimately involved with multiple machines (at the UPC, in
Costa Rica, and at Texas A&M) running LDM under the Fedora Core 1
2.4.22-1.2188.nptlsmp kernel, and since none of these machines are
experiencing any problems, I have to wonder if your problem is somehow
related to the *.2190.nptlsmp kernel.
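If the *.2190 kernel is the culprit, the cheapest experiment is to boot
back into a *.2188 kernel and see whether the bus errors stop. A sketch
of what the FC1-era GRUB legacy config might look like for that; this
assumes a *.2188 kernel is still installed, and the disk device, root
label, and entry order shown here are assumptions, not values from
weather3:

```
# /boot/grub/grub.conf -- GRUB legacy, as shipped with Fedora Core 1.
# Device (hd0,0), root=LABEL=/, and entry order are assumptions.
# default counts "title" entries from 0, so default=1 boots *.2188.
default=1
timeout=5
title Fedora Core (2.4.22-1.2190.nptlsmp)
        root (hd0,0)
        kernel /vmlinuz-2.4.22-1.2190.nptlsmp ro root=LABEL=/
        initrd /initrd-2.4.22-1.2190.nptlsmp.img
title Fedora Core (2.4.22-1.2188.nptlsmp)
        root (hd0,0)
        kernel /vmlinuz-2.4.22-1.2188.nptlsmp ro root=LABEL=/
        initrd /initrd-2.4.22-1.2188.nptlsmp.img
```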
For reference, I personally have set up four dual-processor machines (three
Athlon MP based, one Xeon based) with FC1 *.2188.nptlsmp and LDM queues
of 1 GB or larger (one has a 1 GB queue, one has a 2 GB queue, and two
have 4 GB queues) and have experienced no problems. Two of these
machines are ingesting and processing everything available in the IDD,
including _all_ NEXRAD Level II data and _all_ CONDUIT data. If load
stress could cause bus errors, I should have seen them on these
machines, but I haven't. If large LDM queues could cause bus errors,
I should have seen problems on all of these systems.
When did you upgrade to the *.2190.nptlsmp kernel?
Tom
>Whenever I set the LDM queue to 400 MB (the default), it doesn't like it.
>Set it under 300 MB...and it is happy.
>
>This is happening on weather2 and weather3, even though they are
>identical but separate machines. With 1.5 GB of RAM and 250 GB disk
>space...hmmm. Weird. Yet, this is not happening on weather, with 80 GB
>disk space and 2 GB RAM. Weather2 and Weather3 have IDE drives; Weather
>has SCSI with a RAID.
>
>You tell me what's wrong. I dunno. In any case, with the lower queue,
>weather3 seems to be stable. Let me give it one more day to make sure.
>Otherwise, weather2 is humming along fine. Keep feeding from that.
>
>*******************************************************************************
>Gilbert Sebenste ********
>(My opinions only!) ******
>Staff Meteorologist, Northern Illinois University ****
>E-mail: address@hidden ***
>web: http://weather.admin.niu.edu **
>Work phone: 815-753-5492 *
>*******************************************************************************
>
>From: "David B. Bukowski" <address@hidden>
>Date: Mon, 14 Jun 2004 23:45:01 -0500 (CDT)
>To: Gilbert Sebenste <address@hidden>
>cc: address@hidden
>Subject: Re: Quick update on weather3.admin.niu.edu...
>
>Well, first off, treat your RAID as a single drive, since I think that's
>what you told me last time we talked. So in other words, it's just another
>SCSI drive. The IDE drives could be where the bottleneck is, since they
>are more than likely slower than your SCSI. Also, your IDE is probably
>running from the mainboard instead of a separate IDE controller card.
>Since I'm not an expert on LDM, I'm just making a wild guess that you're
>getting data to your drives faster than they can handle, the pipe to them
>can't keep up anymore, and then they start timing out. Just a wild random
>guess. Back to doing slideshow production now before bed :)
>-dave
>
>-------------------------------------------------------------------------------
>David B. Bukowski |email (work): address@hidden
>Network Analyst III |email (personal): address@hidden
>College of Dupage |webpage: http://www.cshschess.org/davebb/
>Glen Ellyn, Illinois |pager: (708) 241-7655
>http://www.cod.edu/ |work phone: (630) 942-2591
>-------------------------------------------------------------------------------
>
>From: Gerry Creager N5JXS <address@hidden>
>Date: Tue, 15 Jun 2004 05:38:49 -0500
>Organization: Texas A&M University -- AATLT
>
>First thought... and before coffee, too... is that you're writing the
>queue to the same disk as your data. I've config'd all my machines to
>have a system partition (60 GB on up, depending on prices) and a data
>partition for LDM and GEMPAK data. I write the queue to system space
>and the data and products to the data partition.
>
>Gerry
>
>--
>Gerry Creager -- address@hidden
>Texas Mesonet -- AATLT, Texas A&M University
>Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578
>Page: 979.228.0173
>Office: 903A Eller Bldg, TAMU, College Station, TX 77843
>
--
+-----------------------------------------------------------------------------+
* Tom Yoksas UCAR Unidata Program *
* (303) 497-8642 (last resort) P.O. Box 3000 *
* address@hidden Boulder, CO 80307 *
* Unidata WWW Service http://www.unidata.ucar.edu/*
+-----------------------------------------------------------------------------+