This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
>From: Gerry Creager N5JXS <address@hidden>
>Organization: Texas A&M University -- AATLT
>Keywords: 200404271436.i3REaVCT020136 Linux RAID LDM

Hi Gerry,

>I've been playing with bigbird again. I rebuilt things and tried ext2,
>which has helped some.

Anne and I have been on bigbird several times in the past few days looking things over. I made a change in the scouring configuration file ~ldm/etc/scour.conf that stops scour from going through the craft directory more than once.

We have also been taking a hard look at the CRAFT processing on your RedHat 9.0 machine and on a Fedora Core 1 (2.4.24-2188 kernel) machine I have in my office (destined for the Caribbean Institute of Hydrology and Meteorology in Husbands, Barbados). Both your machine and "mine" experience the same problem when decoding CRAFT data with dcnexr2. For reference, I am running a newer version of dcnexr2 (from the GEMPAK 5.7.1 distribution) than you are (GEMPAK 5.6 distribution).

We are seeing that dcnexr2 invocations can get stuck in a system call while reading from stdin. When this happens, the processes shoot up to the top of CPU use and bring the system to a crawl. Here is gdb output that shows where the dcnexr2s on my machine are when they go south:

% gdb decoders/dcnexr2 core.7053
...
(gdb) where
#0  0x00bb9c32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00ca5aed in ___newselect_nocancel () from /lib/tls/libc.so.6
#2  0x08049e87 in bufread ()
#3  0x08049b71 in main ()

I got the errant dcnexr2s to dump core (on your system and mine) by sending their PIDs an ABRT (-6) signal (kill -6 <pid>).

/lib/tls looks to have been introduced into RedHat Linux in version 9.0. I checked our 8.0 machine, and it does not exist there, but it does exist on your machine (RH 9) and on my machine (Fedora Core 1).

When the several dcnexr2s start using lots of CPU, the other dcnexr2s go into a blocked state that they do not come out of, even after the errant dcnexr2s are killed.
We are trying to understand what is going on, since this is a show stopper on newer RedHat Linux releases, at least as far as CRAFT processing is concerned.

>HOWEVER, I've learned some interesting things recently about the Promise
>and HighPoint controllers. As you may have suspected, they're not
>"real" hardware RAID controllers. The driver that runs on Linux,
>looking like a SCSI driver, allows the dedicated onboard processor on
>either brand card to look at the various RAID options... in software
>(or, really, firmware).

I found comments to the effect that the problem is not with the Promise card but, rather, with the Linux 2.4 kernel. The kernel apparently sees the Promise RAID controller simply as an IDE controller. The "solution" that Promise came up with is the kludge of telling the kernel that all of the IDE ports are in use (through modifications to /boot/grub/grub.conf) and then using a loadable kernel module that treats the disks attached to the controller as a SCSI device. For reference, disks attached to the same Promise controller installed in a FreeBSD system are seen as a single disk -- exactly what you would expect of a hardware RAID controller.

>That's why, with the Promise, you saw much better performance with Linux
>software RAID: It's more efficient and doesn't require double-caching.

Makes sense to me.

>I'm now ordering a 3Ware Escalade 7506-12 to do a single card solution
>for the whole box. Theoretically, it's a real RAID controller rather
>than a glorified IDE controller.

I would research the 3Ware controller's support under Linux 2.4.x kernels to make sure that 3Ware is not doing the same thing as Promise.

>If that doesn't work, my next solution will be to go to a dedicated ATA
>external box, talking to the computer via SCSI.

Or move to FreeBSD :-)

>I've already engaged one of these for the NWP cluster I'm installing now.
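For what it's worth, the grub.conf side of the Promise kludge usually takes a form like the fragment below. This is a sketch only: the kernel version, root device, and which hdX letters the controller's ports land on are all system-dependent, so treat every value here as a placeholder.

```
# /boot/grub/grub.conf (sketch; hdX letters and versions are
# system-dependent placeholders)
title Red Hat Linux (2.4.20-8)
        root (hd0,0)
        # "hdX=noprobe" keeps the stock 2.4 IDE driver off the
        # Promise ports, so the vendor's loadable module can claim
        # them and present the array as a single SCSI disk:
        kernel /vmlinuz-2.4.20-8 ro root=/dev/sda1 hde=noprobe hdf=noprobe hdg=noprobe hdh=noprobe
        initrd /initrd-2.4.20-8.img
```

The "hdX=noprobe" parameter is a standard 2.4 IDE-driver option; the SCSI-impersonation half of the kludge lives entirely in Promise's binary module.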
>Its performance is significantly better than Bigbird; this isn't an
>apples-to-apples comparison, as the HEAT cluster's RAID system is also
>SATA, rather than parallel ATA, so there's already a speed disparity.
>However, it's also more robust, near as I can tell.

OK. Please keep me informed. The whole RAID issue under Linux has implications beyond our two uses.

>Regards,

Thanks for the update.

Cheers,

Tom
--
+-----------------------------------------------------------------------------+
* Tom Yoksas                                             UCAR Unidata Program *
* (303) 497-8642 (last resort)                                  P.O. Box 3000 *
* address@hidden                                            Boulder, CO 80307 *
* Unidata WWW Service                              http://www.unidata.ucar.edu/*
+-----------------------------------------------------------------------------+