This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
>From: Gerry Creager n5jxs <address@hidden>
>Organization: AATLT, Texas A&M University
>Keywords: 200410010304.i9134kUE020346 LDM bigbird hardware

Hi Gerry,

>Your description of the scenario is consistent in timing, but I was
>seeing from the logs that a number of processes had exited abnormally,
>and a quick 'top' showed nothing running.

I was noticing all of the abnormal terminations in bigbird's LDM log
file also, but I focused on the SIGTERM signal reported by the lead
rpc.ldmd process. The only way a SIGTERM can be reported is if one
shuts down the LDM.

>So, I executed a 'stop' and
>'start' and data started flowing again. Serendipitous perhaps... but
>the absense of running processes in top suggested it was hosed up again.

OK. This explains the SIGTERM entry in the log file.

>I'll continue to watch this and also see about getting one of my
>students to research large file support in FC2.

>I'll keep you posted.

My gut feeling at the moment is that bigbird has some sort of a
hardware problem. The reason I say this is that I rebuilt the LDM
with large file support yesterday at noon on the test machine in my
office (dual 500 MHz PIII running the most recent 32-bit FC2 kernel
(2.6.8)). I then split its feed requests to match those on bigbird
and set up 3 feeds off of the machine to another box here in the UPC.
This machine is also processing all data except CONDUIT and CRAFT
(I didn't set up enough disk space for this) with no
errors/hiccups/complaints.

I must point out that this machine differs from bigbird in several
fundamental ways:

- it is running the latest FC2 kernel without any serious errors
- it does not have a RAID (it has a single 250 GB hard disk)
- it only has 1 GB of RAM
- its processors are not hyperthreaded

Another reason that I suspect that bigbird has a hardware problem is
your comment that you had show-stopping problems when trying to run
the latest FC2 kernel. We see some APIC errors in /var/log/messages,
but not as frequently as you.
Here is a listing of all APIC errors seen for today:

Oct 7 00:02:03 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 00:38:23 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 00:52:52 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 00:55:12 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 00:57:12 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 00:58:32 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 01:00:42 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 01:01:52 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 01:02:42 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 01:34:32 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 01:34:52 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 02:29:41 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 02:58:20 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 03:02:40 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 03:04:00 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 03:11:20 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 03:24:00 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 03:34:40 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 03:39:40 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 03:49:00 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 03:55:20 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 04:13:19 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 04:15:19 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 04:44:29 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 05:40:08 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 06:14:48 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 06:26:27 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 06:30:47 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 06:49:57 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 07:13:37 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 07:37:16 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 07:51:16 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 08:34:25 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 08:46:25 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 08:51:05 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 08:52:55 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 08:58:55 dhcp9 kernel: APIC error on CPU0: 40(40)
Oct 7 09:00:05 dhcp9 kernel: APIC error on CPU0: 40(40)

None of these has caused any problems on the machine.

So, where to now? I hate to say it, but it looks like bigbird may
need some hardware doctoring.

Cheers,

Tom
--
NOTE: All email exchanges with Unidata User Support are recorded in the
Unidata inquiry tracking system and then made publicly available
through the web. If you do not want to have your interactions made
available in this way, you must let us know in each email you send to us.
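
To compare how often APIC errors show up on two machines, it can help to tally them per hour instead of reading the raw listing. The following is a minimal Python sketch, not part of the original exchange. It assumes the syslog line format shown in the listing above; the names `APIC_LINE` and `count_apic_errors_by_hour` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Assumed line format, taken from the listing in this message:
#   Oct 7 00:02:03 dhcp9 kernel: APIC error on CPU0: 40(40)
APIC_LINE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<hour>\d{2}):\d{2}:\d{2}\s+"
    r"(?P<host>\S+)\s+kernel:\s+APIC error on (?P<cpu>CPU\d+):"
)


def count_apic_errors_by_hour(lines):
    """Return a Counter keyed by (month, day, hour) for APIC error lines.

    Lines that do not match the assumed format are ignored.
    """
    counts = Counter()
    for line in lines:
        match = APIC_LINE.match(line)
        if match:
            key = (match["month"], int(match["day"]), int(match["hour"]))
            counts[key] += 1
    return counts


# Small demonstration using three entries from the listing plus one
# unrelated (made-up) line that should be skipped.
sample = [
    "Oct 7 00:02:03 dhcp9 kernel: APIC error on CPU0: 40(40)",
    "Oct 7 00:38:23 dhcp9 kernel: APIC error on CPU0: 40(40)",
    "Oct 7 01:00:42 dhcp9 kernel: APIC error on CPU0: 40(40)",
    "Oct 7 01:05:00 dhcp9 kernel: some other message",
]

for (month, day, hour), n in count_apic_errors_by_hour(sample).items():
    print(f"{month} {day} {hour:02d}:00  {n} APIC error(s)")
```

To run it against a real log, pass the open file object (for example, `open("/var/log/messages")`) to `count_apic_errors_by_hour` in place of `sample`; reading that file usually requires root privileges.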