This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
>From: Ben Cotton <address@hidden>
>Organization: Purdue
>Keywords: 200601051847.k05IlF7s014054 LDM core

Hi Ben,

re: what OS version are you running

>[ldm@weather ~]$ uname -a
>Linux weather.eas.purdue.edu 2.6.9-22.ELhugemem #1 SMP Mon Sep 19 18:43:10
>EDT 2005 i686 i686 i386 GNU/Linux

OK, thanks. Is your kernel/OS up to date with respect to patches and upgrades? I ask because our Fedora Core 3 Linux machines are running the 2.6.12-xxx kernel.

>I'm unsure of the hardware specifics. I do know it's quite a hefty
>machine...it is the machine we got with the Unidata equipment grant in
>'05. I think it has 1GB of RAM...

You can get specifics on the CPU(s) and memory on a Linux machine as follows:

  cat /proc/cpuinfo
  cat /proc/meminfo

re: what 'interrupt' (signal) is being seen

>I don't know, all I can tell is from the log (attached) entries like:
>Jan 05 06:18:10 pqact[32619] NOTE: Interrupt

OK. Your log file listing makes it look like pqact is being told to exit by the lead rpc.ldmd process. I say this because you are getting a core dump of rpc.ldmd, and all LDM processes started out of ~ldm/etc/ldmd.conf belong to the same process group. When any one of the processes exits on an error (for example, a segmentation violation), a signal is sent to the group, and all of the processes exit.

re: what is the result of 'file core.nnnnn'

>[ldm@weather ~]$ file core.18872
>core.18872: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV),
>SVR4-style, SVR4-style, from 'rpc.ldmd'
>
>[ldm@weather ~]$ file core.32629
>core.32629: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV),
>SVR4-style, SVR4-style, from 'rpc.ldmd'

This is interesting, since dumping of core files from setuid root programs is turned off by default in Linux. In order to get the core file, someone would have had to enable core dumping _if_ rpc.ldmd is, in fact, running with setuid root privilege. So, the question is whether the LDM was installed so that rpc.ldmd and hupsyslog have setuid root privilege.
Check this with:

  <as 'ldm'>
  cd ~ldm
  ls -alt bin/rpc.ldmd
  ls -alt bin/hupsyslog

re: your other machine is not showing the exit problem; is it running the same version of the LDM

>No, wxp.eas is running an older Linux kernel...
>
>Linux wxp.eas.purdue.edu 2.4.22-1.2199.nptl #1 Wed Aug 4 12:21:48 EDT 2004
>i686 i686 i386 GNU/Linux

OK. Since the machines are running different OS versions, comparing them (i.e., noting that the other machine has never shown the exiting problem) is not very useful. There were some important updates in the 2.6 Linux kernel after the 2.6.9 version. It may be useful for you to investigate upgrading your OS kernel if an upgrade is available (you are running RedHat Enterprise, correct?).

>Thanks,

No worries.

For Steve:

Purdue did not turn off SELINUX, so they are not logging to ~ldm/logs/ldmd.log in the usual way. Ben's original email reported:

>My LDM 6.4.2 build on weather.eas.purdue.edu has developed the nasty habit
>of dying unexpectedly. There's been no pattern that I've been able to
>determine, except that it generally happens overnight in order to make sure
>I don't catch it for hours. I've asked our department computing support
>staff to check the system logs for anything that might be a trigger, since
>the ldmd.log contains very little information...

>(and in a bit of extra fun, for some reason
>after I manually rotated the logs - cron isn't working properly for some
>reason, long story - the new ldmd.log file remained empty while entries
>were being written to ldmd.log-1).

>A core dump appears
>in ~ldm at the same time as the LDM dies, and I assume the two are
>related, but I don't know how to do anything with core files.

>Our other
>machine, wxp.eas.purdue.edu, is running 6.4.1 (although I'm building
>6.4.4 on both this afternoon) and has never had this problem.

>I'm also noticing what seems like a lack of information in the logs.
>The only messages that are being written are the WARNs that a write to
>pipe took x number of seconds.
>I've checked /etc/syslog.conf,
>~/etc/ldmadmin-pl.conf and the pqact entries in ~/etc/ldmd.conf and
>everything points to /var/log/ldm/ldmd.log. We put the logs there
>instead of ~/logs (which I set as a symlink to /var/log/ldm) to skirt the
>SELINUX issue.

Here is the output from Ben's ldmd.log.1 file:

Jan 03 19:46:41 pqact[32614] NOTE: Starting Up
Jan 03 19:46:41 pqact[32615] NOTE: Starting Up
Jan 03 19:46:41 pqact[32616] NOTE: Starting Up
Jan 03 19:46:41 pqact[32617] NOTE: Starting Up
Jan 03 19:46:41 pqact[32618] NOTE: Starting Up
Jan 03 19:46:41 pqact[32619] NOTE: Starting Up
Jan 03 19:51:34 pqact[32619] WARN: write(11,,4096) to pipe took 12.455922 s
Jan 03 19:54:06 pqact[32616] WARN: write(6,,4096) to pipe took 2.113214 s
Jan 03 19:57:45 pqact[32616] WARN: write(6,,4096) to pipe took 2.708293 s
Jan 03 19:57:52 pqact[32619] WARN: write(17,,4096) to pipe took 4.661930 s
Jan 03 20:00:55 pqact[32619] WARN: write(7,,4096) to pipe took 10.178098 s
...
Jan 04 23:10:17 pqact[32619] WARN: write(8,,4096) to pipe took 5.363954 s
Jan 05 02:10:53 pqact[32619] WARN: write(15,,4096) to pipe took 2.056173 s
Jan 05 06:18:10 pqact[32614] NOTE: Interrupt
Jan 05 06:18:10 pqact[32616] NOTE: Interrupt
Jan 05 06:18:10 pqact[32614] NOTE: Exiting
Jan 05 06:18:10 pqact[32616] NOTE: Exiting
Jan 05 06:18:10 pqact[32615] NOTE: Interrupt
Jan 05 06:18:10 pqact[32615] NOTE: Exiting
Jan 05 06:18:10 pqact[32618] NOTE: Interrupt
Jan 05 06:18:10 pqact[32618] NOTE: Exiting
Jan 05 06:18:10 pqact[32617] NOTE: Interrupt
Jan 05 06:18:10 pqact[32617] NOTE: Exiting
Jan 05 06:18:10 pqact[32619] NOTE: Interrupt
Jan 05 06:18:10 pqact[32619] NOTE: Exiting

Cheers,

Tom
--
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.
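
Addendum: the setuid question raised in the reply can also be answered with a small script instead of reading the 'ls' output by eye. What follows is only a minimal sketch and is not part of the LDM distribution. The function name setuid_status is hypothetical, and the owner check assumes GNU stat (the -c option) as found on Linux. The two paths are the ones named in the reply.

```shell
#!/bin/sh
# Report whether a file has the setuid bit set and whether root owns it.
# Hypothetical helper for illustration; assumes GNU stat ("-c" option).
setuid_status() {
    f=$1
    if [ ! -e "$f" ]; then
        echo "missing"
    elif [ ! -u "$f" ]; then
        # "test -u" is true only when the setuid bit is set
        echo "not setuid"
    elif [ "$(stat -c %u "$f" 2>/dev/null)" = "0" ]; then
        # numeric owner 0 is root
        echo "setuid root"
    else
        echo "setuid, but not owned by root"
    fi
}

# Paths follow the reply above; adjust them if the LDM lives elsewhere.
for prog in ~ldm/bin/rpc.ldmd ~ldm/bin/hupsyslog; do
    printf '%s: %s\n' "$prog" "$(setuid_status "$prog")"
done
```

Run it as the 'ldm' user. A result of "setuid root" for rpc.ldmd would mean the core files could only have appeared if core dumping from setuid programs had been enabled on the machine, which is the point raised in the reply.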