This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Gabe,

>Date: Mon, 14 Feb 2005 15:21:52 -0500 (EST)
>From: Gabe Langbauer <address@hidden>
>Organization: Ohio State University
>To: Steve Emmerson <address@hidden>
>Subject: Re: 20050214: LDM product queue corruption

The above message contained the following:

> The original log is attached, note there is no ldmping issue on this log,
> it seems to die with a rpc.ldmd error...and there is a mention of rtstats.
> I don't know if those are the stats from "do stats" Everytime subsequent
> time I issued the start command I got this log (although times were
> different):
>
> Feb 12 23:24:21 twister ldmping[10477]: SVC_UNAVAIL 0.000601 0
> localhost RPC: Program not registered
> Feb 12 23:24:21 twister pqcheck[10481]: Starting Up (10472)
> Feb 12 23:24:21 twister pqcheck[10481]: The writer-counter of the
> product-queue is 0
> Feb 12 23:24:21 twister pqcheck[10481]: Exiting

The above are OK. The "ldmping" entry is from the ldmadmin(1) script
testing to see if an LDM is already running. The pqcheck(1) entries are
from the same script checking to see that the product-queue is OK.

> I agree, mighty suspicious indeed. Logs above

The end of the logfile contained this:

    Feb 12 22:58:54 twister rpc.ldmd[791]: child 793 terminated by signal 25

Process 793 was a pqact(1) process:

    $ fgrep '[793]' ldmd.log.4
    Feb 12 07:02:16 twister pqact[793]: child 569 exited with status 1
    Feb 12 07:58:21 twister pqact[793]: child 16497 exited with status 1
    Feb 12 21:12:23 twister pqact[793]: child 11341 exited with status 1
    Feb 12 22:30:00 twister pqact[793]: pbuf_flush (3) write: Broken pipe

and was, undoubtedly, started via an EXEC entry in the LDM
configuration-file, etc/ldmd.conf.

The LDM server exits when an EXEC-ed child process terminates abnormally
due to a seriously bad signal (e.g., SIGSEGV). Oddly, on my system,
signal 25 is SIGCONT and should not cause the pqact(1) process to
terminate. What is it on your system?
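Signal numbers are not the same on every operating system, so the name behind "signal 25" has to be looked up on the machine that logged it. A minimal sketch, assuming a POSIX-like shell whose kill(1) accepts a signal number after "-l"; the function name signal_name is only illustrative:

```shell
#!/bin/sh
# signal_name is a hypothetical helper: print the name that this
# system's shell associates with a signal number (without the "SIG" prefix).
signal_name() {
    kill -l "$1"
}

# Look up the signal reported in the rpc.ldmd log entry.
signal_name 25
```

The definitive list for a given platform is in its signal.h header or its signal manual page.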

One can work around this behavior by wrapping EXEC-ed programs in a
shell script that ensures that their abnormal termination is never seen
by the LDM, e.g.,

    $ cat util/execWrapper
    while true
    do
        "$@"
        logger -p local0.notice "Restarting: $@"
    done

(The above is off-the-top-of-my-head and might need modification.)

The relevant EXEC entry is then replaced with

    EXEC "execWrapper prog a1 a2"

(assuming the script is in the "util/" subdirectory and is executable).

Regards,
Steve Emmerson
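
One possible modification of the wrapper idea, offered only as a sketch: if the wrapped program fails immediately every time, an unconditional "while true" loop restarts it as fast as the shell can spin. The variant below pauses between attempts and gives up after a fixed number of restarts. The names restart_loop, MAX_RESTARTS and RESTART_DELAY are assumptions made for illustration and are not part of the LDM.

```shell
#!/bin/sh
# Hypothetical variant of util/execWrapper (not part of the LDM).
# MAX_RESTARTS and RESTART_DELAY are illustrative settings.
MAX_RESTARTS=${MAX_RESTARTS:-5}
RESTART_DELAY=${RESTART_DELAY:-1}

restart_loop() {
    attempts=0
    while [ "$attempts" -lt "$MAX_RESTARTS" ]; do
        status=0
        # Run the wrapped program; remember its exit status without
        # letting a failure abort the wrapper itself.
        "$@" || status=$?
        attempts=$((attempts + 1))
        # Log the restart; ignore errors if logger(1) is unavailable.
        logger -p local0.notice \
            "Restarting (run $attempts, status $status): $*" 2>/dev/null || true
        sleep "$RESTART_DELAY"
    done
}

# Only start the loop when a program was given on the command line.
if [ "$#" -gt 0 ]; then
    restart_loop "$@"
fi
```

Once the limit is reached the wrapper itself exits normally rather than by a signal, so the trade-off is that the wrapped program stays down until someone intervenes.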