This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Gabe,

>Date: Mon, 14 Feb 2005 13:47:01 -0500 (EST)
>From: Gabe Langbauer <address@hidden>
>To: Steve Emmerson <address@hidden>
>Subject: Re: 20050214: LDM product queue corruption

The above message contained the following:

> I am unsure where this ldmping initiated from. My ldm crontab is as
> follows:
> 35 * * * * /usr/local/ldm/bin/ldmadmin dostats
> 0 0 * * * /usr/local/ldm/bin/ldmadmin newlog

"ldmadmin dostats", eh? That command is no longer useful. I don't
think it could affect a running LDM, but, just to be sure, do the
following:

    1. Remove the "ldmadmin dostats" command from the LDM user's
       crontab(1) file.

    2. Have the following entry enabled in the LDM configuration-file,
       etc/ldmd.conf:

           exec "pqbinstats"

       The pqbinstats(1) program saves statistics on the LDM system in
       *.stats files in the LDM user's "logs" subdirectory.

    3. Add the following entry to the scour(1) configuration-file,
       etc/scour.conf:

           ~ldm/logs 1 *.stats

       This ensures that the number of *.stats files won't increase
       indefinitely.
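Step 1 above can be scripted instead of done by hand in an editor. The
following is only a minimal sketch: it assumes the crontab text has
already been captured (for example from "crontab -l"), and the function
name drop_dostats is hypothetical, not part of the LDM package.

```python
def drop_dostats(crontab_text: str) -> str:
    """Return crontab_text without active entries that run "ldmadmin dostats".

    Comment lines (those starting with "#") are kept unchanged.
    """
    kept = []
    for line in crontab_text.splitlines():
        is_comment = line.lstrip().startswith("#")
        if not is_comment and "ldmadmin dostats" in line:
            continue  # drop the obsolete statistics entry
        kept.append(line)
    # Re-terminate with a newline so the result is a valid crontab file.
    return "\n".join(kept) + ("\n" if kept else "")


if __name__ == "__main__":
    # The two entries quoted in the message above.
    original = (
        "35 * * * * /usr/local/ldm/bin/ldmadmin dostats\n"
        "0 0 * * * /usr/local/ldm/bin/ldmadmin newlog\n"
    )
    print(drop_dostats(original), end="")
```

The filtered text could then be written to a file and installed with
crontab(1). Review the result before installing it, since the match is
a plain substring test.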

> # Check for incoming data and failover if upstream site is dead
> #10,30,50 * * * * /usr/local/ldm/bin/ldmfail -p stokes.metr.ou.edu -f
> pluto.met.fsu.edu > /dev/null 2>&1 /dev/null
>
> # Scour the data directories
> 0 * * * * /usr/local/ldm/bin/ldmadmin scour > /dev/null 2>&1
>
> # Rotate and remove the decoder logs - the trailing digit
> # tells the script how many days of logs to keep
> #
> 0 0 * * * /usr/local/ldm/bin/newlog data/logs/dcacars.log 1
> 0 0 * * * /usr/local/ldm/bin/newlog data/logs/dcamos.log 1
> 0 0 * * * /usr/local/ldm/bin/newlog data/logs/dcmmos.log 1
> 0 0 * * * /usr/local/ldm/bin/newlog data/logs/dcnmos.log 1
> 0 0 * * * /usr/local/ldm/bin/newlog data/logs/dcnldn.log 1
> 0 0 * * * /usr/local/ldm/bin/newlog data/logs/dcncprof.log 1
> 0 0 * * * /usr/local/ldm/bin/newlog data/logs/dctrop.log 1
> 0 0 * * * /usr/local/ldm/bin/newlog data/logs/dcwatch.log 1
> 0 0 * * * /usr/local/ldm/bin/newlog data/logs/dcffg.log 1
> 0 0 * * * /usr/local/ldm/bin/newlog data/logs/dcstorm.log 1
> 0 0 * * * /usr/local/ldm/bin/newlog data/logs/dcgrib.log 1
> 0 0 * * * /usr/local/ldm/bin/newlog data/logs/dchrly.log 1
> 0 0 * * * /usr/local/ldm/bin/newlog data/logs/dcsynop_sb.log 1
> 0 0 * * * /usr/local/ldm/bin/newlog data/logs/dcsynop_syn.log 1
> 0 0 * * * /usr/local/ldm/bin/newlog data/logs/dcuair.log 1
>
> So, the only things going on here are rotating logs and some stats. A
> check of my gempak crontab (ldm and gempak are virtually the only things
> running on the machine) shows nothing occurring at ~20:43 except
> scripts that are called at the same time every hour or possibly
> ngm.csh which is called at 20:00. ngm.csh simply is a script that calls
> other ngm scripts to create gempak products. We source the Gemenviron and
> set the display then use the 'date' command to get the current time then
> run gempak.
> Nowhere is there any mention of ldm nor do I believe it would
> have permissions to make a call such as ldmping

You've got to find out where that ldmping(1) came from to ensure that
whatever's causing it isn't interfering in other ways with the LDM.

> Another interesting development occurred this weekend. I was able to
> "capture ldm in the act". LDM crashed around 00:15 UTC and I
> realized that it was down.

Can you send me the log entries for that time?

> I ssh'd in and issued the command ldmadmin
> clean.

Did you ensure that the LDM system wasn't running? Doing an "ldmadmin
clean" when the LDM is running will cause the *.pid file to be removed
and could result in "orphaned" LDM processes.

> This command successfully completed. I then issued the command
> ldmadmin start. This command appeared to work correctly. However, when I
> issued ldmadmin watch I was given the message "there is no ldm running on
> this machine"

That's mighty suspicious. Can you send me the log entries for that
time?

> I tried this same sequence a couple more times and I
> delqueued and mkqueued and physically removed (via rm) the pid file and so
> forth. LDM however refused to start. Immediately at 01:00 UTC I issued
> the same command ldmadmin clean && ldmadmin start as I had done several
> times during the previous hour. Magically, this time it worked. This
> leads me to believe that there was some program running at that time that
> immediately corrupted the ldm. But I'm unsure what could be responsible

Any chance of my logging onto the system in question to examine the LDM
setup?

Regards,
Steve Emmerson
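
The question above about whether the LDM was running before "ldmadmin
clean" can be checked from the PID file. The following is a rough
sketch for a Unix system only. The function name ldm_seems_running is
hypothetical, and the PID-file location used in the example (ldmd.pid
in the LDM user's home directory) is an assumption to be adjusted to
the actual installation.

```python
import os
from pathlib import Path


def ldm_seems_running(pid_file: Path) -> bool:
    """Return True if pid_file names a process that currently exists."""
    try:
        pid = int(pid_file.read_text().split()[0])
    except (FileNotFoundError, IndexError, ValueError):
        return False  # no usable PID file
    if pid <= 0:
        return False  # not a valid single-process ID
    try:
        os.kill(pid, 0)  # signal 0 only tests for existence; nothing is delivered
    except ProcessLookupError:
        return False  # stale PID file: no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


if __name__ == "__main__":
    # Assumed location of the PID file; adjust for the actual installation.
    pid_file = Path.home() / "ldmd.pid"
    if ldm_seems_running(pid_file):
        print("An LDM process appears to be running; stop it before cleaning.")
    else:
        print("No running process found via the PID file.")
```

A negative result only says that the PID file points at nothing. It
does not rule out orphaned LDM processes left behind after the PID file
was removed, so a look at ps(1) output is still worthwhile.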