This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
===============================================================================
Robb Kambic                                Unidata Program Center
Software Engineer III                      Univ. Corp for Atmospheric Research
address@hidden                             WWW: http://www.unidata.ucar.edu/
===============================================================================

---------- Forwarded message ----------
Date: Tue, 14 Dec 1999 00:00:05 -0500
From: The AWIPS LDM list digest <address@hidden>
To: awipsldm digest recipients <address@hidden>
Subject: awipsldm digest: December 13, 1999

Digest for AwipsLDM
The AWIPS LDM list Digest for Monday, December 13, 1999.

1. Re: awipsldm digest: December 02, 1999

----------------------------------------------------------------------

Subject: Re: awipsldm digest: December 02, 1999
From: Ken Waters <address@hidden>
Date: 13 Dec 1999 12:31:04 -0500
X-Message-Number: 1

Just to follow up on our frequent LDM stoppages...

We still periodically receive the "signal 11" terminations of the
program. These events seem to be tied to the connection to the same
site each time. Although we have not determined the cause of the
terminations, we have at least installed a script (courtesy of Western
Region SSD) which checks to see if the LDM is running and if not,
restarts it. At least that keeps us in business.

I still would like to know what is causing these problems. I have
eliminated the possibility that it is related to a disk filling up or
to conflict with CPU usage.

I did check with the site on the other end and it turns out they are
running version 5.0.6. Based on my conversation, they will try
installing 5.0.8 (the version we are running) to see if that makes a
difference.

Ken Waters
Southern Region SSD

______________________________ Reply Separator _________________________________
Subject: [awipsldm] Re: awipsldm digest: December 02, 1999
Author: address@hidden at EXTERNAL
Date: 12/3/1999 2:26 PM

AwipsLDM List

On 3 Dec 1999, Ken Waters wrote:

>
> Thanks for the reply, Robb.
>
> I've done your suggestions and will keep an eye on the system.
>
> A couple of items:
>
> - I don't think disk space is a problem. I checked it and didn't see
> any full or near-full directories. I set up a temporary cron job,
> though, to run through the night for a day or two and will look at its
> output just to be sure.
>
> - It happened again yesterday afternoon and the log file shows the
> same symptom...a kill signal #11 sent to the program right after a
> connection was reset by peer. Interestingly, it was from the same IP
> both times it was disconnected. The pertinent lines from the log file
> are:
>
> (note I made two modifications to this log text...(1) for security I
> replaced the actual IP with "(IP)", and (2) I deleted all the
> Interrupt messages from the different clients)
>
> Dec 02 22:15:01 ls1-ehu pqexpire[8873]: > Recycled 2545.879 kb/hr ( 162.787 prods per hour)
> Dec 02 22:16:27 ls1-ehu (IP)[8877]: Connection reset by peer
> Dec 02 22:16:57 ls1-ehu (IP)[8877]: run_requester: 19991202221506.017 TS_ENDT {{ANY, ".*"}}
> Dec 02 22:17:03 ls1-ehu rpc.ldmd[8872]: child 8877 terminated by signal 11
>

Ken,

This is reaching, but I have some sites that have problems when they
interact with sites using a different version of the LDM: they lose
data, have trouble connecting, etc. It might be worthwhile to have all
the sites running the same version of the LDM. I only bring this up
because the run_requester seems to be the process bringing down the
whole LDM.

> ..[a series of Interrupt messages from each of the connected sites]...
>
> Dec 02 22:17:03 ls1-ehu pqact[8874]: Interrupt
> Dec 02 22:17:03 ls1-ehu pqexpire[8873]: Interrupt
> Dec 02 22:17:03 ls1-ehu pqexpire[8873]: > First deleted: 19991201092529.238
> Dec 02 22:17:03 ls1-ehu rpc.ldmd[8872]: Interrupt
> Dec 02 22:17:06 ls1-ehu rpc.ldmd[8872]: Terminating process group
>
> Is it not advisable to set a cron job to recycle (stop-start) the ldm
> on a regular basis?
> I realize it's not the best solution, but at
> least it will keep the system running through the night.

Another test would be to eliminate some of the requests to different
sites to find which one is the culprit. Then have that site update the
LDM version as stated above.

>
> I also will always start the ldm in verbose mode from now on...at
> least until the problem is solved.
>

I forgot to warn you that verbose logging in the LDM can make the logs
huge and cause disk space problems.

Robb...

> Thanks for your help.
>
> Ken
>
>
> ______________________________ Reply Separator _________________________________
> Subject: [awipsldm] Re: awipsldm digest: December 02, 1999
> Author: address@hidden at EXTERNAL
> Date: 12/3/99 10:39 AM
>
>
> AwipsLDM List
> On Fri, 3 Dec 1999, The AWIPS LDM list digest wrote:
>
> > Digest for AwipsLDM
> > The AWIPS LDM list Digest for Thursday, December 02, 1999.
> >
> > 1. Frequent Unexpected Stoppages of LDM

--- END OF DIGEST ---

You are currently subscribed to awipsldm as: address@hidden
To unsubscribe send a blank email to address@hidden
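
The diagnosis discussed in this thread rests on matching the pid in the "child ... terminated by signal 11" line against earlier log lines carrying the same pid, which is how the crashing child was tied to one remote site. A minimal Python sketch of that correlation follows. It is not part of the LDM: the function name `crashed_children` and the sample list are hypothetical, and the line layout ("Mon DD HH:MM:SS host ident[pid]: message") is assumed from the excerpt quoted above, so other LDM versions or syslog configurations may need a different pattern.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the excerpt in this thread:
#   Dec 02 22:16:27 ls1-ehu (IP)[8877]: Connection reset by peer
LOG_LINE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ [\d:]{8}) (?P<host>\S+) "
    r"(?P<ident>.+?)\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)

# Message the parent process logs when one of its children dies on a signal.
CHILD_DIED = re.compile(r"child (\d+) terminated by signal (\d+)")


def crashed_children(lines):
    """Return one dict per child reported as terminated by a signal.

    Each dict holds the child's pid, the signal number, the ident the
    child logged under (the remote site, in the excerpt) and every
    message that child logged before the termination was reported.
    Lines that do not fit the assumed layout are skipped.
    """
    history = defaultdict(list)  # pid -> [(ident, message), ...]
    crashes = []
    for raw in lines:
        match = LOG_LINE.match(raw.strip())
        if match is None:
            continue
        pid = int(match["pid"])
        history[pid].append((match["ident"], match["msg"]))
        died = CHILD_DIED.search(match["msg"])
        if died:
            child = int(died.group(1))
            entries = history.get(child, [])
            crashes.append({
                "pid": child,
                "signal": int(died.group(2)),
                "ident": entries[-1][0] if entries else None,
                "messages": [msg for _, msg in entries],
            })
    return crashes


# Sample input: the four lines quoted in the thread, with wrapped lines rejoined.
SAMPLE_LOG = [
    "Dec 02 22:15:01 ls1-ehu pqexpire[8873]: > Recycled 2545.879 kb/hr ( 162.787 prods per hour)",
    "Dec 02 22:16:27 ls1-ehu (IP)[8877]: Connection reset by peer",
    'Dec 02 22:16:57 ls1-ehu (IP)[8877]: run_requester: 19991202221506.017 TS_ENDT {{ANY, ".*"}}',
    "Dec 02 22:17:03 ls1-ehu rpc.ldmd[8872]: child 8877 terminated by signal 11",
]

if __name__ == "__main__":
    for crash in crashed_children(SAMPLE_LOG):
        print(crash["pid"], crash["signal"], crash["ident"], crash["messages"])
```

Run over a full log, this would list every child that died on a signal together with the site it was talking to, which may help narrow down the culprit without removing requests one at a time.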