This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
"Bryan G. White" wrote: > > > I have a couple of things I'd like to check out. I will start the LDM > > and let it run for a while and watch it. If you see activity, it's me. > > > > Have you changed what data you're requesting, or would the stream have > > increased for any other reason? > > We made some changes before I upgraded to new OS. Everything seemed > to go fine for a week or so. I did remove a site after the upgraded. Hi Bryan, I found several things to report to you. First, log messages are being written to /var/log/infolog, as per the local0 entry in /etc/syslog.conf. The good news is that I could see the log messages which gave me a clue about the problem. But, I recommend that you change your syslog.conf file so that LDM log messages are written to the log files in ~ldm/logs. Maybe you already know how to do this, but just in case, there are instructions for this in http://www.unidata.ucar.edu/packages/ldm/ldmPreInstallList.html#s8 under "Configuring the Operating System as root". Second, you're installation is nonstandard. The home of user 'ldm' is /home/ldm, but there's an extra subdirectory 'ldm', yielding a path of /home/ldm/ldm to the standard files and directories. But, some of those directories are duplicated under /home/ldm. For example, there's /home/ldm/logs and /home/ldm/ldm/logs. Similar for the 'data' directory. I would straighten that out. The same web page as above has instructions about the conventional structure - see: http://www.unidata.ucar.edu/packages/ldm/ldmPreInstallList.html#s14. But these aren't the _real_ problem. 
In the logs I found the following:

Jul 10 13:20:22 met20.slc.noaa.gov cirp[17711]: comings: pqe_new: Not enough space
Jul 10 13:20:22 met20.slc.noaa.gov cirp[17711]: : 57dc6d0af3b225993daf4335e38e7604 13379545 20010710130514.165 EXP 000 ens_010710_00_
Jul 10 13:20:22 met20.slc.noaa.gov cirp[17711]: Connection reset by peer
Jul 10 13:20:22 met20.slc.noaa.gov cirp[17711]: Disconnect
Jul 10 13:20:25 met20.slc.noaa.gov rpc.ldmd[17707]: child 17712 terminated by signal 11
Jul 10 13:20:25 met20.slc.noaa.gov rpc.ldmd[17707]: Killing (SIGINT) process group
Jul 10 13:20:25 met20.slc.noaa.gov rpc.ldmd[17707]: Interrupt
Jul 10 13:20:25 met20.slc.noaa.gov rpc.ldmd[17707]: Exiting
Jul 10 13:20:25 met20.slc.noaa.gov voyager(feed)[17857]: Interrupt
Jul 10 13:20:25 met20.slc.noaa.gov voyager(feed)[17857]: Exiting
Jul 10 13:20:25 met20.slc.noaa.gov cirp(feed)[17760]: Interrupt
Jul 10 13:20:25 met20.slc.noaa.gov cirp(feed)[17760]: Exiting
Jul 10 13:20:25 met20.slc.noaa.gov cirp[17711]: Interrupt
Jul 10 13:20:25 met20.slc.noaa.gov cirp[17711]: Exiting
Jul 10 13:20:25 met20.slc.noaa.gov pqact[17710]: Interrupt
Jul 10 13:20:25 met20.slc.noaa.gov pqact[17710]: Exiting
Jul 10 13:20:25 met20.slc.noaa.gov pqbinstats[17709]: Interrupt
Jul 10 13:20:25 met20.slc.noaa.gov pqbinstats[17709]: Exiting
Jul 10 13:20:25 met20.slc.noaa.gov pqexpire[17708]: Interrupt
Jul 10 13:20:25 met20.slc.noaa.gov pqexpire[17708]: Exiting
Jul 10 13:20:25 met20.slc.noaa.gov pqexpire[17708]: > Up since:       20010709161608.400
Jul 10 13:20:25 met20.slc.noaa.gov pqexpire[17708]: > Queue usage (bytes):10002432
Jul 10 13:20:25 met20.slc.noaa.gov pqexpire[17708]: >           (nregions):    2436
Jul 10 13:20:25 met20.slc.noaa.gov pqexpire[17708]: > nprods deleted         0

This "not enough space" problem caused everything to die. From previous log entries I see that you are getting lots of very small products. I think you ran out of "slots" in your queue.
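If you want to check for this failure mode yourself, grepping the log for the queue-exhaustion message is a quick test. The sketch below runs against a two-line sample copied from the entries above; on a live system you would point grep at the real log file instead (e.g. ~ldm/logs/ldmd.log, assuming the conventional layout):

```shell
# Write a two-line sample log (lines copied from the report above).
cat > /tmp/ldmd-sample.log <<'EOF'
Jul 10 13:20:22 met20.slc.noaa.gov cirp[17711]: comings: pqe_new: Not enough space
Jul 10 13:20:25 met20.slc.noaa.gov rpc.ldmd[17707]: child 17712 terminated by signal 11
EOF

# Count queue-exhaustion events; a nonzero count suggests the queue
# ran out of space or slots.
grep -c 'Not enough space' /tmp/ldmd-sample.log
```

Running this prints 1, the number of matching lines in the sample.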
The total number of slots for products is the size of the queue divided by the average product size, which by default is assumed to be 4096 bytes, so with your 10 MB queue you'll be able to handle at most 2441 products.

I restarted your LDM and then started the pqmon program to monitor the queue. Here's what it said:

48met20% pqmon -i5
Jul 11 19:16:17 pqmon: Starting Up (3051)
Jul 11 19:16:17 pqmon: nprods nfree nempty   nbytes maxprods maxfree minempty  maxext age
Jul 11 19:16:17 pqmon:   2431     6      4  2839544     2436      10        4 6632488  81
Jul 11 19:16:22 pqmon:   2431     6      4  2839544     2436      10        4 6632488  86
Jul 11 19:16:27 pqmon:   2431     6      4  2839544     2436      10        4 6632488  91
Jul 11 19:16:32 pqmon:   2432     5      4  2845944     2436      10        4 6632488  96
...

The 'nprods' column tells me that you had almost reached the limit on the number of products you could store. (That 2441 number is theoretical and may never actually be reached due to overhead.) Your LDM is running now; although I thought it might reach the limit quickly, it didn't. Instead, space is being recycled, and so far everything's OK.

I suggest that you let it run, and at the same time run pqmon and have it log to a file. Then, if and when the LDM crashes, you may be able to correlate the crash with running out of slots as reported by pqmon, confirming this diagnosis. For information about pqmon and how to have it log to a file, see the pqmon man page. It will create a very large file; you'll probably want to clear it out or rotate it regularly.

You can change the default number of slots using the -S option to pqcreate. To do this, I suggest you modify the ldmadmin script.
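The slot capacity above is just integer division, which you can check for yourself. The sketch below uses the 10,000,000-byte queue from this report and pqcreate's default assumed average product size of 4096 bytes:

```shell
# Rough queue capacity: slots = queue size / assumed average product size.
queue_size=10000000        # the 10 MB queue at this site
avg_product_size=4096      # pqcreate's default assumption
echo $((queue_size / avg_product_size))   # prints 2441
```

If your average product were closer to 100 bytes, the same queue could in principle need room for tens of thousands of slots, which is why the default is a poor fit for feeds dominated by tiny products.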
In ldmadmin, in the subroutine make_pq, find the section that looks like this:

    # build the command line
    $cmd_line = "pqcreate";
    if ($verbose) {
        $cmd_line .= " -v";
    }
    if ($pq_clobber) {
        $cmd_line .= " -c";
    }
    if ($pq_fast) {
        $cmd_line .= " -f";
    }
    $cmd_line .= " -q $pq_path -s $pq_size";

and change that last line to

    $cmd_line .= " -S <slots> -q $pq_path -s $pq_size";

where <slots> is an appropriate number. What's an appropriate number? In looking at your logs, I see lots of tiny products, many less than 100 bytes. The best way would be to compute the average product size and divide it into 10000000. Barring that, you could simply start with, say, 5000, which implies an average product size of about half the default.

There is another subroutine in ldmadmin, make_surf_pq, that also uses pqcreate. But I think you can leave that one alone, since it doesn't look like you're using pqsurf.

The only potential problem with modifying ldmadmin is that whenever you upgrade you must remember to duplicate this change. This is something people often forget to do.

One last thing. From the log entries above I see that you were running pqexpire. Starting with version 5.1, we recommend against running pqexpire. For that reason, I commented that line out of your ldmd.conf file before starting up the LDM, so you won't see pqexpire running anymore.

Please let me know if any of this is unclear or if you have any further questions.

Anne
--
***************************************************
Anne Wilson                    UCAR Unidata Program
address@hidden                  P.O. Box 3000
                               Boulder, CO 80307
----------------------------------------------------
Unidata WWW server     http://www.unidata.ucar.edu/
****************************************************