This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
On Tue, 13 Jun 2000, Jason J. Levit wrote:

> > Yes, thelma was down at 5:23 this morning. The crash was caused by a known
> > problem on SGI machines: if the LDM queue is growing while pqexpire is
> > running, it creates a corrupt queue. At this point, I have remade the queue
> > and restarted the LDM. I'll recalculate what the queue size should be now,
> > with new products coming over NOAAPORT. Then later today, I'll implement
> > the new queue size. The UPC is in the process of replacing thelma's
> > hardware, and we have a new version of the LDM software that doesn't have
> > this problem. It will soon be installed on thelma, eliminating this
> > problem.
> >
> > Thanks for your patience,
> > Robb...
>
> Hi Robb,
>
> I've been having severe problems with LDM crashing on our Origin 200
> machine, and this might explain it! LDM will literally die every few
> minutes from time to time when incoming traffic gets high. Let me see
> if this scenario sounds familiar: LDM dies for no apparent reason, the
> log file just says "interrupt" for all the processes, and a huge core
> file is dumped. Was that the behavior you were seeing?

Jason,

It sounds like this could be your problem.
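The "remade the queue and restarted the LDM" recovery that Robb describes can be sketched as the following command sequence. This is a minimal sketch assuming the standard `ldmadmin` script from the LDM distribution is on PATH; it is guarded so it is harmless on a machine without the LDM installed.

```shell
# Sketch of rebuilding a corrupt LDM product queue (assumes the
# ldmadmin script shipped with the LDM is available on PATH).
if command -v ldmadmin >/dev/null 2>&1; then
    ldmadmin stop        # stop the LDM server and its feed processes
    ldmadmin delqueue    # delete the corrupt product queue
    ldmadmin mkqueue     # recreate it at the $pq_size set in ldmadmin
    ldmadmin start       # restart the LDM
    echo "queue rebuilt"
else
    echo "ldmadmin not on PATH; skipping queue rebuild"
fi
```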
Log entries:

Jun 13 05:23:37 5Q:thelma nport(feed)[9491]: RECLASS: 20000613042337.506 TS_ENDT {{WMO, ".*"}}
Jun 13 05:23:43 5Q:thelma rpc.ldmd[6873]: child 6846 terminated by signal 11
Jun 13 05:23:43 5Q:thelma rpc.ldmd[6873]: Killing (SIGINT) process group
Jun 13 05:23:43 5Q:thelma rpc.ldmd[6873]: Interrupt
Jun 13 05:23:43 5Q:thelma nport(feed)[9491]: Interrupt
Jun 13 05:23:43 5Q:thelma snow(feed)[10378]: Interrupt
Jun 13 05:23:43 5Q:thelma iita(feed)[10370]: Interrupt
Jun 13 05:23:43 5Q:thelma ofour(feed)[9459]: Interrupt
Jun 13 05:23:45 5Q:thelma unidata[6899]: Exiting
Jun 13 05:23:45 5Q:thelma pqexpire[6851]: Exiting
Jun 13 05:23:45 5Q:thelma pqexpire[6851]: > Up since:       20000610153554.787
Jun 13 05:23:45 5Q:thelma pqexpire[6851]: > Queue usage (bytes): 285161608
Jun 13 05:23:45 5Q:thelma pqexpire[6851]: >          (nregions): 29879
Jun 13 05:23:45 5Q:thelma pqexpire[6851]: > nbytes recycle: 3984792280 (63311.904 kb/hr)
Jun 13 05:23:45 5Q:thelma pqexpire[6851]: > nprods deleted: 656340 (10678.457 per hour)
Jun 13 05:23:45 5Q:thelma pqexpire[6851]: > First deleted:  20000610143555.011
Jun 13 05:23:45 5Q:thelma pqexpire[6851]: > Last deleted:   20000613040345.174
Jun 13 05:23:45 5Q:thelma ldm[6882]: Interrupt
Jun 13 05:23:45 5Q:thelma ldm[6882]: Exiting
Jun 13 05:23:45 5Q:thelma rpc.ldmd[6873]: Terminating process group

> How did you calculate the appropriate queue size? I suppose I could
> just keep increasing it until the problem doesn't exist anymore.

The queue size depends on the feeds you are receiving. thelma receives
NOAAPORT, McIDAS, and FSL2, and the queue size in bin/ldmadmin is set to:

    $pq_size = 250000000;

I would take the peak data rates on the feeds, combine them, and add 10%
for the queue size. You should also check the ldmd.log files for messages
similar to:

    Growing data by <size>

If you see these messages, the queue is too small.

Robb...

> Jason
>
> --
> ----------------------------------------------------------------------------
> Jason J.
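Robb's sizing advice (combine the peak feed rates, add 10%) can be sketched as simple shell arithmetic. The feed rates below are hypothetical round numbers chosen for illustration, not thelma's actual peaks; the only figure taken from the thread is the resulting order of magnitude, which lands near the 250000000-byte queue mentioned above.

```shell
# Sketch of the queue-sizing advice: hypothetical peak data volumes
# (bytes) for three feeds -- not thelma's actual measured rates.
NOAAPORT=150000000
MCIDAS=50000000
FSL2=25000000

# Combine the peaks and add 10% headroom for the queue size.
TOTAL=$((NOAAPORT + MCIDAS + FSL2))
PQ_SIZE=$((TOTAL + TOTAL / 10))
echo "suggested \$pq_size: $PQ_SIZE"   # prints 247500000

# A queue that is too small logs "Growing data by <size>"; a quick check
# against your own log (path is an example):
#   grep 'Growing data by' ~/logs/ldmd.log
```

The result would then be pasted into the `$pq_size` assignment in bin/ldmadmin before remaking the queue.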
> Levit, N9MLA                      Research Scientist,
> address@hidden                    Center for Analysis and Prediction of Storms
> Room 1014                         University of Oklahoma
> 405/325-3503                      http://www.caps.ou.edu/

===============================================================================
Robb Kambic                        Unidata Program Center
Software Engineer III              Univ. Corp for Atmospheric Research
address@hidden                     WWW: http://www.unidata.ucar.edu/
===============================================================================