Re: Thelma Down?
- Subject: Re: Thelma Down?
- Date: Tue, 13 Jun 2000 10:28:50 -0600 (MDT)
On Tue, 13 Jun 2000, Jason J. Levit wrote:
>
> > Yes, thelma went down at 5:23 this morning. The crash was caused by a known
> > problem on SGI machines: if the LDM queue grows while pqexpire is
> > running, the queue becomes corrupt. At this point, I have remade the queue
> > and restarted the LDM. I'll recalculate what the queue size should be
> > now that new products are coming over NOAAport. Then later today, I'll
> > implement the new queue size. UPC is in the process of replacing thelma's
> > hardware, and we have a new version of the LDM software that doesn't have
> > this problem. It will soon be installed on thelma, eliminating this problem.
> >
> > Thanks for the patience,
> > Robb...
>
> Hi Robb,
>
> I've been having severe problems with LDM crashing on our Origin 200
> machine, and this might explain it! LDM will literally die every few
> minutes from time to time when incoming traffic gets high. Let me see
> if this scenario sounds familiar: LDM dies for no apparent reason, the
> log file just says "interrupt" for all the processes, and a huge core
> file is dumped. Was that the behavior you were seeing?
>
Jason,
It sounds like this could be your problem. Here are the log entries from
thelma's crash:
Jun 13 05:23:37 5Q:thelma nport(feed)[9491]: RECLASS: 20000613042337.506 TS_ENDT {{WMO, ".*"}}
Jun 13 05:23:43 5Q:thelma rpc.ldmd[6873]: child 6846 terminated by signal 11
Jun 13 05:23:43 5Q:thelma rpc.ldmd[6873]: Killing (SIGINT) process group
Jun 13 05:23:43 5Q:thelma rpc.ldmd[6873]: Interrupt
Jun 13 05:23:43 5Q:thelma nport(feed)[9491]: Interrupt
Jun 13 05:23:43 5Q:thelma snow(feed)[10378]: Interrupt
Jun 13 05:23:43 5Q:thelma iita(feed)[10370]: Interrupt
Jun 13 05:23:43 5Q:thelma ofour(feed)[9459]: Interrupt
Jun 13 05:23:45 5Q:thelma unidata[6899]: Exiting
Jun 13 05:23:45 5Q:thelma pqexpire[6851]: Exiting
Jun 13 05:23:45 5Q:thelma pqexpire[6851]: > Up since: 20000610153554.787
Jun 13 05:23:45 5Q:thelma pqexpire[6851]: > Queue usage (bytes):285161608
Jun 13 05:23:45 5Q:thelma pqexpire[6851]: > (nregions): 29879
Jun 13 05:23:45 5Q:thelma pqexpire[6851]: > nbytes recycle: 3984792280 ( 63311.904 kb/hr)
Jun 13 05:23:45 5Q:thelma pqexpire[6851]: > nprods deleted: 656340 ( 10678.457 per hour)
Jun 13 05:23:45 5Q:thelma pqexpire[6851]: > First deleted: 20000610143555.011
Jun 13 05:23:45 5Q:thelma pqexpire[6851]: > Last deleted: 20000613040345.174
Jun 13 05:23:45 5Q:thelma ldm[6882]: Interrupt
Jun 13 05:23:45 5Q:thelma ldm[6882]: Exiting
Jun 13 05:23:45 5Q:thelma rpc.ldmd[6873]: Terminating process group
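
If you want to check your own logs for the same signature, here is a minimal
Python sketch that looks for the "child ... terminated by signal" line shown
above. The function name and the regular expression are my own illustration,
not part of the LDM, and they assume each log entry sits on a single line in
the same format as the thelma entries.

```python
import re

# Pattern modeled on the thelma entry:
#   rpc.ldmd[6873]: child 6846 terminated by signal 11
# This is an assumption about the log format, based only on that entry.
CRASH_RE = re.compile(
    r"rpc\.ldmd\[(\d+)\]: child (\d+) terminated by signal\s+(\d+)"
)


def find_child_crashes(lines):
    """Return (parent_pid, child_pid, signal) for each crashed child."""
    crashes = []
    for line in lines:
        match = CRASH_RE.search(line)
        if match:
            crashes.append(tuple(int(group) for group in match.groups()))
    return crashes


sample = [
    "Jun 13 05:23:43 5Q:thelma rpc.ldmd[6873]: child 6846 terminated by signal 11",
    "Jun 13 05:23:43 5Q:thelma rpc.ldmd[6873]: Killing (SIGINT) process group",
]
print(find_child_crashes(sample))  # [(6873, 6846, 11)]
```

A child killed by signal 11 just before every other process logs "Interrupt"
would match what happened here.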
> How did you calculate the appropriate queue size? I suppose I could
> just keep increasing it until the problem doesn't exist anymore...
>
The queue size depends on the feeds you are receiving. For example, thelma
receives NOAAport, McIDAS, and FSL2, and its queue size in bin/ldmadmin is
set to:
$pq_size = 250000000;
I would take the peak data rates on the feeds, add them together, and add
10% to get the queue size. You should also check the ldmd.log files for
messages similar to:
Growing data by <size>
If you see these messages then the queue is too small.
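
As a rough illustration of that rule of thumb, here is a minimal Python
sketch. The function names and the per-feed byte counts are hypothetical
placeholders, not measurements from thelma, and the sketch assumes you have
already turned each feed's peak rate into a byte count for the period you
want the queue to hold. The second function simply counts log lines
containing the "Growing data by" text mentioned above.

```python
def suggested_queue_size(peak_bytes_by_feed, headroom_percent=10):
    """Sum the per-feed peak volumes and add headroom (10% by default)."""
    total = sum(peak_bytes_by_feed.values())
    # Integer arithmetic avoids floating-point rounding in the result.
    return total * (100 + headroom_percent) // 100


def count_growth_messages(log_lines):
    """Count log lines reporting that the queue had to grow."""
    return sum(1 for line in log_lines if "Growing data by" in line)


# Hypothetical peak volumes in bytes; substitute your own measurements.
peaks = {
    "feed_a": 100_000_000,
    "feed_b": 50_000_000,
    "feed_c": 20_000_000,
}
print(suggested_queue_size(peaks))  # 187000000
```

If count_growth_messages returns anything other than zero for your ldmd.log,
the current queue size is too small for your feeds.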
Robb...
> Jason
>
> --
> ----------------------------------------------------------------------------
> Jason J. Levit, N9MLA Research Scientist,
> address@hidden Center for Analysis and Prediction of
> Storms
> Room 1014 University of Oklahoma
> 405/325-3503 http://www.caps.ou.edu/
>
===============================================================================
Robb Kambic Unidata Program Center
Software Engineer III Univ. Corp for Atmospheric Research
address@hidden WWW: http://www.unidata.ucar.edu/
===============================================================================