
Re: [Fwd: Re: sirocco, datoo, or nafhat?] (fwd)



Robert,

I found some problems, but I don't know what effect they have on the
machine.

- In bin/ldmadmin the hostname is set to noaaport.unidata.ucar.edu; it
should be datoo.srcc.lsu.edu.

- In /var/adm/messages most of the log is filled with ntpdate messages,
which could be causing a problem.

- In the logs/ldmd.log files I found many "Not a WMO message ..." entries.
On our ingestor there were 155/day; datoo had 650/day, 936/day, and
1035/day. I'm wondering whether you have noise on the line to the receiver
or poor signal reception. Maybe the LDM is getting a corrupt product that
causes it to hang. I would like to see the LDM log entries from when the
system hangs. Is pqing still running, and what state is it in?

- Just for sanity's sake, I would remake the queue at 540M instead of the
current 800M.

- I'm curious about these lines in the ldmd.conf file:

exec    "/data/tmp/nwstg_mps0"
exec    "/data/tmp/goesw_mps1" 
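
For the "Not a WMO message" counts mentioned above, a rough Python sketch
like the one below could tally the entries per day. It is only an
illustration: it assumes each log line starts with a syslog-style stamp
such as "Jan 11 15:32:49", and the function name count_per_day is made up
for this example, not part of the LDM.

```python
import re
import sys
from collections import Counter

# Assumption: each log line begins with a syslog-style stamp, "Mon DD HH:MM:SS".
STAMP = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+\d{2}:\d{2}:\d{2}\s")


def count_per_day(lines, needle="Not a WMO message"):
    """Return a Counter mapping 'Mon D' to the number of lines containing needle."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = STAMP.match(line)
        if match:  # lines without a recognizable stamp are skipped
            counts[f"{match.group(1)} {int(match.group(2))}"] += 1
    return counts


if __name__ == "__main__":
    # Each command-line argument is treated as a log file to scan.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            # Counter keeps insertion order, so days print in log order.
            for day, total in count_per_day(handle).items():
                print(f"{path}: {day}: {total}")
```

Running it over each of the logs/ldmd.log files would show whether the
daily count is steady or climbs before a hang.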

At this point I haven't seen anything seriously wrong; it looks like the
SDI s/w was properly installed, etc.

Mike and I will look at your system again soon.  In the meantime, can you
supply more info about the system hangs?
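
One way to capture the process state at the time of a hang is sketched
below in Python. It assumes the text comes from "ps -eo pid,s,comm" (pid,
one-letter state, command name); the function name find_states is invented
for this example. Zombies are reported as well, since a defunct process
may no longer show its original command name.

```python
def find_states(ps_output, names=("pqing", "inge")):
    """Return (pid, state, command) for the named commands and any zombies.

    Assumption: ps_output is the text printed by "ps -eo pid,s,comm",
    i.e. a header row followed by "PID STATE COMMAND" rows.
    """
    found = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) != 3 or not parts[0].isdigit():
            continue  # ignore blank or malformed rows
        pid, state, command = parts
        base = command.strip().rsplit("/", 1)[-1]  # drop any leading path
        if base in names or state == "Z":
            found.append((int(pid), state, base))
    return found
```

Saving the ps output to a file just before rebooting and passing its
contents to find_states would show whether pqing and inge were sleeping,
running, or defunct.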

Thanks,
Robb...

On Thu, 11 Jan 2001, Robert Leche wrote:

> Hello Robb,
> 
> Sorry about that! The access control list should now  let you telnet.
> Acl is set to allow  *.unidata.ucar.edu.
> 
> Give it a try and let me know the results.
> 
> Thanks,
> 
> Bob
> address@hidden
> 225 388 5023
> 
> ----------------------------------------------------------------
> 
> Robert,
> 
> I tried to log in a couple of times; I get:
> 
> laraine.unidata.ucar.edu.rkambic> telnet datoo.srcc.lsu.edu
> Trying 130.39.188.201...
> Connected to datoo.srcc.lsu.edu.
> Escape character is '^]'.
> Connection closed by foreign host
> 
> I tried both rlogin and telnet
> 
> Robb...
> 
> On Fri, 5 Jan 2001, Robert Leche wrote:
> 
> > Hello Robb,
> >
> >
> >
> > Let me know what you find.
> >
> >
> > Bob
> > address@hidden
> > 225 388 5023
> >
> >
> >
> > Robb Kambic wrote:
> >
> > > Robert,
> > >
> > > I'm the maintainer of the SSEC ingest system at UPC and the go-between
> > > for SSEC ingest support. None of our machines have the lockup problem
> > > that you are referring to in your message, and I haven't heard of any
> > > others having the same problem. Since I don't do support very often for
> > > the ingest, it would be hard for me to explain a solution to you in an
> > > email message. But I would be willing to look at your machine if you
> > > give me a login so I can inspect the inge program. It appears that the
> > > machine has enough disk space and memory, so that rules out the easy
> > > solutions. Let me know if I can get a login.  Also, our sysadmin who
> > > originally configured the machine would be looking at datoo.
> > >
> > > Robb...
> > >
> > >   -------- Original Message --------
> > > Subject: Re: sirocco, datoo, or nafhat?
> > > Date: Thu, 04 Jan 2001 15:32:49 -0600
> > > From: Robert Leche <address@hidden>
> > > Organization: UCAR/Unidata
> > > To: Anne Wilson <address@hidden>
> > > References: <address@hidden>
> > > <address@hidden>
> > > <address@hidden>
> > >
> > > Anne,
> > > By radar issue, I am lamenting our internal need of the radar images.
> > >
> > > As I am not a SunOS expert, help with this would be great. When the sdi
> > > locks, I am unable to kill it. It becomes a defunct process. I have
> > > worked around this problem by systematically rebooting before the sdi
> > > lock shows up. When it happens again, I will send 'top' and 'vmstat'
> > > data to you. In the meantime:
> > >
> > > > vmstat
> > >  procs     memory            page            disk          faults      cpu
> > >  r b w   swap  free  re  mf pi po fr de sr f0 s0 s1 s2   in   sy   cs us sy id
> > >  0 0 0  21872  1928   1 121 244 190 252 0 159 0 0 65 1  181  604  233 15  7 78
> > > >
> > >
> > > #  df -k
> > > Filesystem            kbytes    used   avail capacity  Mounted on
> > > /proc                      0       0       0     0%    /proc
> > > /dev/dsk/c1t0d0s0     122863   39590   70987    36%    /
> > > /dev/dsk/c1t0d0s3     492065  407387   35472    92%    /usr
> > > fd                         0       0       0     0%    /dev/fd
> > > /dev/dsk/c1t0d0s4     122863   18089   92488    17%    /var
> > > swap                  266272   51408  214864    20%    /tmp
> > > /dev/dsk/c1t0d0s7    1529888   99355 1369338     7%    /data
> > > /dev/dsk/c1t0d0s5     193416  107377   66698    62%    /home
> > > /dev/dsk/c1t0d0s6    1553912  922123  569633    62%    /opt
> > > /dev/dsk/c1t1d0s2    10606786 7958903 2541816    76%    /data1
> > > /dev/dsk/c1t1d0s4    21218376 10678244 10327949    51%    /data2
> > > /dev/dsk/c1t1d0s6    3462896       9 3428259     1%    /data3
> > >
> > > load averages:  0.25,  0.30,  0.30                                15:10:03
> > > 56 processes:  54 sleeping, 1 zombie, 1 on cpu
> > > CPU states: 67.3% idle, 13.9% user,  8.2% kernel, 10.6% iowait,  0.0% swap
> > > Memory: 64M real, 1804K free, 79M swap in use, 211M swap free
> > >
> > >   PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
> > >   275 root       1  59    0 1580K  496K sleep 993:44  8.70% inge
> > >   372 root       1  59    0 8060K  644K sleep 127:23  1.17% Xsun
> > >   320 ldm        1  58    0  777M   18M sleep 106:30  0.78% pqing
> > >  9285 rleche     1  58    0 1788K  996K cpu     0:00  0.31% top
> > >   321 ldm        1  58    0  777M   16M sleep  23:33  0.25% pqact
> > >   313 ldm        1  58    0  777M   16M sleep  28:07  0.23% pqbinstats
> > >  2130 ldm        1  58    0  777M   16M sleep  25:42  0.21% rpc.ldmd
> > >  2232 ldm        1  58    0  777M   16M sleep  23:32  0.19% rpc.ldmd
> > > 22673 ldm        1  58    0  777M   16M sleep   4:32  0.16% rpc.ldmd
> > >  7404 ldm        1  58    0  777M   16M sleep   0:19  0.12% rpc.ldmd
> > >  9298 root       1 100  -20 1972K 1044K sleep   0:00  0.09% inge
> > >   316 ldm        1  59    0  778M   15M sleep  17:25  0.05% pqing
> > >   476 ldm        1  49    0  777M   10M sleep   7:11  0.04% pqutil
> > >     1 root       1  58    0  624K  104K sleep   2:48  0.01% init
> > > 22007 ldm        1  58    0  777M   16M sleep   2:12  0.01% rpc.ldmd
> > >
> > > You will note X is running. Running X makes no difference. The sdi lock
> > > problem occurs regardless.
> > >
> > > Other clues:
> > > Lockup occurs regardless of LDM version.
> > > Lockup occurred before current version of sdi ingestor software. (I will
> > > have to research versions of sdi.)
> > > Failure happens about every 4 to 5 days if left undisturbed.
> > >
> > > Anne, thanks for the help
> > >
> > > Bob
> > >
> > > Anne Wilson wrote:
> > >
> > > > Robert Leche wrote:
> > > > >
> > > > > Hello Anne,
> > > > > I am sorry about the delay in getting back to you. We were in the
> > > > > throes of rebuilding our main file server, email server.
> > > > >
> > > > > To answer your question of the server name, the answer is yes, or
> > > > > all the above. Pardon the small joke.
> > > > >
> > > > > The intention is to transition nafhat.srcc.lsu.edu to
> > > > > sirocco.srcc.lsu.edu next week. Sirocco will continue to be LSU's
> > > > > ldm server to other ldm clients. Old Sirocco (a Sun box) will be
> > > > > retired and decommissioned.  A notice to the user group will be
> > > > > issued a day before the cutover.  Tentatively, Wednesday is the
> > > > > cutover date, so look for the notice starting Tuesday.
> > > > >
> > > > > The radar issue remains to be sorted out. I hope to have solutions
> > > > > in place by the time the encryption is removed (~Jan 14 from what I
> > > > > am reading on the ldm-user group).
> > > > >
> > > > > We have another important issue. I am not sure you are the correct
> > > > > point of contact for the NOAAport system. If you are, I need to
> > > > > report that our NOAAport SunOS system must be restarted every 4-5
> > > > > days, or the ingestion process dies and only a reboot restores
> > > > > operation when the sdi locks up.  I want to be able to depend on
> > > > > this machine through thick and thin and, as it is, I cannot depend
> > > > > on it.
> > > > >
> > > > > Bob
> > > > >
> > > >
> > > > Hi Bob,
> > > >
> > > > I'm not sure what you mean by "the radar issue".  Is it a question of
> > > > which machine you want to use to serve the radar data?  Currently the
> > > > two sites that were supposed to get radar data from you are feeding
> > > > from our server, motherlode.  This arrangement was intended to be
> > > > temporary, but so far it's working fine and we can keep it up for a
> > > > little while.
> > > >
> > > > And, the "published" date that the unencrypted, compressed data will
> > > > be available is January 10, although I suppose the possibility exists
> > > > that that could change yet again.  Maybe you're thinking of January 14
> > > > because that's when AMS starts and people are thinking that NOAA wants
> > > > to show off the radar distribution at the conference.
> > > >
> > > > Although I don't know much about the NOAAPORT system, I talked with
> > > > Robb and Mike (our sys admin) about it.  They said you should make
> > > > sure you have enough disk space and RAM.  (The 'top' command can tell
> > > > you about the RAM usage.  If you don't have that command, vmstat also
> > > > gives statistics.)
> > > >
> > > > If you want someone to take a look at your machine, Robb is willing to
> > > > do so.  I've cc'ed him on this message - you can continue this thread
> > > > with him.
> > > >
> > > > Thanks for responding!
> > > >
> > > > Anne
> > > > ***************************************************
> > > > Anne Wilson                     UCAR Unidata Program
> > > > address@hidden                  P.O. Box 3000
> > > >                                   Boulder, CO  80307
> > > > ----------------------------------------------------
> > > > Unidata WWW server       http://www.unidata.ucar.edu/
> > > > ****************************************************
> >
> >
> 
> ===============================================================================
> 
> Robb Kambic                                Unidata Program Center
> Software Engineer III                      Univ. Corp for Atmospheric Research
> address@hidden                   WWW: http://www.unidata.ucar.edu/
> ===============================================================================

===============================================================================
Robb Kambic                                Unidata Program Center
Software Engineer III                      Univ. Corp for Atmospheric Research
address@hidden             WWW: http://www.unidata.ucar.edu/
===============================================================================