Re: [Fwd: Re: sirocco, datoo, or nafhat?] (fwd)
- Date: Fri, 12 Jan 2001 16:44:54 -0700 (MST)
Robert,
I found some problems, but I don't know what effect they have on the
machine.
- in bin/ldmadmin the hostname is set to noaaport.unidata.ucar.edu; it
should be datoo.srcc.lsu.edu
- in /var/adm/messages most of the log is filled with ntpdate messages,
which could be causing a problem
- in the logs/ldmd.log files, I found many entries of "Not a WMO message ..."
On our ingestor there were 155/day; datoo had 650/day, 936/day, and 1035/day.
I'm wondering whether you have noise on the line to the receiver or poor
signal reception. Maybe the LDM is getting a corrupt product that causes it
to hang. I would like to see the LDM log entries from when the system hangs.
Is pqing still running, and what state is it in?
- Just for sanity's sake, I would remake the queue at 540M instead of the
current 800M
- I'm curious about these lines in the ldmd.conf file:
exec "/data/tmp/nwstg_mps0"
exec "/data/tmp/goesw_mps1"
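
For comparing the "Not a WMO message" counts between machines, here is a
minimal sketch of how the entries could be tallied per log file. The function
names, the log directory argument, and the "ldmd.log*" glob for rotated files
are assumptions for illustration; they are not part of the LDM distribution.

```python
from pathlib import Path

# Text to look for in each log line (taken from the ldmd.log entries above).
PATTERN = "Not a WMO message"


def count_matches(log_path, pattern=PATTERN):
    """Return the number of lines in log_path that contain pattern."""
    count = 0
    # errors="replace" keeps the scan going if the log holds stray bytes.
    with open(log_path, errors="replace") as log:
        for line in log:
            if pattern in line:
                count += 1
    return count


def count_per_log(log_dir, glob="ldmd.log*"):
    """Map each log file name under log_dir (assumed layout) to its count."""
    return {p.name: count_matches(p) for p in sorted(Path(log_dir).glob(glob))}
```

Calling count_per_log("logs") from the ldm home directory would give one count
per log file, which can be set against the 155/day we see on our ingestor.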
At this point I didn't see anything seriously wrong; it looks like the SDI
s/w was properly installed, etc.
Mike and I will look at your system again soon. In the meantime, can you
supply more info about the system hangs?
Thanks,
Robb...
On Thu, 11 Jan 2001, Robert Leche wrote:
> Hello Robb,
>
> Sorry about that! The access control list should now let you telnet.
> Acl is set to allow *.unidata.ucar.edu.
>
> Give it a try and let me know the results.
>
> Thanks,
>
> Bob
> address@hidden
> 225 388 5023
>
> ----------------------------------------------------------------
>
> Robert,
>
> I tried to log in a couple of times, and I get:
>
> laraine.unidata.ucar.edu.rkambic> telnet datoo.srcc.lsu.edu
> Trying 130.39.188.201...
> Connected to datoo.srcc.lsu.edu.
> Escape character is '^]'.
> Connection closed by foreign host
>
> I tried both rlogin and telnet
>
> Robb...
>
> On Fri, 5 Jan 2001, Robert Leche wrote:
>
> > Hello Robb,
> >
> >
> >
> > Let me know what you find.
> >
> >
> > Bob
> > address@hidden
> > 225 388 5023
> >
> >
> >
> > Robb Kambic wrote:
> >
> > > Robert,
> > >
> > > I'm the maintainer of the SSEC ingest system at UPC and the go-between
> > > for SSEC ingest support. None of our machines have the lockup problem
> > > that you are referring to in your message, and I haven't heard of any
> > > others having the same problem. Since I don't do support very often for
> > > the ingest, it would be hard for me to explain a solution to you in an
> > > email message. But I would be willing to look at your machine if you
> > > give me a login so I can inspect the inge program. It appears that the
> > > machine has enough disk space and memory, so that rules out the easy
> > > solutions. Let me know if I can get a login. Also, our sysadmin who
> > > originally configured the machine would be looking at datoo.
> > >
> > > Robb...
> > >
> > > -------- Original Message --------
> > > Subject: Re: sirocco, datoo, or nafhat?
> > > Date: Thu, 04 Jan 2001 15:32:49 -0600
> > > From: Robert Leche <address@hidden>
> > > Organization: UCAR/Unidata
> > > To: Anne Wilson <address@hidden>
> > > References: <address@hidden>
> > > <address@hidden>
> > > <address@hidden>
> > >
> > > Anne,
> > > By radar issue, I am lamenting our internal need of the radar images.
> > >
> > > As I am not a SunOS expert, help with this would be great. When the sdi
> > > locks, I am unable to kill it. It becomes a defunct process. I have
> > > worked around this problem by systematically rebooting before the sdi
> > > lock shows up. When it happens again, I will send 'top' and 'vmstat'
> > > data to you. In the meantime:
> > >
> > > > vmstat
> > >  procs     memory            page            disk          faults      cpu
> > >  r b w   swap  free  re  mf  pi  po  fr de  sr f0 s0 s1 s2   in   sy   cs us sy id
> > >  0 0 0  21872  1928   1 121 244 190 252  0 159  0  0 65  1  181  604  233 15  7 78
> > > >
> > >
> > > # df -k
> > > Filesystem           kbytes     used    avail capacity  Mounted on
> > > /proc                     0        0        0       0%  /proc
> > > /dev/dsk/c1t0d0s0    122863    39590    70987      36%  /
> > > /dev/dsk/c1t0d0s3    492065   407387    35472      92%  /usr
> > > fd                        0        0        0       0%  /dev/fd
> > > /dev/dsk/c1t0d0s4    122863    18089    92488      17%  /var
> > > swap                 266272    51408   214864      20%  /tmp
> > > /dev/dsk/c1t0d0s7   1529888    99355  1369338       7%  /data
> > > /dev/dsk/c1t0d0s5    193416   107377    66698      62%  /home
> > > /dev/dsk/c1t0d0s6   1553912   922123   569633      62%  /opt
> > > /dev/dsk/c1t1d0s2  10606786  7958903  2541816      76%  /data1
> > > /dev/dsk/c1t1d0s4  21218376 10678244 10327949      51%  /data2
> > > /dev/dsk/c1t1d0s6   3462896        9  3428259       1%  /data3
> > >
> > > load averages: 0.25, 0.30, 0.30    15:10:03
> > > 56 processes: 54 sleeping, 1 zombie, 1 on cpu
> > > CPU states: 67.3% idle, 13.9% user, 8.2% kernel, 10.6% iowait, 0.0% swap
> > > Memory: 64M real, 1804K free, 79M swap in use, 211M swap free
> > >
> > >   PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
> > >   275 root       1  59    0 1580K  496K sleep 993:44  8.70% inge
> > >   372 root       1  59    0 8060K  644K sleep 127:23  1.17% Xsun
> > >   320 ldm        1  58    0  777M   18M sleep 106:30  0.78% pqing
> > >  9285 rleche     1  58    0 1788K  996K cpu     0:00  0.31% top
> > >   321 ldm        1  58    0  777M   16M sleep  23:33  0.25% pqact
> > >   313 ldm        1  58    0  777M   16M sleep  28:07  0.23% pqbinstats
> > >  2130 ldm        1  58    0  777M   16M sleep  25:42  0.21% rpc.ldmd
> > >  2232 ldm        1  58    0  777M   16M sleep  23:32  0.19% rpc.ldmd
> > > 22673 ldm        1  58    0  777M   16M sleep   4:32  0.16% rpc.ldmd
> > >  7404 ldm        1  58    0  777M   16M sleep   0:19  0.12% rpc.ldmd
> > >  9298 root       1 100  -20 1972K 1044K sleep   0:00  0.09% inge
> > >   316 ldm        1  59    0  778M   15M sleep  17:25  0.05% pqing
> > >   476 ldm        1  49    0  777M   10M sleep   7:11  0.04% pqutil
> > >     1 root       1  58    0  624K  104K sleep   2:48  0.01% init
> > > 22007 ldm        1  58    0  777M   16M sleep   2:12  0.01% rpc.ldmd
> > >
> > > You will note X is running. Running X makes no difference. The sdi lock
> > > problem occurs regardless.
> > >
> > > Other clues:
> > > Lockup occurs regardless of LDM version.
> > > Lockup occurred before current version of sdi ingestor software. (I
> > > will have to research versions of sdi.)
> > > Failure happens about every 4 to 5 days if left undisturbed.
> > >
> > > Anne, thanks for the help
> > >
> > > Bob
> > >
> > > Anne Wilson wrote:
> > >
> > > > Robert Leche wrote:
> > > > >
> > > > > Hello Anne,
> > > > > I am sorry about the delay in getting back to you. We were in the
> > > > > throes of rebuilding our main file server and email server.
> > > > >
> > > > > To answer your question about the server name, the answer is yes, or
> > > > > all of the above. Pardon the small joke.
> > > > >
> > > > > The intention is to transition nafhat.srcc.lsu.edu to
> > > > > sirocco.srcc.lsu.edu next week. Sirocco will continue to be LSU's ldm
> > > > > server to other ldm clients. Old Sirocco (a Sun box) will be retired
> > > > > and decommissioned. A notice to the user group will be issued a day
> > > > > before the cutover. Tentatively, Wednesday is the cutover date, so
> > > > > look for the notice starting Tuesday.
> > > > >
> > > > > The radar issue remains to be sorted out. I hope to have solutions in
> > > > > place by the time the encryption is removed (~Jan 14 from what I am
> > > > > reading on the ldm-user group).
> > > > >
> > > > > We have another important issue. I am not sure you are the correct
> > > > > point of contact for the NOAAport system. If you are, I need to report
> > > > > that our NOAAport SunOS system must be restarted every 4-5 days, or
> > > > > the ingestion process dies and only a reboot restores operation when
> > > > > the sdi locks up. I want to be able to depend on this machine through
> > > > > thick and thin, and as it is, I cannot depend on it.
> > > > >
> > > > > Bob
> > > > >
> > > >
> > > > Hi Bob,
> > > >
> > > > I'm not sure what you mean by "the radar issue". Is it a question of
> > > > which machine you want to use to serve the radar data? Currently the
> > > > two sites that were supposed to get radar data from you are feeding
> > > > from our server, motherlode. This arrangement was intended to be
> > > > temporary, but so far it's working fine and we can keep it up for a
> > > > little while.
> > > >
> > > > And, the "published" date that the unencrypted, compressed data will
> > > > be available is January 10, although I suppose the possibility exists
> > > > that that could change yet again. Maybe you're thinking of January 14
> > > > because that's when AMS starts and people are thinking that NOAA wants
> > > > to show off the radar distribution at the conference.
> > > >
> > > > Although I don't know much about the NOAAPORT system, I talked with
> > > > Robb and Mike (our sys admin) about it. They said you should make sure
> > > > you have enough disk space and RAM. (The 'top' command can tell you
> > > > about the RAM usage. If you don't have that command, vmstat also gives
> > > > statistics.)
> > > >
> > > > If you want someone to take a look at your machine, Robb is willing to
> > > > do so. I've cc'ed him on this message - you can continue this thread
> > > > with him.
> > > >
> > > > Thanks for responding!
> > > >
> > > > Anne
> > > > ***************************************************
> > > > Anne Wilson UCAR Unidata Program
> > > > address@hidden P.O. Box 3000
> > > > Boulder, CO 80307
> > > > ----------------------------------------------------
> > > > Unidata WWW server http://www.unidata.ucar.edu/
> > > > ****************************************************
> >
> >
>
> ===============================================================================
>
> Robb Kambic Unidata Program Center
> Software Engineer III Univ. Corp for Atmospheric Research
> address@hidden WWW: http://www.unidata.ucar.edu/
> ===============================================================================
>
>
>
>
>
===============================================================================
Robb Kambic Unidata Program Center
Software Engineer III Univ. Corp for Atmospheric Research
address@hidden WWW: http://www.unidata.ucar.edu/
===============================================================================