Wait a minute here,
128.117.140.208 isn't in the mix. The other hosts are.
I updated the LDM access list. Should we just give some class C ranges
access rather than adding one IP at a time?
Also, I noticed that the Send-Q values are pretty normal on the ldm2
server right now but were pretty high on ldm1. It might just be an
issue with the ACL.
128.117.12.2
128.117.12.3
128.117.130.220
128.117.140.208
128.117.140.220
128.117.149.220
128.117.156.220
128.174.80.16
128.174.80.47
140.90.193.19
140.90.193.227
140.90.193.228
140.90.193.99
140.90.226.201
140.90.226.202
140.90.226.203
140.90.226.204
140.90.37.12
140.90.37.13
140.90.37.15
140.90.37.16
140.90.37.40
144.92.130.88
144.92.131.244
150.9.117.128
192.12.209.57
192.58.3.194
192.58.3.195
192.58.3.196
192.58.3.197
193.61.196.74
198.181.231.53
208.64.117.128
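To get a feel for what the class C question above would mean in practice, here is a minimal sketch that groups the listed addresses by their enclosing /24 network. It uses only Python's standard `ipaddress` module. The function name `group_by_class_c` is made up for illustration, and how such ranges would be written in the actual access list is not shown in this thread, so treat this as a counting aid only.

```python
import ipaddress
from collections import defaultdict

# Addresses copied from the access list in this message.
ALLOWED_HOSTS = """
128.117.12.2 128.117.12.3 128.117.130.220 128.117.140.208
128.117.140.220 128.117.149.220 128.117.156.220 128.174.80.16
128.174.80.47 140.90.193.19 140.90.193.227 140.90.193.228
140.90.193.99 140.90.226.201 140.90.226.202 140.90.226.203
140.90.226.204 140.90.37.12 140.90.37.13 140.90.37.15
140.90.37.16 140.90.37.40 144.92.130.88 144.92.131.244
150.9.117.128 192.12.209.57 192.58.3.194 192.58.3.195
192.58.3.196 192.58.3.197 193.61.196.74 198.181.231.53
208.64.117.128
""".split()


def group_by_class_c(addresses):
    """Map each enclosing /24 network to the sorted hosts listed in it.

    Hypothetical helper for illustration; not part of LDM.
    """
    groups = defaultdict(list)
    for addr in addresses:
        # strict=False accepts a host address and returns its enclosing /24.
        net = ipaddress.ip_network(f"{addr}/24", strict=False)
        groups[net].append(ipaddress.ip_address(addr))
    return {net: sorted(groups[net]) for net in sorted(groups)}


if __name__ == "__main__":
    for net, hosts in group_by_class_c(ALLOWED_HOSTS).items():
        print(f"{net}  {len(hosts)} listed host(s)")
```

Opening a whole /24 admits up to 254 hosts even where only one is listed today, so the per-network counts are worth a look before trading individual entries for ranges.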
Justin Cooke wrote:
Chi,
The reboot doesn't seem to have helped. Is there anything else that may
be causing these issues? Could it be network related, starting after I
performed the restart of LDM? Steve has a few possibilities:
/It seems to be network related at your end, but it is strange that it
occurred at the time when you restarted the LDM, unless there was some
sort of firewall or packet filter that took effect when the LDMs
reconnected./
Justin
Steve Chiswell wrote:
Justin,
I haven't seen any improvement from ncepldm to the top level relays
daffy.unidata.ucar.edu (Unidata), idd.aos.wisc.edu (U. Wisconsin),
flood.atmos.uiuc.edu (U. Illinois) or atm.cise-nsf.gov (NSF, DC).
It seems to be network related at your end, but it is strange that it
occurred at the time when you restarted the LDM, unless there was some
sort of firewall or packet filter that took effect when the LDMs
reconnected.
Thanks for your time in looking at this,
Steve
On Fri, 2007-06-15 at 15:31 -0400, Justin Cooke wrote:
Steve and Doug,
I just got a call from Chi at the WOC; he rebooted LDM1 after noticing
an unusual load on the machine. LDM is running on that box again and it
remains primary. Can you check to see how the latencies are now?
Thanks,
Justin
Doug Schuster wrote:
Justin,
28,079 products are missing from the 12Z cycle. You'll be getting the
automated email shortly.
-Doug
On Jun 15, 2007, at 12:48 PM, Justin Cooke wrote:
Steve,
I've turned off the feed to LDM2.
There is no other load on the ldm1 system except for LDM.
Doug, are you missing many of the TIGGE params for 12Z?
Justin
Steve Chiswell wrote:
Justin,
That didn't change the behavior. I'm still seeing latency.
Perhaps try turning off the other feed. Is there any load
other than LDM on the system?
Steve
On Fri, 2007-06-15 at 12:56 -0400, Justin Cooke wrote:
Steve,
I've recreated the queue; let me know if you are still seeing issues.
If so, I'll turn off the feed to ldm2 to see if that corrects things.
Justin
Steve Chiswell wrote:
Justin,
I don't know if they saw a disk space problem with
log files not being rotated, but it might just be
best today to build a new queue:
ldmadmin stop
ldmadmin delqueue
ldmadmin mkqueue
ldmadmin start
That will mean some queued data will be lost, but if users aren't
getting it anyway, then it's best to ensure that the queue isn't
corrupt for the weekend.
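For anyone who wants to script the four steps above, a minimal sketch follows. It runs the same `ldmadmin` subcommands in the same order and stops at the first failure, so a failed `delqueue` is never followed by `start`. The function name `rebuild_queue`, the `dry_run` switch, and the assumption that `ldmadmin` exits non-zero on failure are all illustrative and not taken from this thread.

```python
import subprocess

# Subcommands taken from the steps listed above, in the same order.
REBUILD_STEPS = ["stop", "delqueue", "mkqueue", "start"]


def rebuild_queue(ldmadmin="ldmadmin", dry_run=True, runner=subprocess.run):
    """Run each rebuild step in order and return the commands handled.

    Hypothetical wrapper for illustration. With dry_run=True nothing is
    executed; the commands are only printed.
    """
    handled = []
    for step in REBUILD_STEPS:
        cmd = [ldmadmin, step]
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # check=True raises CalledProcessError on a non-zero exit
            # (assumed to signal failure), so later steps are skipped.
            runner(cmd, check=True)
        handled.append(cmd)
    return handled


if __name__ == "__main__":
    rebuild_queue(dry_run=True)
```

The default is a dry run because deleting the queue discards whatever data is still in it, as noted above.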
Happy Friday....
Thanks,
Steve
On Fri, 2007-06-15 at 12:13 -0400, Justin Cooke wrote:
Steve,
Our logs on the primary LDM system "ldm1" had not rotated for nearly a
week. I sent email to WOC support and this was the response:
Looks like the seed file was missing after we brought the system back up
from the last outage. Should be good now.
Justin Cooke wrote:
WOC,
I noticed that our logs for LDM have not been rotated on machine ldm1
since 06/05/2007. We have a cron entry that runs "ldmadmin newlog" at
00Z every day.
I attempted to run the command by hand and got the following back:
ldm@ldm1:~$ bin/ldmadmin newlog
hupsyslog: couldn't open /var/run/syslogd.pid
I checked, and /var/run/syslogd.pid is not there on ldm1, but it is
on ldm2.
Could there be a problem with syslogd on ldm1?
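The check described above (is the PID file there, and does it point at a live process?) can be sketched as follows for a Unix-like host. The default path is the one from the error message; the function name `pid_file_status` and its four return values are made up for illustration. Signal 0 is used only to test whether the process exists, and nothing is actually sent to it.

```python
import os


def pid_file_status(path="/var/run/syslogd.pid"):
    """Return 'missing', 'invalid', 'stale', or 'running' for a PID file.

    Hypothetical helper for illustration; Unix-like systems only.
    """
    try:
        with open(path) as fh:
            text = fh.read().strip()
    except FileNotFoundError:
        return "missing"

    try:
        pid = int(text)
    except ValueError:
        return "invalid"
    if pid <= 0:
        # Zero or negative values would address process groups in kill().
        return "invalid"

    try:
        os.kill(pid, 0)  # signal 0: existence check only
    except ProcessLookupError:
        return "stale"  # file exists but no such process
    except PermissionError:
        return "running"  # process exists but belongs to another user
    return "running"


if __name__ == "__main__":
    print(pid_file_status())
```

A result of "missing" would match what was seen on ldm1, while "running" would be the expected state on ldm2.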
Justin
Also around that time I turned on our backup feed to the ldm2 system,
which had been off since that system had issues a few weeks ago (we
were asked by WOC to turn it back on). I have sent email to their
support group asking if both ldm1 and ldm2 are responding to the
ncepldm.woc.noaa.gov address or if something else is going on.
Justin
Steve Chiswell wrote:
Justin,
Yesterday just after 18Z, the data flow from ncepldm.woc.noaa.gov to
the top level sites at NSF and Unidata began showing high latency:
http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?CONDUIT+atm.cise-nsf.gov
and
http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?CONDUIT+daffy.unidata.ucar.edu
Data volume out has dropped as a result:
http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_vol_nc?CONDUIT+atm.cise-nsf.gov
Since the behavior is similar at two sites in separate locations, the
problem would appear to be near your end. Since that coincides with
your restart of the LDM, could you fill me in on the issues you were
experiencing?
Thanks
Steve Chiswell
Unidata User Support
On Fri, 2007-06-15 at 11:38 -0400, Justin Cooke wrote:
Doug,
I had to restart our LDM yesterday right before the 18Z cycle. We had
an issue with our logging, but none of the configuration files changed.
Could one of your feeds have lost the connection to our LDM during
that restart?
Justin
Douglas Schuster wrote:
Yes, we've received partial cycles. More than half of the expected
fields have been missing in each cycle from June 14, 18Z to June 15,
06Z. The number of missing fields varies from cycle to cycle.
Doug
On Jun 15, 2007, at 9:11 AM, Justin Cooke wrote:
Doug,
Have you received any GEFS data from us today? Or is it just certain
fields you are missing?
Justin
--
Chi Y. Kang
Contractor
Principal Engineer
Phone: 301-713-3333 x201
Cell: 240-338-1059