
[IDD #QIJ-760583]: IDD - NEXRAD2 feed appears down from idd.unidata.ucar.edu



Hi John,

> Institution: Museum of Science, Boston MA
> Inquiry: Looks like the NEXRAD2 (CRAFT) feed has been down
> for about 24 hours.  Last files received were around Feb. 22 08Z.
> Our upstream host is idd.unidata.ucar.edu.  Confirmed with
> notifyme -o 10000.  Our LDM host is 'wxserver1.mos.org'.

Please forgive the verbosity of this email, but I am using it to document
some of the problems that are occurring at the moment, one or more of which
may be affecting your ability to receive the NEXRAD Level II data.

Since the toplevel IDD relay node, idd.unidata.ucar.edu, has been receiving
the NEXRAD2 data throughout the past 24 hours as evidenced by:

http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_vol_nc?NEXRAD2+uni1.unidata.ucar.edu

I would assume that the problem is most likely related to routing.  We had
routing problems to west coast sites starting on Monday, and they continued until
the Colorado Front Range GigaPoP (FRGP) shut off its connection to the National
LambdaRail (NLR) network.  The information we received from network administrators
described this outage:

  UNSCHEDULED OUTAGE at FRGP on 2/22/2006

  DATE/TIME: 2/22/2006 - 1115 AM
  DURATION: 4+ hours

  REASON:
  NLR is having issues reaching many sites on the west coast and beyond, so NCAR
  has shut off NCAR's BGP session to the FRGP NLR router l3-gw-1. UCDHSC might
  consider shutting their BGP down also if they're seeing similar problems. NOAA
  is seeing minor problems too.

Given that the NLR routing problems were reported to only be affecting west
coast (and beyond) sites, I would assume that this is not the problem you
are experiencing, but you never know.

At the same time, one of the toplevel relays for the NEXRAD2 data, IRaDS (OU),
has been reporting on severe networking problems in the eastern US.  Here
is a representative message:

  Date:    Thu, 23 Feb 2006 07:25:16 CST
  To:      Unidata <address@hidden>
  From:    <address@hidden>
  Subject: IRaDS STATUS ALERT ... Eastern Region Latencies

  Quick update -- no news on the Eastern Region latencies -- the data was up and
  down last night and the latencies continued to increase -- we'll try to get some
  information on what is going on...

  -----------------------------------------------------
                     Integrated Radar Data Services
                       address@hidden
          http://www.radarservices.org   1-866-33IRADS

Finally, I just logged into the idd.unidata.ucar.edu cluster node that
is attempting to send data to your machine, wxserver1.mos.org, and I see
continuous errors while attempting to send NEXRAD2 data.  Here are three
of the most recent entries from ~ldm/logs/ldmd.log (on uni2.unidata.ucar.edu):

Feb 23 07:31:55 uni2 wxserver1(feed)[814]: up6.c:287: nullproc_6() failure to wxserver1.mos.org: RPC: Unable to receive; errno = Connection timed out
Feb 23 07:31:56 uni2 wxserver1(feed)[818]: up6.c:287: nullproc_6() failure to wxserver1.mos.org: RPC: Unable to receive; errno = Connection timed out
Feb 23 07:31:57 uni2 wxserver1(feed)[819]: up6.c:287: nullproc_6() failure to wxserver1.mos.org: RPC: Unable to receive; errno = Connection timed out
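When a downstream connection fails like this, the same entry repeats in the LDM log once per attempt.  The short Python sketch below tallies nullproc_6() failures per downstream host from log text.  The regular expression is written only against the three entries quoted above, so it is an assumption that other LDM versions word these messages the same way.

```python
import re
from collections import Counter

# Pattern written against the three ldmd.log entries quoted above; other
# LDM versions may word these failure messages differently (assumption).
FAILURE_RE = re.compile(r"nullproc_6\(\) failure to ([^\s:]+): RPC: (.+)$")


def count_failures(lines):
    """Return a Counter of nullproc_6() failures keyed by downstream host."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


LOG = [
    "Feb 23 07:31:55 uni2 wxserver1(feed)[814]: up6.c:287: nullproc_6() failure to "
    "wxserver1.mos.org: RPC: Unable to receive; errno = Connection timed out",
    "Feb 23 07:31:56 uni2 wxserver1(feed)[818]: up6.c:287: nullproc_6() failure to "
    "wxserver1.mos.org: RPC: Unable to receive; errno = Connection timed out",
    "Feb 23 07:31:57 uni2 wxserver1(feed)[819]: up6.c:287: nullproc_6() failure to "
    "wxserver1.mos.org: RPC: Unable to receive; errno = Connection timed out",
]

print(count_failures(LOG))  # Counter({'wxserver1.mos.org': 3})
```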

This kind of error looks suspiciously like the ones we saw for San Jose State
University yesterday morning and the day before.  We learned that the errors
to SJSU were being caused by the NLR routing problems, since they went away
as soon as the NLR connection at the FRGP was turned off.  It could be
the case that the NLR connection has been turned back on (on the assumption
that the NLR routing problems have been solved) and that there are now problems
sending data to the eastern US.  Or, the errors may be caused by the same
networking problems that IRaDS is reporting for MCI.  One of the steps we took
on Tuesday evening to try to help SJSU was to hardwire the IP<->name information
for their machine in the /etc/hosts file on each of our cluster nodes.  Since
this seemed to help (at least for 12 hours), I have done the same thing for
your machine.
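One plausible reason a hardwired entry helps is that, on most Linux systems, the resolver consults /etc/hosts before DNS (when /etc/nsswitch.conf lists "files" ahead of "dns"), so the name lookup no longer depends on a DNS query.  A minimal Python sketch of that static lookup follows.  The address 192.0.2.10 is a documentation-only placeholder (RFC 5737), not the real address of wxserver1.mos.org, and lookup_static() is a hypothetical, simplified stand-in for what the C library resolver actually does.

```python
# Hypothetical /etc/hosts contents.  192.0.2.10 is a documentation-only
# placeholder address (RFC 5737), NOT the real address of wxserver1.mos.org.
HOSTS_TEXT = """\
127.0.0.1    localhost
192.0.2.10   wxserver1.mos.org wxserver1   # hardwired entry
"""


def lookup_static(hosts_text, name):
    """Return the first address mapped to `name` in hosts-file text, or None."""
    wanted = name.lower()  # host names compare case-insensitively
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        if wanted in (n.lower() for n in names):
            return address
    return None


print(lookup_static(HOSTS_TEXT, "wxserver1.mos.org"))  # 192.0.2.10
```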

Please try your notifyme to idd.unidata.ucar.edu again and let us know the results:

<as 'ldm'>
notifyme -vxl- -f NEXRAD2 -o 10000 -h idd.unidata.ucar.edu

I tried to do the equivalent in the other direction, from one of our cluster
nodes to your machine, but was not allowed:

[ldm@uni4 ~]$ notifyme -vxl- -f NEXRAD2 -h wxserver1.mos.org
Feb 23 14:40:36 notifyme[26303]: Starting Up: wxserver1.mos.org: 20060223144036.449 TS_ENDT {{NEXRAD2,  ".*"}}
Feb 23 14:40:36 notifyme[26303]: Connected to upstream LDM-5
Feb 23 14:40:38 notifyme[26303]: NOTIFYME(wxserver1.mos.org): 7: Access denied by remote server
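In the notifyme command suggested above, "-o 10000" asks for products from the last 10000 seconds, i.e. about 2 hours 47 minutes.  The Python sketch below rebuilds that command as an argument list and computes the start of the window it covers.  The helper names are hypothetical, and the flag descriptions in the docstring are our reading of the LDM options, so check the notifyme manual page on your system.

```python
from datetime import datetime, timedelta


def notifyme_argv(feed, host, offset_s):
    """Build the notifyme command line suggested above (hypothetical helper).

    -vxl-  verbose and debug logging, with the log written to standard error
    -f     feedtype to ask about
    -o     offset in seconds: list products from this long ago onward
    -h     upstream LDM host to query
    """
    return ["notifyme", "-vxl-", "-f", feed, "-o", str(offset_s), "-h", host]


def window_start(now, offset_s):
    """Earliest product time covered by `-o offset_s` when run at `now`."""
    return now - timedelta(seconds=offset_s)


argv = notifyme_argv("NEXRAD2", "idd.unidata.ucar.edu", 10000)
print(" ".join(argv))
# notifyme -vxl- -f NEXRAD2 -o 10000 -h idd.unidata.ucar.edu

# Example run time taken from the uni4 attempt above (clock time as logged).
print(window_start(datetime(2006, 2, 23, 14, 40, 36), 10000))
# 2006-02-23 11:53:56
```

To actually run it, argv could be passed to subprocess.run() on a host where the LDM utilities are installed and on the PATH.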

ASIDE:  I note that your NEXRAD2 data request is for a limited number of NEXRADs
(from ~ldm/logs/ldmd.log on uni2.unidata.ucar.edu):

Feb 23 07:16:06 uni2 wxserver1(feed)[10550]: up6.c:334: Starting Up(6.3.0/6): 20060223061605.339 TS_ENDT {{NEXRAD2,  "(KBOX|KOKX|KGYX|KCXX|KENX|KTYX|KBGM|KDIX)"}}

and the NWS is reporting latency issues with one of these stations, KDIX:

  Date: Thu, 23 Feb 2006 10:38:32 +0000
  From: "toc.nwstg" <address@hidden>

  The NWS TOC contacted personnel at the following WSR-88D site to inform
  them of the WSR-88D Level II Data Latency at 6:00 AM on 02/23/2006:

  KDIX - Mount Holly, PA
  KARX - La Crosse, WI
  KJKL - Jackson, KY
  KEVX - Red Bay, FL
  KMLB - Melbourne, FL
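The request pattern in your ldmd.log entry is a regular expression matched against each product's identifier, and, as we understand LDM's (POSIX extended) matching, a product is selected if the pattern matches anywhere in its ID.  The Python sketch below applies the same alternation with re.search, which behaves the same way for a pattern this simple, and cross-checks it against the NWS list.  The product identifiers are made-up examples, not the real NEXRAD2 ID format.

```python
import re

# Request pattern from the ldmd.log entry above (a plain alternation).
REQUEST_PATTERN = "(KBOX|KOKX|KGYX|KCXX|KENX|KTYX|KBGM|KDIX)"
REQUEST = re.compile(REQUEST_PATTERN)

# Stations named in the NWS TOC latency notice.
NWS_LATENT = ["KDIX", "KARX", "KJKL", "KEVX", "KMLB"]

# Made-up product identifiers; the real NEXRAD2 ID format may differ.
PRODUCT_IDS = [f"L2/{station}/20060223060000" for station in NWS_LATENT]


def selected(ids, pattern=REQUEST):
    """Keep IDs in which the pattern matches anywhere (unanchored search)."""
    return [pid for pid in ids if pattern.search(pid)]


print(selected(PRODUCT_IDS))  # ['L2/KDIX/20060223060000']

# Same answer as a set intersection of station IDs.
requested = set(REQUEST_PATTERN.strip("()").split("|"))
print(sorted(requested & set(NWS_LATENT)))  # ['KDIX']
```

Only KDIX appears in both lists, so of the latent sites the NWS named, it is the one that could affect what wxserver1.mos.org receives.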


Cheers,

Tom
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: QIJ-760583
Department: Support IDD
Priority: Normal
Status: Open