Re: Data outage
- Subject: Re: Data outage
- Date: Mon, 19 Mar 2001 15:03:30 -0700
Charles O'Brien wrote:
>
> Anne,
>
> To recap...
> Over the past weeks, we at WSI have been doing some DNS changing.
> However, it only affected our "other" network (not the one on
> the LDM Data feed). However, for some reason there were a handful
> of clients that could not get our data. So, for those clients I
> had them add a few more allow/accept lines in their ldmd.conf
> (add sysu1.wsicorp.com, 198.115.158.1 as well as sysu1.uni.wsicorp.com).
> For the other clients, it worked. Not with Purdue.
>
> We did not drop their account. It is currently deactivated because
> the errors and nullprocs going on was killing our system.
>
> At some point we got LDMPING to work. Then because I thought
> ANVIL was having DNS issues, I had Eric put my addresses in his
> /etc/hosts file. This did not fix it. It actually made it worse.
>
> Traceroute/nslookup works fine:
>
> traceroute anvil.eas.purdue.edu
> traceroute to anvil.eas.purdue.edu (128.210.168.99), 30 hops max, 40 byte
> packets
> 1 rt-wsi-bbn (198.115.158.249) 3 ms 2 ms 2 ms
> 2 s3-0-0-22.cambridge1-cr20.bbnplanet.net (4.1.134.229) 6 ms 5 ms 5 ms
> 3 p2-1.cambridge1-nbr1.bbnplanet.net (4.0.1.153) 5 ms 5 ms 5 ms
> 4 p3-0.cambridge1-nbr2.bbnplanet.net (4.0.5.18) 9 ms 5 ms 5 ms
> 5 p4-0.bstnma1-br1.bbnplanet.net (4.0.5.157) 6 ms 6 ms 6 ms
> 6 p9-0.nycmny1-nbr2.bbnplanet.net (4.24.6.50) 12 ms 12 ms 12 ms
> 7 p1-0.nycmny1-br2.bbnplanet.net (4.24.10.86) 12 ms 12 ms 12 ms
> 8 p4-0.nycmny1-br1.bbnplanet.net (4.24.6.225) 12 ms 12 ms 12 ms
> 9 p1-0.nycmny1-ba1.bbnplanet.net (4.24.6.230) 12 ms 12 ms 12 ms
> 10 a1-0.xnycmny4-uunet.bbnplanet.net (4.0.6.142) 12 ms 14 ms 24 ms
> 11 0.at-6-0-0.XL2.NYC9.ALTER.NET (152.63.18.226) 13 ms 18 ms 14 ms
> 12 0.so-7-0-0.XR1.NYC9.ALTER.NET (152.63.23.138) 12 ms 12 ms 12 ms
> 13 0.so-3-0-0.TR1.NYC9.ALTER.NET (152.63.22.98) 12 ms 12 ms 13 ms
> 14 125.at-5-0-0.TR1.CHI2.ALTER.NET (152.63.1.45) 43 ms 43 ms 43 ms
> 15 197.at-5-0-0.XR1.CHI4.ALTER.NET (152.63.65.49) 44 ms 44 ms 45 ms
> 16 195.ATM11-0-0.GW1.IND1.ALTER.NET (146.188.208.169) 48 ms 47 ms 52 ms
> 17 157.130.101.106 (157.130.101.106) 54 ms 70 ms 75 ms
> 18 cisco2-242.tcom.purdue.edu (128.210.242.7) 108 ms 73 ms 78 ms
> 19 anvil.eas.purdue.edu (128.210.168.99) 90 ms 81 ms 88 ms
>
> nslookup anvil.eas.purdue.edu
> Server: 127.0.0.1
> Address: 127.0.0.1#53
>
> Non-authorative answer:
> Name: anvil.eas.purdue.edu
> Address: 128.210.168.99
>
> nslookup 128.210.168.99
> Server: 127.0.0.1
> Address: 127.0.0.1#53
>
> Non-authorative answer:
> 99.168.210.128.in-addr.arpa name = anvil.eas.purdue.edu.
>
> Authoritative answers can be found from:
> 210.128.in-addr.arpa nameserver = ns2.purdue.edu.
> 210.128.in-addr.arpa nameserver = pendragon.cs.purdue.edu.
> 210.128.in-addr.arpa nameserver = harbor.ecn.purdue.edu.
> 210.128.in-addr.arpa nameserver = ns.purdue.edu.
> ns.purdue.edu internet address = 128.210.11.5
> ns2.purdue.edu internet address = 128.210.11.57
> pendragon.cs.purdue.edu internet address = 128.10.2.5
> harbor.ecn.purdue.edu internet address = 128.46.154.76
>
> ldmping -h anvil.eas.purdue.edu. -l - -v
> Mar 19 21:03:06 State Elapsed Port Remote_Host rpc_stat
> Mar 19 21:03:07 ADDRESSED 0.200509 0 anvil.eas.purdue.edu. RPC:
> Unable to receive; errno = Connection reset by peer
> Mar 19 21:03:32 SVC_UNAVAIL 0.239751 0 anvil.eas.purdue.edu. RPC:
> Unable to receive; errno = Connection reset by peer
> Mar 19 21:03:57 SVC_UNAVAIL 0.291509 0 anvil.eas.purdue.edu. RPC:
> Unable to receive; errno = Connection reset by peer
>
> Eric, for grins, could you reboot ANVIL? That could be all it needs.
>
> Charlie
>
> ============================================================================
> Charles O'Brien WSI Corporation
> Software Engineer/Meteorologist 4 Federal Street
> EMAIL: address@hidden Billerica, MA 01821
> PHONE: (978) 670-5152 FAX: (978) 670-5100
> ============================================================================
Thanks for the info, Charlie. That was helpful. We did a bit of
testing here using rpcinfo. In particular, we ran 'rpcinfo -T tcp
anvil.eas.purdue.edu 300029 5' from both a Unidata host and a
non-Unidata host. (This uses TCP to make an RPC nullproc call to
program 300029, version 5, on anvil, i.e., the LDM.)
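For anyone who wants to repeat the nullproc test without rpcinfo, here is a rough Python sketch of the same idea. It hand-builds an ONC RPC NULL call (procedure 0, no credentials) with TCP record marking and checks whether the reply is accepted. The function names are made up for illustration, and the port is an assumption: the sketch connects straight to port 388 (the port registered for the LDM) instead of asking the portmapper the way rpcinfo does.

```python
import socket
import struct

LDM_PROGRAM = 300029  # program number quoted in the message
LDM_VERSION = 5       # version number quoted in the message
LDM_PORT = 388        # assumption: the LDM's registered port


def build_null_call(xid, prog=LDM_PROGRAM, vers=LDM_VERSION):
    """Return one record-marked RPC call to procedure 0 (NULL)."""
    # xid, CALL=0, rpcvers=2, prog, vers, proc=0,
    # then AUTH_NONE credential and verifier (flavor 0, length 0 each).
    body = struct.pack(">10I", xid, 0, 2, prog, vers, 0, 0, 0, 0, 0)
    # Record marker: high bit means "last fragment", low 31 bits are the length.
    return struct.pack(">I", 0x80000000 | len(body)) + body


def reply_is_success(record, xid):
    """True if `record` (marker already stripped) is an accepted, successful reply."""
    if len(record) < 24:
        return False
    rxid, mtype, reply_stat, _flavor, vlen = struct.unpack(">5I", record[:20])
    if (rxid, mtype, reply_stat) != (xid, 1, 0):  # REPLY=1, MSG_ACCEPTED=0
        return False
    off = 20 + ((vlen + 3) // 4) * 4  # skip the verifier body, padded to 4 bytes
    if len(record) < off + 4:
        return False
    (accept_stat,) = struct.unpack(">I", record[off:off + 4])
    return accept_stat == 0  # SUCCESS


def _recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full reply")
        buf += chunk
    return buf


def null_call_ok(host, port=LDM_PORT, timeout=10.0, xid=1):
    """Send the NULL call; a reset, timeout, or refusal counts as failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(build_null_call(xid))
            (marker,) = struct.unpack(">I", _recv_exact(sock, 4))
            # Sketch only: assumes the reply arrives as a single fragment.
            record = _recv_exact(sock, marker & 0x7FFFFFFF)
            return reply_is_success(record, xid)
    except OSError:
        return False
```

Calling null_call_ok("anvil.eas.purdue.edu") from an allowed host and from a non-allowed host should show the same difference we saw with rpcinfo, with the "Connection reset by peer" case coming back as False.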
From this we were able to confirm that we can execute an LDM
nullproc from Unidata (recall that Unidata hosts should have 'allow's on
all LDM sites), but not from the non-Unidata machine. From the
non-Unidata machine, the results were like the ones Charlie is getting.
This points to two possibilities: a wrong address on the part of WSI, or
a problem with the 'allow' line on anvil.
At WSI: Charlie, regarding the changes to your DNS, could you have the
wrong Purdue address in your /etc/hosts file?
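One quick way to check for a stale hosts entry is sketched below in Python. It scans hosts-file text for a name and reports any address that differs from the one DNS gives (128.210.168.99 for anvil, per the nslookup output quoted above). The helper names are hypothetical, and the parsing assumes the usual "address name [aliases...]" layout with "#" comments.

```python
def hosts_addresses(hosts_text, name):
    """Return every address the hosts-file text assigns to `name`."""
    wanted = name.lower().rstrip(".")
    found = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) < 2:
            continue
        names = [f.lower().rstrip(".") for f in fields[1:]]
        if wanted in names:
            found.append(fields[0])
    return found


def stale_entries(hosts_text, name, expected_address):
    """Return hosts-file addresses for `name` that disagree with `expected_address`."""
    return [a for a in hosts_addresses(hosts_text, name) if a != expected_address]
```

For example, stale_entries(open("/etc/hosts").read(), "anvil.eas.purdue.edu", "128.210.168.99") returns an empty list when the file either has no entry for anvil or agrees with DNS.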
At Purdue: Eric, are you sure your allow line is correct? Perhaps it
would be useful to broaden your allow to "*.wsicorp.com".
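To illustrate what a broader allow would and would not cover, here is a small Python sketch. It assumes the host field of an ldmd.conf allow line is matched as a regular expression and not as a shell-style glob, so "*.wsicorp.com" is written as the pattern below; please check the documentation for your LDM version before relying on that. The pattern and function name are illustrative only.

```python
import re

# Assumption: regex equivalent of "*.wsicorp.com", allowing an optional trailing dot.
ALLOW_PATTERN = r"\.wsicorp\.com\.?$"


def host_allowed(hostname, pattern=ALLOW_PATTERN):
    """True if `hostname` matches the allow pattern anywhere in the string."""
    return re.search(pattern, hostname) is not None


for host in ("sysu1.wsicorp.com", "sysu1.uni.wsicorp.com", "198.115.158.1"):
    print(host, host_allowed(host))
```

Both WSI host names match, but the bare address 198.115.158.1 does not, so a name-based pattern only helps if anvil can reverse-resolve the connecting address to a wsicorp.com name.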
Anne
--
***************************************************
Anne Wilson UCAR Unidata Program
address@hidden P.O. Box 3000
Boulder, CO 80307
----------------------------------------------------
Unidata WWW server http://www.unidata.ucar.edu/
****************************************************