[IDD #ZAJ-969398]: IDD Data is too old
- Subject: [IDD #ZAJ-969398]: IDD Data is too old
- Date: Tue, 06 Nov 2018 14:42:01 -0700
Hi Elliott,
re:
> The hostname in the registry is "irads-ingest0.net.ou.edu".
OK, that is what we thought.
re:
> In the registry, we have 'time-offset' lowered to 600 seconds. Even with that
> low of a period to backfill, it never catches up. In fact, it gets further
> behind.
OK.
re:
> Yesterday evening, I tested idd.meteo.psu.edu. The same issue occurred. We
> were able to get a bit better performance by splitting the request into two,
> but it still doesn't perform well.
Can you let us know when you split the feed REQUEST into two?
I'm trying to understand the latency trace that we see in:
http://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?NEXRAD3+irads-ingest0.net.ou.edu
This time series shows that the NEXRAD3 latency grew to over 3600 seconds
sometime between 0 and 1 UTC yesterday, November 5. It dropped to near zero
sometime around 2 UTC today, stayed low until around 14 UTC today, and then
started getting worse. For some reason, there doesn't seem to be any latency
information for the period from approximately 15 to 19 UTC today. After that,
latency values returned, but they are bouncing between high and low values.
This last behavior makes me think that your LDM configuration file has the
same feed REQUEST to both PSU and somewhere else, and that the other upstream
has content that is totally different from what is available from PSU. What I
am looking at is the light pink (or some such color) near-zero latency line
that can be seen in the period from 15 to 19 UTC.
Can you send us your LDM configuration file so we can take a look? If you'd
rather not send us the entire file (as an attachment), please send the output
of:
grep -i ^request ~ldm/etc/ldmd.conf
A potential problem I am trying to figure out is one caused by replicated feed
REQUESTs to different upstreams that have different contents for the
datastream(s) being REQUESTed. When a situation like this is present, the LDM
will preferentially get products from one server over the other, which, in
turn, means that you would not get the desired products from one of the
upstreams. The suspicious latencies are the ones from kfs-mini-01.mesonet.
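The replicated-REQUEST situation described above can be checked for mechanically. What follows is only a minimal Python sketch, not an LDM utility. It assumes each entry has the whitespace-separated layout `REQUEST feedtype pattern host` with no spaces inside the pattern, and the function name and the example hostnames are hypothetical.

```python
from collections import defaultdict


def find_replicated_requests(conf_text):
    """Map (feedtype, pattern) to its upstream hosts when there is more than one.

    Assumes whitespace-separated REQUEST entries; the pattern is kept exactly
    as written (including any quotes) so that identical entries compare equal.
    """
    upstreams = defaultdict(list)
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4 or fields[0].upper() != "REQUEST":
            continue  # not a complete REQUEST entry
        feedtype, pattern, host = fields[1].upper(), fields[2], fields[3]
        upstreams[(feedtype, pattern)].append(host)
    return {key: hosts for key, hosts in upstreams.items() if len(hosts) > 1}


# Hypothetical configuration text, for illustration only.
SAMPLE = """\
# example entries
REQUEST NEXRAD3 ".*" upstream-a.example.edu
REQUEST NEXRAD3 ".*" upstream-b.example.edu
REQUEST CONDUIT ".*" upstream-a.example.edu
"""

if __name__ == "__main__":
    for (feedtype, pattern), hosts in find_replicated_requests(SAMPLE).items():
        print(feedtype, pattern, "->", ", ".join(hosts))
```

A hit from this check is not a problem by itself, since redundant REQUESTs to equivalent upstreams are normal. It only identifies the entries worth comparing when the upstreams may carry different content.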
re:
> While we've been focused on one host, irads-ingest0, at 156.110.246.56, we
> are seeing the same issue from echo-ingestA.services.ou.edu (129.15.2.32).
> It has been tested against idd.unidata.ucar.edu and idd.aos.wisc.edu and
> behaves the same as irads-ingest0.
The latency plot for NEXRAD3 on echo-ingesta.services.ou.edu is showing the
exact same kind of thing as irads-ingest0, and the suspicious latency in the
plot is also the one from the connection to kfs-mini-01.mesonet.
Can you send us the LDM configuration file from echo-ingesta?
re:
> To add to the issue, both hosts have started having issues receiving the
> NL2 TDS feed as well.
>
> http://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?NEXRAD3+echo-ingesta.services.ou.edu
> http://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?NEXRAD2+echo-ingesta.services.ou.edu
The latency plot for NEXRAD2 for echo-ingesta is even stranger since some of the
sites are coming in with very low latencies while a slug of others are showing
very high latencies.
re:
> The two hosts are in separate networks on separate campuses, managed by
> different parts of the IT department. With that, we have put in a ticket
> with our ISP, OneNet. They have requested traceroutes from our upstream
> sources to our hosts, if possible, to trace the data path.
Another clue: two machines in the NWC (ldmingest01.nwc.ou.edu and
ldmingest02.nwc.ou.edu) are also showing higher than previous latencies for
CONDUIT starting sometime on Sunday morning, November 4:
http://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?CONDUIT+ldmingest01.nwc.ou.edu
Is the NWC in the same building as you?
Do you know who to contact as the LDM administrator of the NWC machines? We
really want to find the contact since we are seeing LDM connections from their
machines breaking and reestablishing on more than one of the backend
real-server machines of our idd.unidata.ucar.edu cluster.
re:
> Would it be possible to get traceroutes from a host behind
> idd.unidata.ucar.edu to 129.15.2.32 and 156.110.246.56?
Here are traceroutes from the real-server backend machine that has connections
from each of these IPs:
From uni19.unidata.ucar.edu, which is servicing a split NEXRAD3 and an NGRID
feed to echo-ingesta.services.ou.edu/129.15.2.32:
traceroute to 129.15.2.32 (129.15.2.32), 30 hops max, 60 byte packets
 1 flr-n140.unidata.ucar.edu (128.117.140.251) 0.724 ms 0.715 ms 0.854 ms
 2 ml2core-fl2core.unet.ucar.edu (128.117.243.194) 0.958 ms 1.449 ms 1.442 ms
 3 corel3-ml2core-i2.unet.ucar.edu (128.117.243.141) 1.696 ms 1.692 ms 1.680 ms
 4 v3454.rtr-chic.frgp.net (192.43.217.222) 23.159 ms 23.166 ms 23.160 ms
 5 et-2-1-0.4079.rtsw.chic.net.internet2.edu (162.252.70.116) 23.742 ms 23.911 ms 23.729 ms
 6 ae-3.4079.rtsw.kans.net.internet2.edu (162.252.70.141) 34.575 ms 34.530 ms 34.511 ms
 7 et-7-0-0.4079.rtsw.tuls.net.internet2.edu (162.252.70.35) 38.497 ms 38.478 ms 38.876 ms
 8 198.71.46.45 (198.71.46.45) 38.520 ms 38.519 ms 38.505 ms
 9 164.58.244.44 (164.58.244.44) 38.564 ms 38.827 ms 38.793 ms
10 164.58.244.15 (164.58.244.15) 40.757 ms 40.768 ms 40.755 ms
11 164.58.245.55 (164.58.245.55) 40.836 ms 164.58.245.53 (164.58.245.53) 40.924 ms 40.707 ms
12 164.58.245.58 (164.58.245.58) 40.909 ms 40.907 ms 164.58.245.56 (164.58.245.56) 40.851 ms
13 164.58.244.33 (164.58.244.33) 41.414 ms 41.347 ms 41.386 ms
14 164.58.10.98 (164.58.10.98) 41.467 ms 41.463 ms 41.512 ms
15 * * *
16 * * *
17 * * *
18 * * *
19 * * *
20 * * *
21 * * *
22 * * *
23 * * *
24 * * *
25 * * *
26 * * *
27 * * *
28 * * *
29 * * *
30 * * *
From uni19.unidata.ucar.edu to delta-ingest0.irads.ou.edu/156.110.246.56:
traceroute to 156.110.246.56 (156.110.246.56), 30 hops max, 60 byte packets
 1 flr-n140.unidata.ucar.edu (128.117.140.251) 0.743 ms 1.081 ms 1.155 ms
 2 ml1core-flacore.unet.ucar.edu (128.117.243.78) 1.249 ms fl2core-flacore.unet.ucar.edu (128.117.243.106) 0.839 ms ml1core-flacore.unet.ucar.edu (128.117.243.78) 1.230 ms
 3 ml2core-ml1core.unet.ucar.edu (128.117.243.99) 1.094 ms ml2core-fl2core.unet.ucar.edu (128.117.243.194) 1.374 ms ml2core-ml1core.unet.ucar.edu (128.117.243.99) 1.353 ms
 4 corel3-ml2core-i2.unet.ucar.edu (128.117.243.141) 1.504 ms 1.499 ms 1.487 ms
 5 v3454.rtr-chic.frgp.net (192.43.217.222) 23.160 ms 23.134 ms 23.136 ms
 6 et-2-1-0.4079.rtsw.chic.net.internet2.edu (162.252.70.116) 23.471 ms 23.499 ms 23.592 ms
 7 ae-3.4079.rtsw.kans.net.internet2.edu (162.252.70.141) 34.663 ms 34.482 ms 34.481 ms
 8 et-7-0-0.4079.rtsw.tuls.net.internet2.edu (162.252.70.35) 38.407 ms 38.386 ms 38.270 ms
 9 198.71.46.45 (198.71.46.45) 38.521 ms 38.490 ms 38.478 ms
10 164.58.244.44 (164.58.244.44) 38.839 ms 164.58.244.46 (164.58.244.46) 38.513 ms 164.58.244.44 (164.58.244.44) 38.588 ms
11 164.58.244.239 (164.58.244.239) 38.676 ms 38.644 ms 38.643 ms
12 164.58.16.26 (164.58.16.26) 38.481 ms 38.444 ms 38.428 ms
13 156.110.254.97 (156.110.254.97) 41.410 ms 41.195 ms 41.521 ms
14 156.110.254.62 (156.110.254.62) 41.003 ms 40.885 ms 156.110.254.50 (156.110.254.50) 45.609 ms
15 * * *
16 * * *
17 * * *
18 * * *
19 * * *
20 * * *
21 * * *
22 * * *
23 * * *
24 * * *
25 * * *
26 * * *
27 * * *
28 * * *
29 * * *
30 * * *
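Both traces above stop answering after hop 14, at a round-trip time of roughly 41 ms. Unanswered trailing hops often just mean that probes are being filtered near the destination, so they are not proof of packet loss. For comparing many traces, a small Python sketch like the following can pull out the last hop that responded. It assumes one hop per line in the usual Linux traceroute layout, and the function name is hypothetical.

```python
import re

# A hop line starts with the hop number followed by whitespace.
HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")
# Round-trip times are printed as "<number> ms".
RTT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*ms")


def last_responding_hop(traceroute_text):
    """Return (hop_number, best_rtt_ms) for the last hop that answered, or None."""
    last = None
    for line in traceroute_text.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # header line or anything that is not a hop
        rtts = [float(value) for value in RTT_RE.findall(match.group(2))]
        if rtts:  # hops shown as "* * *" have no RTTs and are skipped
            last = (int(match.group(1)), min(rtts))
    return last


# Excerpt of the first trace above.
SAMPLE = """\
traceroute to 129.15.2.32 (129.15.2.32), 30 hops max, 60 byte packets
13 164.58.244.33 (164.58.244.33) 41.414 ms 41.347 ms 41.386 ms
14 164.58.10.98 (164.58.10.98) 41.467 ms 41.463 ms 41.512 ms
15 * * *
16 * * *
"""

if __name__ == "__main__":
    print(last_responding_hop(SAMPLE))
```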
NOTE:
- While on uni19.unidata.ucar.edu, I saw that it was feeding lion.caps.ou.edu
some CONDUIT data, and all of UNIWISC, NOTHER and NGRID.
- the latency plots for NGRID and NOTHER for lion look much the same as
for your machine and for the NWC machines:
NOTHER
http://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?NOTHER+lion.caps.ou.edu
NGRID
http://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?NGRID+lion.caps.ou.edu
- also, it is a bit strange that there are two different names associated with
the 156.110.246.56 IP address:
~: nslookup 156.110.246.56
Server: 208.67.222.222
Address: 208.67.222.222#53
Non-authoritative answer:
56.246.110.156.in-addr.arpa name = delta-ingest0.irads.ou.edu.
56.246.110.156.in-addr.arpa name = delta-ingest0.ou.edu.
This is a bit strange, but it should have no bearing on the latency issues
being investigated.
- lastly, I don't see any connections from
  delta-ingest0.irads.ou.edu/156.110.246.56 on any of the real server backend
  machines that comprise the idd.unidata.ucar.edu cluster.
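For the reverse-DNS observation in the notes above, a short Python sketch can list every PTR name in saved nslookup output, which makes an address with more than one name easy to flag. It assumes the output format shown above, and the function name is hypothetical.

```python
import re

# Matches lines such as:
#   56.246.110.156.in-addr.arpa name = delta-ingest0.irads.ou.edu.
PTR_RE = re.compile(r"^\S+\.in-addr\.arpa\s+name\s*=\s*(\S+?)\.?\s*$")


def ptr_names(nslookup_text):
    """Return the PTR names found in nslookup output, without the trailing dot."""
    names = []
    for line in nslookup_text.splitlines():
        match = PTR_RE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


# The nslookup output quoted in the notes above.
SAMPLE = """\
Server: 208.67.222.222
Address: 208.67.222.222#53
Non-authoritative answer:
56.246.110.156.in-addr.arpa name = delta-ingest0.irads.ou.edu.
56.246.110.156.in-addr.arpa name = delta-ingest0.ou.edu.
"""

if __name__ == "__main__":
    names = ptr_names(SAMPLE)
    if len(names) > 1:
        print("multiple PTR names:", ", ".join(names))
```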
Cheers,
Tom
--
****************************************************************************
Unidata User Support UCAR Unidata Program
(303) 497-8642 P.O. Box 3000
address@hidden Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage http://www.unidata.ucar.edu
****************************************************************************
Ticket Details
===================
Ticket ID: ZAJ-969398
Department: Support IDD
Priority: Normal
Status: Closed
===================