Kevin,
Thanks for the info. It looks like latencies from our conduit
machines went up early this morning ~11:00 UTC, but this time for
all clients. And that's right, this one was definitely an NCEP
problem. NCEP had production issues this morning, impacting the
arrival of data onto our conduit machines, and affecting the
latency.
I'm still looking into all of this, though. And for everyone's
record, the latency for gfs.t12z.pgrb2.0p25.f096 over the weekend
looked pretty good, all well under a minute:
11/14:
Wisconsin: 0 seconds
Unidata/UCAR: 1 second
UIUC: 15 seconds
PSU: 6 seconds
11/15:
Wisconsin: 8 seconds
Unidata/UCAR: 4 seconds
UIUC: 22 seconds
PSU: 2 seconds
Thank you to Art for sending the traceroute. Does anyone recall
when this latency problem started, or got worse, and how often it
seems to happen?
Mike
On 11/16/2015 08:20 AM, Tyle, Kevin R wrote:
More latency noted overnight, courtesy of Kyle Griffin @
UWisc-Madison:
-----------------------------------------------------------------------------------
http://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?CONDUIT+idd.aos.wisc.edu
http://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?CONDUIT+flood.atmos.uiuc.edu
http://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?CONDUIT+idd.meteo.psu.edu
And noted nicely downstream. This smells like an NCEP
problem, as UWisc and UIUC have the same (~1800 second)
latency and others are >2000 (PSU, Albany).
Comparing Albany and UWisc, the GFS files that are
short (some are more than 30% missing, one as much as 70%
missing) are the same, but the file sizes are not the
same, implying downstream servers were getting some
slightly different sets of data from their upstream
options.
Just wanted to send this out in case either of you had
a couple minutes in your busy Monday mornings to check
this out...might be getting to be an annoying problem to
try and chase...
Kyle
----------------------------------------
Kyle S. Griffin
Department of Atmospheric and Oceanic Sciences
University of Wisconsin - Madison
Room 1407
1225 W Dayton St, Madison, WI 53706
_____________________________________________
Kevin Tyle, Systems Administrator
Dept. of Atmospheric & Environmental Sciences
University at Albany
Earth Science 235, 1400 Washington Avenue
Albany, NY 12222
Email: address@hidden
Phone: 518-442-4578
_____________________________________________
From: Michael Shedlock <address@hidden>
Sent: Friday, November 13, 2015 2:53 PM
To: Mike Dross; Arthur A Person
Cc: Bentley, Alicia M; _NCEP.List.pmb-dataflow; Michael Schmidt; address@hidden; Daes Support
Subject: Re: [conduit] [Ncep.list.pmb-dataflow] How's your GFS?
All,
NCEP is indeed on Internet2, which I presume would apply
here.
A couple of noteworthy things... I see some latency, but
not for everyone, and it doesn't seem to matter which
conduit machine a client is connected to. For example, with
today's and yesterday's gfs.t12z.pgrb2.0p25.f096 (hour 96)
file, here are the latencies per client that I see:
11/12:
Wisconsin: A few seconds
Unidata/UCAR: A few seconds
UIUC: 13 minutes
PSU: 27 minutes
11/13:
Wisconsin: A few seconds
Unidata/UCAR: A few seconds
UIUC: 2.33 minutes
PSU: 2.75 minutes
Another correlation is that UIUC and PSU (the ones with
latency) are only using one thread to connect to our
conduit, whereas Wisc. and Unidata use multiple threads.
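For what it's worth, I believe the usual way downstream sites
get those multiple threads is by splitting the CONDUIT REQUEST
in ldmd.conf on the trailing sequence number of the product IDs,
something like the sketch below (the hostname is just a
placeholder for whichever conduit machine you normally feed from):

    # Five-way split of the CONDUIT feed; each REQUEST becomes its own connection.
    # Hostname below is illustrative -- use your usual upstream conduit machine.
    REQUEST CONDUIT "[09]$" conduit.ncep.noaa.gov
    REQUEST CONDUIT "[18]$" conduit.ncep.noaa.gov
    REQUEST CONDUIT "[27]$" conduit.ncep.noaa.gov
    REQUEST CONDUIT "[36]$" conduit.ncep.noaa.gov
    REQUEST CONDUIT "[45]$" conduit.ncep.noaa.gov

Since each REQUEST line is its own connection, the pieces move in
parallel and a single slow path hurts less.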
At the moment this sort of has the appearance of a
bottleneck outside of NCEP. It might also be useful to see
traceroutes from UIUC and PSU to NCEP's CONDUIT. I know I
saw some traceroutes below. Can you try that and share with
us?
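In case it helps, the kind of traceroute I mean would be run from
the UIUC and PSU LDM hosts toward whichever conduit machine you
request from, for example (hostname is illustrative, and the TCP
option depends on your traceroute variant):

    traceroute conduit.ncep.noaa.gov
    # If ICMP/UDP probes are filtered along the path, TCP probes
    # to the LDM port (388) may get further:
    traceroute -T -p 388 conduit.ncep.noaa.gov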
Mike Shedlock
NCEP Central Operations
Dataflow Team
301.683.3834
On 11/13/2015 11:42 AM, Mike Dross wrote:
My $0.02 from having worked with LDM since the mid
'90s.
I assume NCEP is now on
Internet2? If so, bandwidth shouldn't be an issue.
Regardless, I would check the traceroutes to ensure a
good path, high bandwidth, and low latency. Basic network
topology check. I am sure you have done this.
An iperf test is a simple way
to test the maximum throughput to see if bandwidth is an
issue. If that's not it, high latency or the way LDM is
set up on the upstream side is likely the culprit.
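If anyone wants to run that iperf check, a minimal sketch would
be something like the following (hostnames are placeholders; run
the server end as close to the data source as you can):

    # on one end:
    iperf3 -s
    # on the other end: single stream, then 4 parallel streams, 30 seconds each
    iperf3 -c iperf-host.example.edu -t 30
    iperf3 -c iperf-host.example.edu -t 30 -P 4

Comparing the single-stream and multi-stream numbers also gives a
rough feel for whether one TCP connection can fill the path on its
own.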
Mike
Sent from my iPad
Carissa,
Yes, still issues. There was a period several
weeks ago when throughput was clean, but recently
we've seen delays to varying degrees. Delays during the
0.25-degree GFS transmission have ranged from 500 to 3500
seconds over the past couple of days.
Also, comparison
with charts from other schools seems to show
better reception when feeding from "conduit1"
rather than "conduit2".
Does this mean
anything to you, or is it purely coincidental?
Thanks for any insights you can provide.
Art
Art,
I am going to add our team to this thread.
Are you still seeing issues? If so, we will
take a look and see if we can tell if
anything on our side is happening around FH
96.
--
Arthur A. Person
Research Assistant, System Administrator
Penn State Department of Meteorology
email: address@hidden,
phone: 814-863-1563
_______________________________________________
Ncep.list.pmb-dataflow mailing list
address@hidden
https://www.lstsrv.ncep.noaa.gov/mailman/listinfo/ncep.list.pmb-dataflow