- Subject: [IDD #PPJ-526440]: Request for info
- Date: Mon, 08 Jul 2013 13:00:19 -0600
Hi,
This is a follow-up to a phone conversation that Alain and I had on
the morning of Monday, July 8:
re:
> As per our telephone conversation, I would like to request the links for
> real time statistics and any pertinent information to help us
> troubleshoot this problem.
When encountering problems receiving data via the Unidata LDM/IDD, the
best course of action is:
1) check to see if real-time statistics have been reported by your
machine(s) and are available for display on the Unidata website:
Unidata HomePage
http://www.unidata.ucar.edu
Projects -> Internet Data Distribution
http://www.unidata.ucar.edu/projects/index.html#idd
IDD Current Operational Status
http://www.unidata.ucar.edu/software/idd/rtstats/
Statistics by Host
http://www.unidata.ucar.edu/cgi-bin/rtstats/siteindex
2) in the left-hand column of the siteindex page you will find the machines
   reporting real-time statistics classified by their domain name.
   The machine(s) reporting real-time statistics for your domain
   will be listed in the right-hand column of the siteindex page.
   Each machine name entry is a link to a set of information for
   that machine.
   Example:
   Domain          Hosts
   ca.gc.ec.cmc    ldm-data.cmc.ec.gc.ca   [6.10.1]
                   ldm-wxo.cmc.ec.gc.ca    [6.8.1]
                   noaaport3.cmc.ec.gc.ca  [6.6.4]
                   noaaport4.cmc.ec.gc.ca  [6.6.4]
                   tigge-ldm.cmc.ec.gc.ca  [6.6.4]
3) the page that will be shown when one clicks on the name of the
machine of interest will contain a set of links for each datastream
that is being REQUESTed by that machine
For instance:
https://www.unidata.ucar.edu/cgi-bin/rtstats/siteindex?ldm-data.cmc.ec.gc.ca
Real-time Statistics for ldm-data.cmc.ec.gc.ca [ LDM 6.10.1 ]
   FEED NAME
   HDS          latency  log(latency)  histogram  volume  products  topology
   IDS|DDPLUS   latency  log(latency)  histogram  volume  products  topology
   NEXRAD2      latency  log(latency)  histogram  volume  products  topology
   Cumulative volume summary    Cumulative volume summary Graph
4) the things to look at when assessing whether the problem being investigated
   is a local problem, or one upstream, are:
   latency  - the amount of time between the creation of a product (i.e., when
              a product is first added to the original LDM queue from which it
              is distributed) and its receipt (i.e., the time that the product
              was received at the local machine)
   volume   - a time series of the data volume received for the particular feed
   products - a time series of the number of products received for the
              particular feed
   topology - the route that a product takes from its creation to its receipt
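   To make the latency definition concrete with made-up times: a product that
   is inserted into the originating LDM queue at 18:00:00 UTC and received on
   the local machine at 18:00:45 UTC has a latency of 45 seconds; the rtstats
   latency plots show this quantity as a time series.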
5) things that can be gleaned from the items listed in 4):
   latency - the time history of the latency shows:
      - whether products are being received in a timely manner.
        If the latencies are small, the products are being received with
        little delay.
      - whether there is anything wrong with the system clock on the
        receiving machine.
        A trend in the lowest latencies typically shows that the clock on
        the receiving machine is drifting. A latency plot where the lowest
        latency is consistently non-zero shows that the clock on the
        receiving machine is either slow or fast.
   NB:
   - problems with the local clock should be fixed as soon as possible. If
     they are not, then products may be missed, or no data may be received
     for some period of time, whenever the LDM is restarted for any reason.
     (A quick way to check the local clock is sketched after this latency
     discussion.)
   - latencies that approach 3600 seconds (one hour) are a warning that
     there is some problem receiving the datastream being REQUESTed. When
     the latencies exceed 3600 seconds for an LDM installation that is
     configured in the "standard" manner, data _will_ be lost/not
     received/thrown away upon receipt. The reason for this is that the
     LDM was designed for real-time delivery of data, and one of its
     working assumptions is that data more than an hour old is too old to
     be considered real-time.
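   As an aside, here is a minimal sketch of how one might check a receiving
   machine's clock from the shell, assuming an NTP client is installed (the
   NTP server name is only an example):

      # query an NTP server without setting the clock (server name is an example)
      ntpdate -q pool.ntp.org
      # if ntpd is running, list its peers and their measured offsets
      ntpq -p
      # print the current system time in UTC as a quick sanity check
      date -u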
   volume - this time series shows how much data was received per hour for
            the feed in question
   NB:
   - LDM REQUESTs for feeds that have high data volumes (e.g., CONDUIT,
     NEXRAD2, FNMOC, HRRR) may need to be split into mutually-exclusive
     subsets. The feed that has typically been split into five subsets is
     CONDUIT; a sketch of such a split is shown just below. With the move
     to dual-polarization full-volume-scan radar data, the NEXRAD2 feed has
     also become a candidate for feed REQUEST splitting.
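   For illustration only, a commonly used five-way CONDUIT split looks
   roughly like the following ldmd.conf REQUEST lines; the upstream hostname
   is a placeholder, and the patterns assume that CONDUIT product IDs end in
   a sequence number, so verify them against your own feed before adopting
   them:

      # five mutually-exclusive REQUESTs keyed on the final digit of the
      # CONDUIT product sequence number (hostname is a placeholder)
      REQUEST CONDUIT "[09]$" idd.upstream.example.edu
      REQUEST CONDUIT "[18]$" idd.upstream.example.edu
      REQUEST CONDUIT "[27]$" idd.upstream.example.edu
      REQUEST CONDUIT "[36]$" idd.upstream.example.edu
      REQUEST CONDUIT "[45]$" idd.upstream.example.edu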
   latency of low-volume feed(s) is acceptably low while the latency for
   high-volume feed(s) is unacceptably high:
   - very low latencies for a feed like IDS|DDPLUS coupled with very high
     latencies for a high-volume feed like CONDUIT or NEXRAD2 are a classic
     indication of artificial bandwidth limiting in one or more legs of the
     network path being taken during data delivery. We refer generically to
     this situation as "packet shaping". It is our experience that packet
     shaping is typically done "close" to the downstream node (i.e., the
     machine receiving data). The network connection at/near UCAR/NCAR is
     never intentionally bandwidth limited, so if there is a bottleneck
     somewhere, it is most likely not here.
   - when an instance of what looks to be packet shaping is discovered, it
     is the responsibility of the downstream site to initiate investigations
     into where the bottleneck may be. We (Unidata/UCAR) are willing to help
     with the investigations and with resolving problems, but we typically
     have no influence when the problem resides in the downstream
     institution's network.
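   A minimal sketch of a first pass at localizing such a bottleneck, assuming
   standard network tools are available (hostnames are examples, and iperf
   needs a cooperating server at the far end):

      # trace the network path toward the upstream IDD node (hostname is an example)
      traceroute idd.unidata.ucar.edu
      # measure achievable throughput to a cooperating iperf server, if one exists
      iperf -c iperf.example.edu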
6) things to try when latencies for one or more feeds are unacceptably high:
   - determine whether there are any network problems at one's institution
   - make sure that the LDM installations on one's machine(s) are functioning
     correctly and are reasonably up to date
   - check the real-time statistics being reported to us (links above) to
     make sure that you really are not receiving the data.
     This may sound funny, but it is our experience that a number of sites
     assume that they are not receiving data when they actually are, and
     their problem is in processing the data received. (A quick local check
     using notifyme is sketched after this list.)
   - if a packet-shaping signature is seen, try splitting the high-volume
     feed(s) that are experiencing unacceptably high latencies
- if still having problems after undertaking local investigations, send
an email to:
Unidata IDD Support <address@hidden>
     Please do _NOT_ phone individuals at Unidata for help or send email
     to Unidata staff members' private email addresses. The reason for
     this is that a Unidata staff member may be out of the office and
     unable to respond to personal email or voicemail. Email sent to the
     address above is reviewed by several Unidata staff throughout the
     weekday, and routinely on weekends and even holidays, so it is most
     likely that help will be provided faster.
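   As a concrete illustration of the "are you actually receiving data?"
   check above, the following can be run as the LDM user on the receiving
   machine (the feed name, time offset, and upstream hostname are examples):

      # list products for a feed that arrived in the local queue during the
      # last hour (quote the feedtype so the shell ignores the '|')
      notifyme -vl- -f "IDS|DDPLUS" -o 3600 -h localhost
      # verify that the upstream LDM is reachable and answering
      ldmping -i 5 idd.unidata.ucar.edu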
As we talked about during our phone conversation this morning, it is
my opinion that:
- the clock on one of your machines, ldm-data.cmc.ec.gc.ca, is not being
properly maintained
I can say this easily after looking at the latency plot for the
IDS|DDPLUS feed - the linearly increasing trend in latency indicates
that ldm-data's clock is drifting.
- the disparity between the latencies for IDS|DDPLUS and NEXRAD2 on ldm-data
  indicates that there is some limit on how much data (volume) a single
  network connection can carry. This situation might be mitigated by either:
  - finding the source of the bottleneck and getting it fixed
  - or splitting the high-volume NEXRAD2 feed into several (e.g., 5)
    mutually exclusive subsets (a purely hypothetical sketch follows below)
  In order to make recommendations on how to split the NEXRAD2 feed,
  we would need to see the LDM configuration file (~ldm/etc/ldmd.conf)
  in use on ldm-data.
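  Purely as a hypothetical sketch (the real patterns would depend on
  ldm-data's current ldmd.conf and on the product IDs actually present in
  the NEXRAD2 feed; the upstream hostname is a placeholder and the
  call-sign ranges are assumptions):

     # hypothetical five-way NEXRAD2 split keyed on ranges of radar call
     # signs, assuming product IDs contain the station ID (e.g., .../KFTG/...)
     REQUEST NEXRAD2 "/[KPT][A-C]" upstream.example.edu
     REQUEST NEXRAD2 "/[KPT][D-G]" upstream.example.edu
     REQUEST NEXRAD2 "/[KPT][H-L]" upstream.example.edu
     REQUEST NEXRAD2 "/[KPT][M-R]" upstream.example.edu
     REQUEST NEXRAD2 "/[KPT][S-Z]" upstream.example.edu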
As a final comment I would like to add that we hold training workshops
each year for the software packages we support; the next training
workshop for the LDM will be held on August 1-2 at our facility here
in Boulder, CO. There is still at least one slot open for the LDM training
session, but it may fill in the next day or so. Information on our
training workshops can be found in:
Unidata HomePage
http://www.unidata.ucar.edu
Events -> 2013 Training Workshop
Please let me know if anything in the above is unclear or needs further
explanation.
Cheers,
Tom
--
****************************************************************************
Unidata User Support UCAR Unidata Program
(303) 497-8642 P.O. Box 3000
address@hidden Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage http://www.unidata.ucar.edu
****************************************************************************
Ticket Details
===================
Ticket ID: PPJ-526440
Department: Support IDD
Priority: Normal
Status: Closed