[IDD #XEC-580619]: IDD Peering request from Unidata
- Subject: [IDD #XEC-580619]: IDD Peering request from Unidata
- Date: Sat, 16 Aug 2014 10:56:12 -0600
Hi Mike,
I see you work on Saturday mornings also :-)
re: splitting feeds is an accepted way to lower latencies
> That's great. I wasn't sure if splitting up the feed request, while improving
> throughput/latency, would be viewed unfavorably by the upstream hosts, as it
> might require more CPU/memory on their LDM system. I know that on our LDM
> receive system each feed request creates a new ldmd process; I'm not sure if
> it spawns a new one on the upstream LDM system as well. I just want to be
> conscious of how many resources I might be consuming from the upstream
> system.
You are exactly correct in recognizing that splitting feed REQUESTs has an
impact on the upstream machine(s). We try to work with sites to tune their
~ldm/etc/ldmd.conf configurations so that their product-receipt latencies and
the number of REQUESTs to individual upstreams are both minimized.
Unfortunately, we have no formula for doing this; it is pretty much trial and
error, but one can use the listing of volumes by feed type as a guide. For
instance, the following is a snapshot of the volumes and numbers of products
in all of the IDD feeds being ingested on our machine shemp.unidata.ucar.edu:
Data Volume Summary for shemp.unidata.ucar.edu

Maximum hourly volume      30285.140 M bytes/hour
Average hourly volume      18948.620 M bytes/hour
Average products per hour     343531 prods/hour

Feed                   Average                  Maximum     Products
                 (M byte/hour)            (M byte/hour)     number/hour
NEXRAD2             6405.202    [ 33.803%]     7800.190     74352.167
CONDUIT             3440.520    [ 18.157%]     5615.335     74644.333
NEXRAD3             2028.625    [ 10.706%]     2377.226     98159.095
EXP                 1641.817    [  8.665%]     4882.705      6342.929
NGRID               1636.496    [  8.636%]     3399.559     22336.786
FSL2                1442.247    [  7.611%]     1715.243      1676.667
FNMOC               1136.608    [  5.998%]     6497.106      3158.024
HDS                  354.473    [  1.871%]      698.793     18004.429
SPARE                282.843    [  1.493%]     1731.961        10.333
NIMAGE               161.472    [  0.852%]      265.177       196.952
FNEXRAD              125.509    [  0.662%]      152.478       106.214
GEM                   90.300    [  0.477%]      548.181       830.024
UNIWISC               68.839    [  0.363%]      117.278        45.857
IDS|DDPLUS            53.590    [  0.283%]       64.898     42248.143
NOTHER                52.888    [  0.279%]      374.643      1047.095
PCWS                  19.743    [  0.104%]       26.231        23.286
LIGHTNING              6.797    [  0.036%]       16.925       346.762
DIFAX                  0.537    [  0.003%]        1.956         0.667
GPS                    0.113    [  0.001%]        1.202         1.048
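
For what it is worth, a back-of-the-envelope way to use a listing like the one
above is to divide a feed's maximum hourly volume by what a single connection
to your upstream reliably sustains. The Python sketch below does just that;
the per-connection figure in it is an assumption for illustration, not a
measurement, so substitute whatever your own path actually delivers:

  import math

  # Assumed single-connection ceiling in Mbyte/hour; replace with a measured value.
  PER_CONNECTION_MBYTE_PER_HOUR = 2000.0

  # Maximum hourly volumes (Mbyte/hour) taken from the table above.
  max_volume = {"NEXRAD2": 7800.190, "CONDUIT": 5615.335, "NGRID": 3399.559}

  for feed, volume in max_volume.items():
      # One REQUEST per "ceiling's worth" of peak volume, with a minimum of one.
      splits = max(1, math.ceil(volume / PER_CONNECTION_MBYTE_PER_HOUR))
      print("{}: roughly {} REQUEST line(s)".format(feed, splits))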
re:
> But here is my new feed request now for NEXRAD2
>
> REQUEST NEXRAD2 "^L2-BZIP2/K[A-D].*" idd.tamu.edu
> REQUEST NEXRAD2 "^L2-BZIP2/K[E-K].*" idd.tamu.edu
> REQUEST NEXRAD2 "^L2-BZIP2/K[L-P].*" idd.tamu.edu
> REQUEST NEXRAD2 "^L2-BZIP2/K[Q-U].*" idd.tamu.edu
> REQUEST NEXRAD2 "^L2-BZIP2/K[V-Z].*" idd.tamu.edu
Very good.
One comment on how to specify extended regular expressions: the use of
leading/trailing '.*' is neither needed nor wanted. In fact, one of the
additions to the LDM was to simplify these kinds of regular expressions,
since leaving the extra '.*' in place results in significantly longer
pattern-matching times. This is true both for ~ldm/etc/ldmd.conf REQUEST
entries and for 'pqact' pattern-action file actions.
I would rewrite your REQUESTs as:
REQUEST NEXRAD2 "^L2-BZIP2/K[A-D]" idd.tamu.edu
REQUEST NEXRAD2 "^L2-BZIP2/K[E-K]" idd.tamu.edu
REQUEST NEXRAD2 "^L2-BZIP2/K[L-P]" idd.tamu.edu
REQUEST NEXRAD2 "^L2-BZIP2/K[Q-U]" idd.tamu.edu
REQUEST NEXRAD2 "^L2-BZIP2/K[V-Z]" idd.tamu.edu
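
If you ever want to sanity-check that a set of split patterns still covers
everything the original single REQUEST did, with no gaps and no
double-REQUESTing, a few lines of Python run against some sample product IDs
will tell you. The IDs below are made up for illustration; real ones can be
gleaned from 'notifyme' output or your LDM logs:

  import re

  # The five split patterns from the REQUEST lines above.
  patterns = [re.compile(p) for p in (
      r"^L2-BZIP2/K[A-D]", r"^L2-BZIP2/K[E-K]", r"^L2-BZIP2/K[L-P]",
      r"^L2-BZIP2/K[Q-U]", r"^L2-BZIP2/K[V-Z]",
  )]

  # Hypothetical product IDs; substitute IDs actually seen from your upstream.
  sample_ids = [
      "L2-BZIP2/KABX/20140816160000",
      "L2-BZIP2/KMHX/20140816160000",
      "L2-BZIP2/KTLX/20140816160000",
  ]

  for pid in sample_ids:
      hits = sum(bool(p.match(pid)) for p in patterns)
      # Each ID should match exactly one pattern: no gaps, no overlaps.
      status = "OK" if hits == 1 else "matched {} patterns".format(hits)
      print("{} -> {}".format(pid, status))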
Also, if you are REQUESTing the NEXRAD2 data from additional upstreams,
please remember to split those feed REQUESTs as well.
re: TAMU has big networking pipes
> That's great. They obviously have some big pipes to get the data into them
> via Internet2. I looked at the traceroutes from me to them.
>
> We are almost all commodity Level 3 to them, with latency around 30 ms.
'traceroute' listings are a great tool, but using them as an absolute
indication of network connectivity can be misleading. For instance,
traceroutes from my home machine to Unidata machines can vary from 30 to
several hundred milliseconds, but the actual throughput that can be achieved
is typically pretty consistent. Speed test (e.g., speedtest.net) results can
also be a bit misleading, as some ISPs have been known to artificially enhance
results for their users (by prioritizing speed-testing traffic) to make their
service look good. The real test is how much data one can move to/from their
machine using real-world datasets.
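
If you want a crude but honest gauge of that, time a bulk transfer of a real
file from a host on or near the path you care about. A minimal Python sketch;
the URL is a placeholder, so point it at something large that you are allowed
to pull:

  import time
  import urllib.request

  # Placeholder URL: point this at a large file on or near the upstream host.
  URL = "http://example.edu/path/to/large-file"

  start = time.time()
  with urllib.request.urlopen(URL) as resp:
      nbytes = len(resp.read())
  elapsed = time.time() - start

  # Sustained throughput in megabits per second.
  print("{} bytes in {:.1f} s -> {:.1f} Mbps".format(
      nbytes, elapsed, nbytes * 8 / elapsed / 1e6))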
re: Can we hold on to this offer for later?
> Sure! I am not pulling in the CONDUIT feed at the moment, I don't think,
The real-time statistics that you are reporting do not show CONDUIT as
one of the feeds. Other machines you may be running that are ingesting but
not reporting real-time statistics could have an entirely different set
of feeds, and we would have no way of knowing. Your upstream feed site(s)
know what you are REQUESTing, of course, so it is not like those (hypothetical)
machines can remain totally anonymous.
re:
> and will be willing to feed it to others. Right now I can dedicate a server
> for the LDM with 4 processors / 8 GB of RAM and a 100 Mb/s port for this.
> Certainly any gov, edu, or private site that would like to have access to
> the feed can.
The good news is that the LDM uses few machine resources other than memory.
The BIG question is what volume your network connection allows.
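
To put rough numbers on that: a 100 Mb/s port tops out around 45000 Mbytes/hour,
so a handful of full NEXRAD2 downstreams at peak volume would saturate it. A
quick back-of-the-envelope, using figures from the snapshot above (the
arithmetic is the only thing being illustrated here):

  # Back-of-the-envelope capacity check for a 100 Mb/s port.
  port_mbps = 100
  port_mbyte_per_hour = port_mbps / 8 * 3600      # = 45000 Mbyte/hour

  nexrad2_peak = 7800.190                         # NEXRAD2 maximum from the table above

  # Upper bound on full NEXRAD2 downstreams the port could feed at peak,
  # ignoring all other traffic and protocol overhead.
  print(port_mbyte_per_hour / nexrad2_peak)       # ~5.8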
By the way, we have noted that the faster your machine's Ethernet connection
is, the better. NCAR/RAL was having latency issues ingesting data from us a
number of years ago even though their network connection was gigabit. We
tracked the problem down to their running their Ethernet interface at
100 Mbps even though it supported 1 Gbps. As soon as the interface was
reconfigured, their latencies dropped to essentially zero. I relay this story
so that you can investigate whether the Ethernet setup on your LDM machines
is limiting your data ingestion.
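
On Linux, 'ethtool <interface>' will show the negotiated speed, or a couple of
lines of Python reading /sys will do the same. The interface name below is an
assumption; use whatever your LDM machine actually has:

  # Check the negotiated speed of an Ethernet interface (Linux only).
  # "eth0" is an assumed interface name.
  IFACE = "eth0"

  with open("/sys/class/net/{}/speed".format(IFACE)) as f:
      speed_mbps = int(f.read().strip())

  msg = "{} negotiated {} Mb/s".format(IFACE, speed_mbps)
  if speed_mbps < 1000:
      msg += " -- check the switch port / cabling"
  print(msg)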
re: I am confident that splitting the feed REQUESTs can drop your
NEXRAD2 latency down to near zero ** ...
> That's great. Thanks, Tom, for pointing this out. I didn't think breaking it
> up would make that much difference.
The difference can be huge ** when artificial rate limiting/packet shaping is
limiting per-connection volumes **. If your overall data volume is being
limited, however, splitting your feed REQUESTs may have no major effect.
re: where are your LDM machines physically located?
> They are physically located in a datacenter in Asheville, NC. Not far from
> NCDC.
OK, good. We know that the network backbones in the Asheville, NC area are
very good.
> The company I use recently built a state-of-the-art data center near
> Charleston, SC, where I am living now. The plan is to locate some of the
> equipment here for physical/network diversity. They are upgrading the
> networking capacity in Charleston this summer.
Diversifying your locations is a good move!
re:
> Here is a network map of the vendor we use.
>
> http://www.immedion.com/colocation/network-services
>
> For traceroute purposes etc, you can use these two servers:
>
> noaaport1.wright-weather.com
> noaaport2.wright-weather.com
OK.
One last question: does 'noaaport' in the name of one of your machines
indicate that the machine is ingesting NOAAPort data directly from the SBN?
If it does, you may want to consider moving non-NOAAPort-delivered feeds off
of those machines. Remember that the NOAAPort SBN bandwidth will be increased
from 30 Mbps to 60 Mbps on Monday (although there will be a 45-day
dual-illumination period so users can transition).
re:
> Thanks again.
No worries. Enjoy your weekend!
Cheers,
Tom
--
****************************************************************************
Unidata User Support UCAR Unidata Program
(303) 497-8642 P.O. Box 3000
address@hidden Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage http://www.unidata.ucar.edu
****************************************************************************
Ticket Details
===================
Ticket ID: XEC-580619
Department: Support IDD
Priority: Normal
Status: Closed