[IDD #XEC-580619]: IDD Peering request from Unidata
- Subject: [IDD #XEC-580619]: IDD Peering request from Unidata
- Date: Sat, 16 Aug 2014 10:56:12 -0600
Hi Mike,
I see you work on Saturday mornings also :-)
re: splitting feeds is an accepted way to lower latencies
> That's great. I wasn't sure if splitting up the feed request, while improving
> throughput/latency, would be viewed unfavorably by the upstream hosts, as it
> might require more CPU/memory on their LDM system. I know that on our LDM
> receive system each feed request creates a new ldmd process; I'm not sure if
> it spawns a new one on the upstream LDM system as well. I just want to be
> conscious of how many resources I might be consuming from the upstream
> system.
You are exactly correct in recognizing that splitting feed REQUESTs has an
impact on the upstream machine(s). We try to work with sites to tune their
~ldm/etc/ldmd.conf configurations so that their product-receipt latencies and
the number of REQUESTs to individual upstreams are both minimized.
Unfortunately, we have no formula for doing this; it is pretty much trial and
error, but one can use the listing of volumes by feed type as a guide. For
instance, the following is a snapshot of the volumes and numbers of products
in all of the IDD feeds being ingested on our machine shemp.unidata.ucar.edu:
Data Volume Summary for shemp.unidata.ucar.edu

Maximum hourly volume      30285.140 M bytes/hour
Average hourly volume      18948.620 M bytes/hour
Average products per hour     343531 prods/hour

Feed                   Average                  Maximum     Products
                 (M byte/hour)            (M byte/hour)     number/hour
NEXRAD2             6405.202    [ 33.803%]     7800.190     74352.167
CONDUIT             3440.520    [ 18.157%]     5615.335     74644.333
NEXRAD3             2028.625    [ 10.706%]     2377.226     98159.095
EXP                 1641.817    [  8.665%]     4882.705      6342.929
NGRID               1636.496    [  8.636%]     3399.559     22336.786
FSL2                1442.247    [  7.611%]     1715.243      1676.667
FNMOC               1136.608    [  5.998%]     6497.106      3158.024
HDS                  354.473    [  1.871%]      698.793     18004.429
SPARE                282.843    [  1.493%]     1731.961        10.333
NIMAGE               161.472    [  0.852%]      265.177       196.952
FNEXRAD              125.509    [  0.662%]      152.478       106.214
GEM                   90.300    [  0.477%]      548.181       830.024
UNIWISC               68.839    [  0.363%]      117.278        45.857
IDS|DDPLUS            53.590    [  0.283%]       64.898     42248.143
NOTHER                52.888    [  0.279%]      374.643      1047.095
PCWS                  19.743    [  0.104%]       26.231        23.286
LIGHTNING              6.797    [  0.036%]       16.925       346.762
DIFAX                  0.537    [  0.003%]        1.956         0.667
GPS                    0.113    [  0.001%]        1.202         1.048
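
For what it is worth, a back-of-the-envelope way to use a listing like the one
above is to divide a feed's maximum hourly volume by what a single connection
to your upstream reliably sustains. The Python sketch below does just that;
the per-connection figure in it is an assumption for illustration, not a
measurement, so substitute whatever your own path actually delivers:

  import math

  # Assumed single-connection ceiling in Mbyte/hour; replace with a measured value.
  PER_CONNECTION_MBYTE_PER_HOUR = 2000.0

  # Maximum hourly volumes (Mbyte/hour) taken from the table above.
  max_volume = {"NEXRAD2": 7800.190, "CONDUIT": 5615.335, "NGRID": 3399.559}

  for feed, volume in max_volume.items():
      # One REQUEST per "ceiling's worth" of peak volume, with a minimum of one.
      splits = max(1, math.ceil(volume / PER_CONNECTION_MBYTE_PER_HOUR))
      print("{}: roughly {} REQUEST line(s)".format(feed, splits))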
re:
> But here is my new feed request now for NEXRAD2
>
> REQUEST NEXRAD2 "^L2-BZIP2/K[A-D].*" idd.tamu.edu
> REQUEST NEXRAD2 "^L2-BZIP2/K[E-K].*" idd.tamu.edu
> REQUEST NEXRAD2 "^L2-BZIP2/K[L-P].*" idd.tamu.edu
> REQUEST NEXRAD2 "^L2-BZIP2/K[Q-U].*" idd.tamu.edu
> REQUEST NEXRAD2 "^L2-BZIP2/K[V-Z].*" idd.tamu.edu
Very good.
One comment on how to specify extended regular expressions: the use of
leading/trailing '.*' is neither needed nor wanted. In fact, one of the
additions to the LDM was to simplify these kinds of regular expressions,
since leaving the extra '.*' in place results in significantly longer
pattern-matching times. This is true both for ~ldm/etc/ldmd.conf REQUEST
entries and for 'pqact' pattern-action file actions.
I would rewrite your REQUESTs as:
REQUEST NEXRAD2 "^L2-BZIP2/K[A-D]" idd.tamu.edu
REQUEST NEXRAD2 "^L2-BZIP2/K[E-K]" idd.tamu.edu
REQUEST NEXRAD2 "^L2-BZIP2/K[L-P]" idd.tamu.edu
REQUEST NEXRAD2 "^L2-BZIP2/K[Q-U]" idd.tamu.edu
REQUEST NEXRAD2 "^L2-BZIP2/K[V-Z]" idd.tamu.edu
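
If you ever want to sanity-check that a set of split patterns still covers
everything the original single REQUEST did, with no gaps and no
double-REQUESTing, a few lines of Python run against some sample product IDs
will tell you. The IDs below are made up for illustration; real ones can be
gleaned from 'notifyme' output or your LDM logs:

  import re

  # The five split patterns from the REQUEST lines above.
  patterns = [re.compile(p) for p in (
      r"^L2-BZIP2/K[A-D]", r"^L2-BZIP2/K[E-K]", r"^L2-BZIP2/K[L-P]",
      r"^L2-BZIP2/K[Q-U]", r"^L2-BZIP2/K[V-Z]",
  )]

  # Hypothetical product IDs; substitute IDs actually seen from your upstream.
  sample_ids = [
      "L2-BZIP2/KABX/20140816160000",
      "L2-BZIP2/KMHX/20140816160000",
      "L2-BZIP2/KTLX/20140816160000",
  ]

  for pid in sample_ids:
      hits = sum(bool(p.match(pid)) for p in patterns)
      # Each ID should match exactly one pattern: no gaps, no overlaps.
      status = "OK" if hits == 1 else "matched {} patterns".format(hits)
      print("{} -> {}".format(pid, status))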
Also, if you are REQUESTing the NEXRAD2 data from additional upstreams,
please remember to split those feed REQUESTs as well.
re: TAMU has big networking pipes
> That's great. They obviously have some big pipes to get the data into them
> via Internet2. I looked at the traceroutes from me to them.
>
> We are almost all commodity Level 3 to them, with latency around 30 ms.
'traceroute' listings are a great tool, but using them as an absolute
indication of network connectivity can be misleading. For instance,
traceroutes from my home machine to Unidata machines can vary from 30 to
several hundred milliseconds, but the actual throughput that can be achieved
is typically pretty consistent. Speed test (e.g., speedtest.net) results can
also be a bit misleading, as some ISPs have been known to artificially enhance
results for their users (by prioritizing speed-testing traffic) to make their
service look good. The real test is how much data one can move to/from their
machine using real-world datasets.
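
If you want a crude but honest gauge of that, time a bulk transfer of a real
file from a host on or near the path you care about. A minimal Python sketch;
the URL is a placeholder, so point it at something large that you are allowed
to pull:

  import time
  import urllib.request

  # Placeholder URL: point this at a large file on or near the upstream host.
  URL = "http://example.edu/path/to/large-file"

  start = time.time()
  with urllib.request.urlopen(URL) as resp:
      nbytes = len(resp.read())
  elapsed = time.time() - start

  # Sustained throughput in megabits per second.
  print("{} bytes in {:.1f} s -> {:.1f} Mbps".format(
      nbytes, elapsed, nbytes * 8 / elapsed / 1e6))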
re: Can we hold on to this offer for later?
> Sure! I am not pulling in the CONDUIT feed at the moment, I don't think,
The real-time statistics that you are reporting do not show CONDUIT as
one of the feeds. Other machines you may be running that are ingesting but
not reporting real-time statistics could have an entirely different set
of feeds, and we would have no way of knowing. Your upstream feed site(s)
know what you are REQUESTing, of course, so it is not like those (hypothetical)
machines can remain totally anonymous.
re:
> and will be willing to feed it to others. Right now I can dedicate a server
> for the LDM with 4 processors / 8 GB of RAM and a 100 Mb/s port for this.
> Certainly any gov, edu, or private site that would like to have access to
> the feed can.
The good news is that the LDM uses few machine resources other than memory.
The BIG question is what volume your network connection allows.
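
To put rough numbers on that: a 100 Mb/s port tops out around 45000 Mbytes/hour,
so a handful of full NEXRAD2 downstreams at peak volume would saturate it. A
quick back-of-the-envelope, using figures from the snapshot above (the
arithmetic is the only thing being illustrated here):

  # Back-of-the-envelope capacity check for a 100 Mb/s port.
  port_mbps = 100
  port_mbyte_per_hour = port_mbps / 8 * 3600      # = 45000 Mbyte/hour

  nexrad2_peak = 7800.190                         # NEXRAD2 maximum from the table above

  # Upper bound on full NEXRAD2 downstreams the port could feed at peak,
  # ignoring all other traffic and protocol overhead.
  print(port_mbyte_per_hour / nexrad2_peak)       # ~5.8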
By the way, we have noted that the faster your machine's Ethernet connection
is, the better. NCAR/RAL was having latency issues ingesting data from us a
number of years ago even though their network connection was gigabit. We
tracked the problem down to their running their Ethernet interface at
100 Mbps even though it supported 1 Gbps. As soon as the interface was
reconfigured, their latencies dropped to essentially zero. I relay this story
so that you can investigate whether the Ethernet setup on your LDM machines
is limiting your data ingestion.
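
On Linux, 'ethtool <interface>' will show the negotiated speed, or a couple of
lines of Python reading /sys will do the same. The interface name below is an
assumption; use whatever your LDM machine actually has:

  # Check the negotiated speed of an Ethernet interface (Linux only).
  # "eth0" is an assumed interface name.
  IFACE = "eth0"

  with open("/sys/class/net/{}/speed".format(IFACE)) as f:
      speed_mbps = int(f.read().strip())

  msg = "{} negotiated {} Mb/s".format(IFACE, speed_mbps)
  if speed_mbps < 1000:
      msg += " -- check the switch port / cabling"
  print(msg)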
re: I am confident that splitting the feed REQUESTs can drop your
NEXRAD2 latency down to near zero ** ...
> That's great. Thanks, Tom, for pointing this out. I didn't think breaking it
> up would make that much difference.
The difference can be huge ** when artificial rate limiting/packet shaping is
limiting per-connection volumes **. If your overall data volume is being
limited, however, splitting your feed REQUESTs may have no major effect.
re: where are your LDM machines physically located?
> They are physically located in a datacenter in Asheville, NC. Not far from
> NCDC.
OK, good. We know that the network backbones in the Asheville, NC area are
very good.
> The company I use recently built a state-of-the-art data center near
> Charleston, SC, where I am living now. The plan is to locate some of the
> equipment here for physical/network diversity. They are upgrading the
> networking capacity in Charleston this summer.
Diversifying your locations is a good move!
re:
> Here is a network map of the vendor we use.
>
> http://www.immedion.com/colocation/network-services
>
> For traceroute purposes etc, you can use these two servers:
>
> noaaport1.wright-weather.com
> noaaport2.wright-weather.com
OK.
One last question: does 'noaaport' in the name of one of your machines
indicate that the machine is ingesting NOAAPort data directly from the SBN?
If it does, you may want to consider moving non-NOAAPort-delivered feeds off
of those machines. Remember that the NOAAPort SBN bandwidth will be increased
from 30 Mbps to 60 Mbps on Monday (although there will be a 45-day
dual-illumination period so users can transition).
re:
> Thanks again.
No worries. Enjoy your weekend!
Cheers,
Tom
--
****************************************************************************
Unidata User Support UCAR Unidata Program
(303) 497-8642 P.O. Box 3000
address@hidden Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage http://www.unidata.ucar.edu
****************************************************************************
Ticket Details
===================
Ticket ID: XEC-580619
Department: Support IDD
Priority: Normal
Status: Closed