This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.

>From: Gilbert Sebenste <address@hidden>
>Organization: NIU
>Keywords: 200503250511.j2P5B4v2005521

Hi Gilbert,

>We had a network upgrade here in my building today here at NIU. A 1 GB
>backbone was installed last year, and as of 2:30 PM MT today, I'm hogging
>100 mb of it instead of just 10. :-)

Feel the power ;-)

>I thought we would be on Abilene right now with a 1 GB connection,

1 Gbps _to_ Abilene!?  You sure?  NCAR/UCAR has an OC12 (or is it OC155).

>but (surprise!), paperwork is holding it up. It will happen, contracts are
>currently being signed. But when it happens, I'd like to relay more data
>to other Abilene sites. But, maybe there's other ways to help, too.
>
>Suggestions? Comments? Snide remarks?

There are several ways to help the community: relaying data, acting as a
THREDDS site, creating a large near-realtime data repository that is open
(akin to, but not necessarily exactly the same as, being a THREDDS site),
etc.

>What can a 100 mb trunk to me and a 1 GB Abilene connection do to help
>the community?

Once you are fully connected, it would be useful/interesting to run some
stress tests to see how much data you can relay with no added latency.
One of the key necessities is high-availability hardware, something that
we have been experimenting with here at the UPC.

In the next section, I include an email that I sent to another user who is
strongly considering becoming a toplevel IDD relay node. As you will see,
the note describes a cluster approach that we have been pursuing. As you
read the info, please remember that we are still learning about the
cluster and _will_ be making changes to the setup in the coming
days/weeks/months/etc.

From: Unidata Support <address@hidden>
Date: Tue, 15 Mar 2005 18:41:01 -0700
Subject: 20050315: IDD top level relay atm.geo.nsf.gov PSU (cont.)

re:
>How should we proceed from here?

Perhaps it would be useful if I described the setup we have been moving
towards for our toplevel IDD relay nodes -- idd.unidata.ucar.edu and
thelma.ucar.edu. Let me warn you that I am not the expert in what I am
about to say, but I think I can relate the essence of what we have been
working on. The real brains behind what I describe below are:

  John Stokes    - cluster design and implementation
  Steve Emmerson - LDM development
  Mike Schmidt   - system administration and cluster design
  Steve Chiswell - IDD design and monitoring

I am sure that these guys will chime in when they see something I have
mis-stated :-)

As you know, in addition to atm.geo.nsf.gov we operate the top level IDD
relay nodes idd.unidata.ucar.edu and thelma.ucar.edu. Instead of
idd.unidata and thelma.ucar being simple machines, they are part of a
cluster that is composed of 'directors' (machines that direct IDD feed
requests to other machines) and 'data servers' (machines that are fed
requests by the director(s) and service those requests). We are using the
IP Virtual Server (IPVS) available in current versions of Linux to forward
feed requests from 'directors' to 'data servers'.

In our cluster, we are using Fedora Core 3 64-bit Linux run on a set of
identically configured Sun SunFire V20Z 1U rackmount servers: dual
Opterons; 4 GB RAM; 2x36 GB 10K RPM SCSI; dual GB Ethernet interfaces. We
got in on a Sun educational discount program and bought our 5 V20Zs for
about $3000 each. These machines are stellar performers for IDD work when
running Fedora Core 3 64-bit Linux.
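
For readers unfamiliar with IPVS, a 'director' of the kind described above
is typically set up with the standard Linux ipvsadm utility. The commands
below are only an illustrative sketch: the addresses, the scheduler choice,
and the direct-routing method are placeholder assumptions, not a record of
the actual UPC configuration.

  # Hypothetical 'director' setup; VIP is a placeholder for the public
  # address that downstream sites connect to (e.g., the address that
  # resolves to idd.unidata.ucar.edu), and the 10.0.0.x addresses stand
  # for the internal 'data server' hosts.
  VIP=192.0.2.10

  # Create a virtual TCP service on the LDM port (388).  New connections
  # go to the real server with the fewest active connections (-s lc), and
  # a downstream host stays "sticky" to one server for 60 seconds after
  # its last connection closes (-p 60), roughly matching the one-minute
  # behavior described in the message below.
  ipvsadm -A -t ${VIP}:388 -s lc -p 60

  # Register the data servers as real servers for that service, using
  # direct routing (-g) so they answer downstream hosts directly.
  ipvsadm -a -t ${VIP}:388 -r 10.0.0.2:388 -g
  ipvsadm -a -t ${VIP}:388 -r 10.0.0.3:388 -g
  ipvsadm -a -t ${VIP}:388 -r 10.0.0.4:388 -g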

We tested three operating systems side-by-side before settling on FC3; the
others were Sun Solaris x86 10 and FreeBSD 5.3, both of which are 64-bit.
FC3 was the _clear_ winner; FreeBSD was second; and Solaris x86 10 was a
_distant_ third. As I understand it, RedHat Enterprise WS 4 is FC3 with
full RH support.

Here is a "picture" of what idd.unidata.ucar.edu and thelma.ucar.edu
currently look like (best viewed with fixed width fonts):

             |<------------ directors ------------>|
                +-------+             +-------+
                |       ^             |       ^
                V       |             V       |
             +---------------+     +---------------+
 idd.unidata | LDM  |  IPVS  |     | LDM  |  IPVS  | thelma.ucar
             +---------------+     +---------------+
                     |                     |
                     |                     |
           +---------+---------+-----------+-------+
           |                   |                   |
           V                   V                   V
   +---------------+   +---------------+   +---------------+
   |  'uni2' LDM   |   |  'uni3' LDM   |   |  'uni4' LDM   |
   +---------------+   +---------------+   +---------------+
   |<-------------------- data servers ------------------->|

The top level indicates two 'director' machines: idd.unidata.ucar.edu and
thelma.ucar.edu (thelma used to be a SunFire 480R SPARC III box). Both of
these machines are running IPVS and LDM 6.3.0 configured on a second
interface (IP). The IPVS 'director' software forwards port 388 requests
received on an interface configured as idd.unidata.ucar.edu on one machine
and as thelma.ucar.edu on the other. The set of 'data server' backends is
the same for both directors (at present).

When an IDD feed request is received by idd.unidata.ucar.edu or
thelma.ucar.edu, it is relayed by the IPVS software to one of the data
servers. Those machines are configured to also be known internally as
idd.unidata.ucar.edu or thelma.ucar.edu, but they do not ARP, so they are
not seen by the outside world/routers. The IPVS software keeps track of
how many connections are on each of the data servers and forwards ("load
levels") based on connection numbers (we will be changing this metric as
we learn more about the setup). The data servers are all configured
identically: same RAM, same LDM queue size (8 GB currently), same
ldmd.conf contents, etc.

All connections from a downstream machine will always be sent to the same
data server as long as its last connection died less than one minute ago.
This allows downstream LDMs to send an "are you alive" query to a server
that they have not received data from in a while. Once there have been no
IDD request connections from a downstream host for one minute, a new
request will be forwarded to the data server that is least loaded.

This design allows us to take down any of the data servers for whatever
maintenance is needed (hardware, software, etc.) whenever we feel like it.
When a machine goes down, the IPVS server is informed that the server is
no longer available, and all downstream feed requests are sent to the
other data servers that remain up. On top of that, thelma.ucar.edu and
idd.unidata.ucar.edu are on different LANs and may soon be located in
different parts of the UCAR campus.

LDM 6.3.0 was developed to allow running the LDM on a particular interface
(IP). We are using this feature to run an LDM on the same box that is
running the IPVS 'director'. The IPVS listens on one interface (IP) and
the LDM runs on another. The alternate interface does not necessarily have
to represent a different Ethernet device; it can be a virtual interface
configured in software.
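
The "do not ARP" arrangement and the software-configured virtual interface
mentioned above are standard Linux techniques rather than anything
LDM-specific. On a Fedora Core 3 era (2.6) kernel they might look roughly
like the following sketch; the interface names and addresses are
placeholders, and the steps actually used at the UPC may well differ.

  # Hypothetical settings on each 'data server' (direct-routing style).
  VIP=192.0.2.10            # same placeholder virtual IP as the director

  # Put the virtual IP on a loopback alias with a host netmask so the box
  # accepts traffic addressed to it ...
  ifconfig lo:0 ${VIP} netmask 255.255.255.255 up

  # ... but never answers ARP requests for it, so routers only ever see
  # the director as the owner of that address.
  sysctl -w net.ipv4.conf.all.arp_ignore=1
  sysctl -w net.ipv4.conf.all.arp_announce=2

  # On a 'director' box, the second address for the LDM can simply be an
  # alias on an existing Ethernet device rather than separate hardware;
  # the LDM is then configured to use that address.
  ifconfig eth0:1 192.0.2.11 netmask 255.255.255.0 up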

The ability to run LDMs on specific interfaces (IPs) allows us to run LDMs
as either 'data collectors' or as additional data servers on the same box
running the 'director'. By 'data collector', I mean that the LDMs on the
'director' machines have multiple ldmd.conf requests that bring data to
the cluster (e.g., CONDUIT from atm, UIUC, and/or NEXRAD2 from Purdue, HDS
from here, IDS|DDPLUS from there, etc.). The data server LDMs request data
redundantly from the 'director' LDMs. We currently do not have redundancy
for the directors, but we will be adding that in the future.

We are just getting our feet wet with this cluster setup. We will be
modifying configurations as we learn more about how well the system works.
In stress tests run here at the UPC, we were able to demonstrate that one
V20Z was able to handle 50% more downstream connections than the 480R
thelma.ucar.edu without introducing latency. With three data servers, we
believe that we can now field literally every IDD feed request in the
world if we had to (the ultimate failover site). If the load on the data
servers ever becomes too high, all we need do is add one or more
additional boxes to the mix. The ultimate limiting factor in this setup
will be the routers and network bandwidth here in UCAR. Luckily, we have
excellent networking!

All of the above may not seem like an answer to your question "How should
we proceed from here", but I felt that it was important for you (PSU) to
get a clearer picture of our IDD development. We have talked about
upgrading atm to a cluster like that described above and have also
considered approaching GigaPops like the MAX (U Maryland) to see if they
would be interested in running a cluster there (we feel that it is best to
have top level relays as close to a GigaPop as possible). Since you (PSU)
are willing to play a leading role in the IDD relay effort, I feel like we
should come to an agreement on the class of installation that would best
handle current and future needs.

The cluster that is currently configured relays an average of 120 Mbps
(~1.2 TB/day) to downstream connections. Peak rates can, however, exceed
250 Mbps.

Please let us know of any questions you have on the above. There should be
some since I have most likely not portrayed things clearly enough.

Cheers,

Tom

--
NOTE: All email exchanges with Unidata User Support are recorded in the
Unidata inquiry tracking system and then made publicly available through
the web.  If you do not want to have your interactions made available in
this way, you must let us know in each email you send to us.
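
As a rough illustration of the 'data collector' and redundant-request
arrangement described in the message above, the ldmd.conf entries might
look something like the sketch below. The hostnames and feed choices are
placeholders, not the actual UPC configuration.

  # On a 'director' LDM acting as a 'data collector': multiple REQUEST
  # lines bring the feeds into the cluster (hostnames are placeholders).
  REQUEST CONDUIT     ".*"   conduit-source.example.edu
  REQUEST NEXRAD2     ".*"   level2-source.example.edu
  REQUEST HDS         ".*"   hds-source.example.edu
  REQUEST IDS|DDPLUS  ".*"   textdata-source.example.edu

  # On each 'data server' LDM: request the same data redundantly from
  # both director LDMs (the LDM product queue rejects the duplicate
  # copies), and ALLOW downstream sites to request data from this host.
  REQUEST ANY  ".*"   director-one.example.edu
  REQUEST ANY  ".*"   director-two.example.edu
  ALLOW   ANY  ^ldm\.downstream-site\.edu$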

>From address@hidden  Sun Mar 27 01:10:08 2005

Hi Tom,

re: NIU connection to Abilene

>Yeppers. Like, HUGE power here.

re: creating a large near-realtime data repository open to the community

>The latter may be my next step.
>
>OK. BTW, I learned on Friday that due to paperwork, it won't happen for
>another 4 months. Grrr. But at least my backbone is 100 mb to the campus
>network.

re: cluster approach

>Wow. My head is spinning. :-) I'm a considerable ways from that point. But
>as firepower on computers continues to go up (July 1, the next generation
>of Pentium chips comes out, or so I understand), this will likely become
>more and more of an issue for me. I'll keep watching! As always, Tom,
>thanks for the info!!! Gilbert

*******************************************************************************
Gilbert Sebenste                                                       ********
(My opinions only!)                                                      ******
Staff Meteorologist, Northern Illinois University                          ****
E-mail: address@hidden                                                      ***
web: http://weather.admin.niu.edu                                            **
Work phone: 815-753-5492                                                      *
*******************************************************************************