
Re: 20050316: 20050315: IDD top level relay atm.geo.nsf.gov PSU (cont.)



Tom,

Okay, I guess what we do for now is sit tight until we find out about our funding situation. We do have a proposal submitted that would cover an upgrade of our existing LDM relay. If it gets approved, we will work within the framework of that proposal to configure a two-system director/server arrangement to serve as a proof-of-concept starting point.

If there's anything we can/should do in the interim to prepare for implementation of this system (assuming all the pieces fall into place), let us know.

                                  Art.


On Wed, 16 Mar 2005, Unidata Support wrote:

From: "Arthur A. Person" <address@hidden>
Organization: PSU
Keywords:  200503102200.j2AM0Lq2027557 IDD toplevel relay

Hi Art,

re: cluster IDD development
Wow, that's a big step forward in making the IDD virtually failure-proof
(excepting networks, of course)!

We are pretty excited about the possibilities.  One idea we have in
mind is to create a virtual machine whose data service nodes are
distributed around the country and beyond, and in which directors
cooperate with one another to forward feed requests to the machine(s)
that have the desired data and will be best able to service those
requests.  Working towards this goal will be fun to say the least :-)

As far as Penn State is concerned, we
would want to follow Unidata's lead with whatever hardware/software is
required to operate a complementary top-level relay.  So, we would run
whatever level and type of system Unidata feels would fit best into the
IDD framework.

Wonderful!

Since there appears to be more hardware involved in the new design, the
issue of funding becomes more important.  We're currently seeking funding
for a system upgrade for our existing LDM relay, but the upgrade is only
for one system.

I understand.  I sent along some numbers for the systems we purchased
so that you could compare costs you are seeing with ones we have
encountered.

A question:  When top-level relays are established, does
Unidata typically furnish the hardware for them or does the relay site go
through the grant process?  Please advise.

Unfortunately, we do not have the money to fund the purchase of
equipment for top-level relay nodes; I wish we did!  The best route is
to go through the Equipment Grant process.  I was talking to the others
involved in our cluster project this morning about how I wished we had
begun our cluster discussion with you about one to two months ago.
This would have given you sufficient time to prepare a proposal for the
current round of Equipment Grants.  Unfortunately, the proposal
deadline for the current round is something like Friday.

Here are some questions I have regarding your description of the new
top-level relay design:

1)  I'm interpreting this to mean that a director system collects all the
data that will be fed downstream and also forwards connection requests via
IPVS to the data servers.  The data servers are populated by doing something
like a "request .*" from the director system.  Is this correct?

Yes, that is correct.  Our stress testing of the LDM showed that having
a high number of data requests on a machine that is servicing a high
number of downstream requests can be a bottleneck when the number of
feeds becomes large.  One way around this is to have collector front
ends that make lots of requests for data and then feed all ingested
data via single requests to the data server nodes.  This is not
"elegant", but it works very well.

2)  Does all the LDM handshaking for downstream sites go through the
director to the data servers, or do things go directly to the data servers
once a connection has been established?

The request goes to the director(s), since those are the machine(s) seen
by the outside world as the feed site (e.g., idd.unidata.ucar.edu and
thelma.ucar.edu) -- they are the hosts that answer ARP for those
addresses.  IPVS then forwards the request to the data servers based on
a load metric (which is crude at the moment, but will get better).  The
data servers then turn the connection to the requester around (just as
is done on a single machine) and start pushing data.
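
If it helps to picture it, the IPVS side is roughly along these lines
(the addresses are made up, 388 is the LDM port, and the scheduler shown
is just one choice):

    # On the director: advertise the cluster address for LDM connections
    # and balance them across the data servers (weighted least-connection):
    ipvsadm -A -t 192.0.2.10:388 -s wlc
    ipvsadm -a -t 192.0.2.10:388 -r 192.0.2.21:388 -g
    ipvsadm -a -t 192.0.2.10:388 -r 192.0.2.22:388 -g

The -g (direct routing) option is what lets the data servers answer the
downstream host directly once the connection is handed off; each data
server also carries the cluster address on a non-ARPing interface (e.g.,
a loopback alias) so that only the director answers ARP for it.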

3)  When multiple LDMs run on a single machine, I would assume that they
each use their own exclusive product queues (i.e., they don't share queue
files)... is this correct?

This is correct.  There is no mechanism for multiple LDMs to use
a shared queue.
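
In practice that just means each instance is given its own queue file; a
hypothetical layout for two instances on one host (the paths and sizes
here are only examples) would be:

    pqcreate -s 2000000000 -q /usr/local/ldm-collector/data/ldm.pq
    pqcreate -s 2000000000 -q /usr/local/ldm-server/data/ldm.pq

Each instance's configuration then points only at its own queue; nothing
is shared between the two.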

Regarding hardware platforms, we've been using a lot of Dell/Intel servers
running Red Hat Enterprise Linux here and are currently moving toward the
Intel EM64T (64-bit) architecture...  would that type of system be
acceptable as a top-level relay system, or do you recommend that we
replicate the platform in use at Unidata?

We purchased two Dell 2850 machines (not as well configured as the dual
Opteron machines: 2 x 2.8 GHz Xeon; 2 GB RAM) and are just loading them
with FC3 64-bit.  We will do some stress testing on these boxes to see
how well they compare with the dual Opteron SunFire V20Zs that I talked
about in my last email.  The nice thing about Xeon-based machines is
that FC[23] recognizes and uses their hyperthreading (two processors
look like four).  We should be able to make some intelligent comments
about how well the 2850s perform in a couple of weeks.  My feeling is
that they will work well, but not as well as the dual Opteron boxes.  I
will be easily persuaded if this is not the case, however.  I have no
emotional investment in the V20Zs :-)

One other thought comes to mind...  since it appears that the LDM is now
capable of running multiple instances of itself on one machine, it might
be useful as a proof-of-concept to initially install just one machine here
acting as director (running IPVS) and data server, shift a select set of
downstream users to this system, and see how it does over some period of
time.

I thought of the same thing.  One comment I can make is that the
director node does not have to be a superpowered box.  You could set up
a "cluster" with a director and one data server.  The director would be
an existing machine, and the data server would be something like the
2850.  The reason I am confident that this would work is that I am
ingesting all data available in the IDD on a five-year-old, dual 500 MHz
PIII box put together from pieces that were headed for the recycle bin.
A box like this could conceivably be used as both an IPVS director and
data accumulator.  I don't want to say that it absolutely would work,
but I have the gut feeling that it would.

If it performs well and everyone's happy with the network reliability
via Penn State, then we could upgrade to multiple data servers below
this system, similar to the way it's configured now at Unidata.

Yes, the cluster approach lends itself to incremental expansion.

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303)497-8643                                                  P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata WWW Service              http://my.unidata.ucar.edu/content/support
----------------------------------------------------------------------------
NOTE: All email exchanges with Unidata User Support are recorded in the
Unidata inquiry tracking system and then made publicly available
through the web.  If you do not want to have your interactions made
available in this way, you must let us know in each email you send to us.


Arthur A. Person
Research Assistant, System Administrator
Penn State Department of Meteorology
email:  address@hidden, phone:  814-863-1563