[CONDUIT #GXA-551280]: Update on UW-Madison AOS Conduit/0.25 GFS
- Date: Tue, 04 Aug 2015 12:17:59 -0600
Hi Pete,
re:
> Here's the output of pqmon from a few times just now, in the middle of
> the 12 UTC GFS run coming in:
>
> [ldm@idd ~]$ pqmon
> Aug 04 15:59:10 pqmon NOTE: Starting Up (8657)
> Aug 04 15:59:10 pqmon NOTE: nprods nfree nempty nbytes maxprods maxfree minempty maxext age
> Aug 04 15:59:10 pqmon NOTE: 241606 834 228148 23323422632 470587 7344 0 3057664 1897
> Aug 04 15:59:10 pqmon NOTE: Exiting
>
> [ldm@idd ~]$ pqmon
> Aug 04 16:06:05 pqmon NOTE: Starting Up (9239)
> Aug 04 16:06:05 pqmon NOTE: nprods nfree nempty nbytes maxprods maxfree minempty maxext age
> Aug 04 16:06:05 pqmon NOTE: 235049 54 235485 22884777296 470587 7344 0 36648384 1573
> Aug 04 16:06:05 pqmon NOTE: Exiting
>
> [ldm@idd ~]$ pqmon
> Aug 04 16:08:05 pqmon NOTE: Starting Up (9429)
> Aug 04 16:08:05 pqmon NOTE: nprods nfree nempty nbytes maxprods maxfree minempty maxext age
> Aug 04 16:08:05 pqmon NOTE: 244621 12 225955 23999634152 470587 7344 0 286536 1637
> Aug 04 16:08:05 pqmon NOTE: Exiting
Very good. Thanks for the spot check.
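For anyone scripting this kind of spot check, a minimal sketch of parsing the
pqmon stats line follows. The field names come from the header line pqmon
prints above; the parsing helper itself is hypothetical, not part of the LDM
distribution:

```python
# Hypothetical helper: pull the numeric stats out of a pqmon NOTE line.
# Field names are taken from the header line pqmon prints.
def parse_pqmon(note_line):
    fields = ["nprods", "nfree", "nempty", "nbytes",
              "maxprods", "maxfree", "minempty", "maxext", "age"]
    values = note_line.split("NOTE:")[1].split()
    return dict(zip(fields, (int(v) for v in values)))

stats = parse_pqmon(
    "Aug 04 15:59:10 pqmon NOTE: 241606 834 228148 23323422632 "
    "470587 7344 0 3057664 1897"
)
# "age" is the residency time, in seconds, of the oldest product in the queue
print(stats["age"])     # 1897
print(stats["nbytes"])  # 23323422632
```

A queue sized so that "age" stays comfortably above the longest expected
downstream outage is the usual goal of a check like this.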
re:
> I've just set up ldm metrics, so we can take a look at that in the next
> day or two.
Sounds good.
re:
> I'll see about trying to get some bandwidth plots, I think I
> can do that with our interface to the switch it is connected to.
Our monitoring of the outbound bandwidth on the real server backend nodes
of our relay cluster, idd.unidata.ucar.edu, is what alerted us that our
ability to service the existing number of downstream connections was being
maxed out - the volumes reported hit a ceiling on a couple of nodes. The
net effect of this is the same as when "packet shaping" (artificial
bandwidth limiting) is in effect, which, in turn, meant that some
downstreams were not getting all of the data that they were REQUESTing.
We found the same sort of maxing out on the connection from the accumulator
frontends of our cluster to the real server backends. This occurred when we
spun up our backup relay cluster, idd2.unidata.ucar.edu, as doing so doubled
the volume being sent through each accumulator's Gigabit Ethernet port. Of
course, this would not have been a problem if our cluster nodes had
10 Gbps Ethernet interfaces. We considered purchasing 10 Gbps Ethernet
cards for our existing machines, but decided that this would be a waste
of money since the problem will go away when we refresh the cluster
hardware.
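The back-of-the-envelope arithmetic behind that ceiling can be sketched as
follows. The hourly volume used here is a made-up illustrative figure, not
our measured traffic; the point is only how a doubling of relay volume
approaches a gigabit link's capacity:

```python
# Illustrative sketch: does a gigabit interface absorb a doubling of
# relay volume? The 200 GB/hour figure is a hypothetical example.
def mbps(bytes_per_hour):
    """Average megabits per second for a given hourly byte volume."""
    return bytes_per_hour * 8 / 3600 / 1e6

LINK_CAPACITY_MBPS = 1000    # nominal 1 Gbps Ethernet

hourly_volume = 200e9        # hypothetical 200 GB/hour through one accumulator
print(mbps(hourly_volume))       # ~444 Mbps: fits on the link
print(mbps(hourly_volume * 2))   # ~889 Mbps: pressing the ceiling once doubled
```

Data feeds are also bursty rather than steady, so the instantaneous rate can
hit the link ceiling well before the hourly average does - which is why the
averaged volume plots showing a flat top are a reliable symptom.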
Cheers,
Tom
--
****************************************************************************
Unidata User Support UCAR Unidata Program
(303) 497-8642 P.O. Box 3000
address@hidden Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage http://www.unidata.ucar.edu
****************************************************************************
Ticket Details
===================
Ticket ID: GXA-551280
Department: Support CONDUIT
Priority: Normal
Status: Closed