[LDM #UYH-624598]: LVS realserver switching loses data
- Subject: [LDM #UYH-624598]: LVS realserver switching loses data
- Date: Thu, 26 Oct 2006 16:40:12 -0600
Art,
> Must the "last product received" be a member of the feed stream being
> requested (e.g. CONDUIT)? If so, how does the LDM "remember" the time of
> that product? Does it check the products in the queue when it starts, or
> are there entries in the queue for each stream type indicating the last
> product received?
When starting from scratch, a downstream LDM checks the product-queue
for the most recent product that matches the product-class that the
downstream LDM will request. Downstream LDM-s remember the last
successfully-received data-product. If a downstream LDM is the only
one receiving a particular class of products, then it uses the
signature and the product-creation time (minus 60 seconds)
from the last successfully-received product when reconnecting. If a
downstream LDM is one of many receiving a particular class of products,
then it searches back through the queue for the most recent matching
product and uses that product's signature and creation time (minus
60 seconds) when reconnecting.

I can imagine a scenario in which a gap could result from the near
simultaneous disconnection of two downstream LDM-s on the same computer
-- each receiving the same class of products but from a different
upstream LDM. For a gap to occur, the products would also have to be
in a different order in the product-queues of the two upstream LDM-s.
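
The reconnection rule described above can be sketched roughly as
follows. This is only an illustration of the description in this
message, not the LDM source: the Product class, the reconnect_point
function, the BACKOFF name, and the assumption that the queue is a
list ordered oldest-first are all hypothetical, and Python's re module
stands in for the LDM's own pattern matching.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
import re
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Product:
    """Hypothetical stand-in for a data-product's metadata (not an LDM type)."""
    feedtype: str
    ident: str
    signature: str
    created: datetime


# The "minus 60 seconds" mentioned in the reply.
BACKOFF = timedelta(seconds=60)


def reconnect_point(
    queue: Sequence[Product],
    feedtype: str,
    pattern: str,
    last_received: Optional[Product] = None,
) -> Optional[Tuple[datetime, str]]:
    """Return (from_time, signature) to use when reconnecting, or None.

    If the caller remembers the last successfully-received product, use it.
    Otherwise search back through the queue (assumed oldest-first) for the
    most recent product matching the requested product-class.
    """
    chosen = last_received
    if chosen is None:
        regex = re.compile(pattern)
        for product in reversed(queue):
            if product.feedtype == feedtype and regex.search(product.ident):
                chosen = product
                break
    if chosen is None:
        return None  # nothing matches, so there is no starting point
    return chosen.created - BACKOFF, chosen.signature


# Made-up queue contents, for illustration only.
queue = [
    Product("CONDUIT", "example/product.1", "aaaa", datetime(2006, 10, 26, 15, 27, 0)),
    Product("CONDUIT", "example/product.2", "bbbb", datetime(2006, 10, 26, 15, 28, 52)),
    Product("OTHER", "example/other.3", "cccc", datetime(2006, 10, 26, 15, 29, 30)),
]
print(reconnect_point(queue, "CONDUIT", ".*"))
```

In the sketch, passing last_received corresponds to the "only one
receiving" case, and omitting it corresponds to searching back through
the queue.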
> How would splitting the feed affect this? For example,
> our ingest machine currently splits the feed request into two pieces for
> CONDUIT: "[02468]$" and "[^02468]$", but our realserver getting the data
> from the ingester requests CONDUIT as ".*".
Splitting a feed results in distinct product-classes and, hence,
independent downstream LDM-s. So there would be no effect.
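
As a quick check that the two patterns in the split request do not
overlap, the following sketch runs them against a few identifiers. The
identifiers are made up for illustration, the which_requests helper is
hypothetical, and Python's re module is used here only as a stand-in
for the LDM's regular-expression matching.

```python
import re

# The two patterns from the split CONDUIT request quoted above.
EVEN = re.compile(r"[02468]$")
OTHER = re.compile(r"[^02468]$")


def which_requests(ident: str) -> list:
    """Return the names of the split requests whose pattern matches ident."""
    patterns = (("even", EVEN), ("other", OTHER))
    return [name for name, regex in patterns if regex.search(ident)]


# Hypothetical product identifiers, for illustration only.
samples = [
    "data/example.000",
    "data/example.001",
    "data/example.grib2",
    "status/example.txt",
]
for ident in samples:
    print(ident, which_requests(ident))
```

Each sample identifier matches exactly one of the two patterns, because
its last character either is one of 0, 2, 4, 6, 8 or is not.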
> Here's a few log file lines from a data loss instance today:
>
> DOWNSTREAM REALSERVER MACHINE:
> Oct 26 15:59:02 iddrs3 idd-ingest.meteo.psu.edu[11336] NOTE: LDM-6 desired
> product-class: 20061026152852.421 TS_ENDT {{CONDUIT, ".*"},{NONE,
> "SIG=d9d8d8a75a5c05b6556718c17f692a04"}}
> Oct 26 15:59:04 iddrs3 idd-ingest.meteo.psu.edu[11336] NOTE: LDM-6 desired
> product-class: 20061026152852.421 TS_ENDT {{CONDUIT, ".*"},{NONE,
> "SIG=d9d8d8a75a5c05b6556718c17f692a04"}}
> Oct 26 15:59:06 iddrs3 idd-ingest.meteo.psu.edu[11336] NOTE: LDM-6 desired
> product-class: 20061026152852.421 TS_ENDT {{CONDUIT, ".*"},{NONE,
> "SIG=d9d8d8a75a5c05b6556718c17f692a04"}}
>
> UPSTREAM INGEST MACHINE:
> Oct 26 15:59:10 iddrs2 iddrs3.meteo.psu.edu(feed)[21975] NOTE: Starting
> Up(6.4.5/6): 20061026155901.886 TS_ENDT {{CONDUIT, ".*"}}, Primary
> Oct 26 15:59:10 iddrs2 iddrs3.meteo.psu.edu(feed)[21975] NOTE: topo:
> iddrs3.meteo.psu.edu {{CONDUIT, (.*)}}
>
> (Note 1: iddrs2 in this case is actually the idd-ingest machine as I had
> to switch things around this morning after some hardware problems but the
> name didn't get updated.
>
> Note 2: idd-ingest (iddrs2) requests the CONDUIT data in a split feed as
> I describe above, whereas iddrs3 requests CONDUIT in one request as ".*")
>
> As best as I can interpret these entries, it looks like the realserver
> (iddrs3) was requesting CONDUIT data with an age since 15:28:52 but the
> ingest server (idd-ingest a.k.a. iddrs2) responded with data with an age
> since 15:59:01 which also coincides with an ldm restart of the ingest
> machine. Am I reading this right? Can you provide any further insights
> on these log entries? I should note that the iddrs3 system was not
> stopped/restarted during the above period, but was waiting for idd-ingest
> (iddrs2) to come back to provide a feed.
The last product received by the downstream LDM on iddrs3 had a
creation-time of 20061026152852.421 and the given signature. On iddrs2,
the signature was associated with a product that was INSERTED into the
product-queue on iddrs2 at 20061026155901.886. The LDM on iddrs2 started
sending data-products beginning with the product that was inserted just
after that time.

There are two times involved in all this: one is the product-creation
time and the other is the time that a product is inserted into the
local product-queue.
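
To make the difference between the two times concrete, the two
timestamps from the log entries can be parsed and compared. This is a
minimal sketch: the format string is my reading of the timestamp
layout in the log lines, and parse_ldm_time is a hypothetical helper,
not part of the LDM.

```python
from datetime import datetime

# Assumed layout of the timestamps in the log lines: YYYYMMDDhhmmss.fff
LDM_TIME_FORMAT = "%Y%m%d%H%M%S.%f"


def parse_ldm_time(text: str) -> datetime:
    """Parse a timestamp such as 20061026152852.421 into a datetime."""
    return datetime.strptime(text, LDM_TIME_FORMAT)


# Creation-time of the last product received on iddrs3.
creation = parse_ldm_time("20061026152852.421")
# Time the product with that signature was inserted into the queue on iddrs2.
insertion = parse_ldm_time("20061026155901.886")

gap = insertion - creation
print(gap)
```

For this product, the insertion time on iddrs2 is a little over 30
minutes later than the creation-time.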
> [Re: monitoring the age of the oldest product]
> Okay, I'll take a look at starting up a monitor...
>
>
> Thanks again for your help...
>
> Art
>
> Arthur A. Person
> Research Assistant, System Administrator
> Penn State Department of Meteorology
> email: address@hidden, phone: 814-863-1563
Regards,
Steve Emmerson
Ticket Details
===================
Ticket ID: UYH-624598
Department: Support LDM
Priority: Normal
Status: Closed