[LDM #JIG-686458]: ldm 6.4.7.1 restart problem
- Subject: [LDM #JIG-686458]: ldm 6.4.7.1 restart problem
- Date: Mon, 15 Jan 2007 16:56:29 -0700
Art,
> I've been having occasional problems with our LDM (6.4.7.1) "resetting"
> and instead of resuming data collection at the time it left off, it goes
> back as far as the "-m" setting in the upstream queue. Below is an
> example from yesterday with iddrs3 being the downstream node and
> idd-ingest being the upstream:
>
> Jan 14 16:29:24 idd-ingest iddrs3.meteo.psu.edu(feed)[19268] ERROR:
> Couldn't flush connection; nullproc_6() failure to iddrs3.meteo.psu.edu:
> RPC: Timed out
> Jan 14 16:29:46 idd-ingest iddrs3.meteo.psu.edu[19885] WARN:
> findTimeEntryWithOffset(): Target data-product with given metadata not
> found in time-map near its creation-time (20070114162655.320)
Data-products are indexed in the product-queue according to their
insertion-time and data-products have their creation-time as part
of their metadata. As a consequence, it's very important that the
clocks on the various LDM machines be correct. For example, if the
data-product that host iddrs3 last received from host idd-ingest
had a creation-time that was later than the time (according to the
system clock on idd-ingest) when it was inserted into idd-ingest's
product-queue, then the LDM on idd-ingest will be unable to find
that last, successfully-transmitted data-product.
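As a rough illustration of why a skewed clock defeats the lookup, here is a
toy model in Python. It is not the LDM's actual code: the function name, the
list-based "time-map", and the rule of scanning forward from the
creation-time are all assumptions made for this sketch.

```python
from bisect import bisect_left

def find_by_signature(time_map, creation_time, signature):
    """Toy lookup (hypothetical, not LDM source).

    time_map is a list of (insertion_time, signature) tuples sorted by
    insertion_time. The search is assumed to start at the product's
    creation-time and scan forward, since a product is normally inserted
    at or after the moment it was created.
    """
    start = bisect_left(time_map, (creation_time, ""))
    for insertion_time, sig in time_map[start:]:
        if sig == signature:
            return insertion_time
    return None

# Upstream queue: product "b" was inserted at t=101 (upstream clock).
queue = [(99.0, "a"), (101.0, "b"), (103.0, "c")]

# Consistent clocks: "b" was created at t=100, before its insertion.
found = find_by_signature(queue, 100.0, "b")

# Skewed clocks: "b" carries a creation-time of t=105, later than its
# insertion-time, so the scan starts beyond the entry and misses it.
missed = find_by_signature(queue, 105.0, "b")

print(found, missed)
```

In this model the second lookup fails even though the product is still in
the queue, which matches the symptom in the WARN message above.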
Also, is the product-queue on idd-ingest large enough? What's the
mean age of the oldest data-product? What's the minimum age of the
oldest product? (Use pqmon(1) to discover this.)
> Jan 14 16:29:46 idd-ingest iddrs3.meteo.psu.edu[19885] NOTE: Data-product
> with signature e364b6103d1b4037e788f32c5516c86b wasn't found in
> product-queue
> Jan 14 16:29:46 idd-ingest iddrs3.meteo.psu.edu(feed)[19885] NOTE:
> Starting Up(6.4.7.1/6): 20070114102944.933 TS_ENDT {{ANY, ".*"}},
> SIG=e364b6103d1b4037e788f32c5516c86b, Primary
> Jan 14 16:29:46 idd-ingest iddrs3.meteo.psu.edu(feed)[19885] NOTE: topo:
> iddrs3.meteo.psu.edu {{ANY, (.*)}}
>
>
> Jan 14 16:29:34 iddrs3 idd-ingest.meteo.psu.edu[3470] ERROR: readtcp():
> EOF on socket 4
> Jan 14 16:29:44 iddrs3 idd-ingest.meteo.psu.edu[3470] ERROR:
> one_svc_run(): RPC layer closed connection
> Jan 14 16:29:44 iddrs3 idd-ingest.meteo.psu.edu[3470] ERROR: Disconnecting
> due to LDM failure; Connection to upstream LDM closed
> Jan 14 16:29:44 iddrs3 idd-ingest.meteo.psu.edu[3470] NOTE: LDM-6 desired
> product-class: 20070114102944.933 TS_ENDT {{ANY, ".*"},{NONE,
> "SIG=e364b6103d1b4037e788f32c5516c86b"}}
The LDM on iddrs3 is asking for data that was created about 6 hours ago
(start time 20070114102944.933 versus the 16:29:44 log entry).
Is the maximum acceptable latency on iddrs3 really 6 hours?
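The age of the requested start time can be checked from the two timestamps
in the log. The Python sketch below assumes that the syslog timestamps are
in UTC (like the LDM timestamps) and that the year is 2007; the helper name
is made up for this example.

```python
from datetime import datetime

def request_age_hours(log_time, start_time):
    """Hours between a syslog entry and the product-class start time.

    log_time:   syslog-style "Mon DD HH:MM:SS" string (year assumed 2007,
                time zone assumed UTC)
    start_time: LDM-style "YYYYMMDDhhmmss.sss" string
    """
    logged = datetime.strptime("2007 " + log_time, "%Y %b %d %H:%M:%S")
    start = datetime.strptime(start_time, "%Y%m%d%H%M%S.%f")
    return (logged - start).total_seconds() / 3600.0

# Values copied from the iddrs3 log excerpt above.
hours = request_age_hours("Jan 14 16:29:44", "20070114102944.933")
print(f"request reaches back {hours:.2f} hours")
```

The result is how far back the downstream LDM is asking the upstream LDM to
go, which should correspond to the maximum acceptable latency configured on
iddrs3.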
> Jan 14 16:29:45 iddrs3 idd-ingest.meteo.psu.edu[3470] NOTE: Upstream LDM-6
> on idd-ingest.meteo.psu.edu is willing to be a primary feeder
>
> Any ideas on what might be causing this?
>
> Thanks.
>
> Art
>
> Arthur A. Person
> Research Assistant, System Administrator
> Penn State Department of Meteorology
> email: address@hidden, phone: 814-863-1563
Regards,
Steve Emmerson
Ticket Details
===================
Ticket ID: JIG-686458
Department: Support LDM
Priority: Normal
Status: On Hold