[LDM #DCN-100393]: Writer-Counter Error
- Subject: [LDM #DCN-100393]: Writer-Counter Error
- Date: Mon, 16 May 2016 08:48:08 -0600
Robert,
> I have encountered 5 or so instances in the past several years
> where I have attempted to manually restart LDM and received the "The
> writer-counter of the product-queue isn't zero..." message, which left
> LDM in a stopped state. I always resolved the situation by rebuilding
> the queue. In any case, I am somewhat hesitant to restart LDM during
> times when I am "pqinsert-ing" large files into the queue (for instance
> GRIB files during the model cycles) as I feel that would leave the queue
> most vulnerable. That said, I realized recently that the 'ldmadmin check'
> (which I run each hour) will induce an automatic restart if it needs
> to reconcile the queue (in my case I have a static 4G queue size and
> choose to decrease max latency). Getting to my question... are there
> any safeguards built into the 'ldmadmin check' that might prevent the
> aforementioned error from occurring if it needs to restart the service?
> The last thing I would want is for the LDM service to stop during a
> self-induced restart. If there is no guarantee the service will always
> restart, is it better to set reconciliation to "do nothing" and manually
> reconcile the queue's max latency? Mind you I have never had such an
> auto-restart ever fail to restart, but I have had manual restarts result
> in the writer-counter error.
There are safeguards to ensure that the LDM product-queue doesn't get
corrupted. For example, the product-queue library blocks most signals
(including SIGTERM) while the queue is being accessed. That being said, there
is no guarantee that the LDM code is bug-free.
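
To illustrate the general technique of holding signals pending during a
critical update, here is a minimal sketch in Python. It is not the LDM
product-queue code (which is written in C), and the set of signals it blocks is
an assumption. The names run_with_signals_blocked, fake_queue_update and demo
are hypothetical and exist only for this example.

```python
import os
import signal
import threading


def run_with_signals_blocked(critical_section,
                             signals=(signal.SIGTERM, signal.SIGINT)):
    """Run critical_section() while the given signals are held pending.

    Hypothetical helper: the signal set is an assumption, not LDM's.
    """
    # Block the signals for this thread and remember the previous mask.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, set(signals))
    try:
        return critical_section()
    finally:
        # Restore the previous mask; any pending signal is delivered now.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


def demo():
    """Send SIGTERM in the middle of a fake update and record the order."""
    events = []

    def on_sigterm(signum, frame):
        events.append("SIGTERM handled")

    previous = signal.signal(signal.SIGTERM, on_sigterm)

    def fake_queue_update():
        # Stand-in for a write to a shared data structure.
        events.append("update started")
        # The signal arrives mid-update but stays pending because it is blocked.
        signal.pthread_kill(threading.get_ident(), signal.SIGTERM)
        events.append("update finished")

    try:
        run_with_signals_blocked(fake_queue_update)
    finally:
        # Put back whatever handler was installed before the demo.
        if previous is not None:
            signal.signal(signal.SIGTERM, previous)
    return events


if __name__ == "__main__":
    print(demo())
```

Run from the main thread on a POSIX system, demo() should record the handler
only after the update has finished, which is the property that keeps a
termination request from interrupting a half-completed write.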
I have no qualms about having an active reconciliation mode if the product-queue is
close to its equilibrium size. The only problems I've seen are when the queue
is far too small for the reconciliation algorithm to make a good guess.
If the LDM doesn't restart after a reconciliation, then you likely have bigger
problems (disk partition full, for example).
> Best Regards,
> Bob
Regards,
Steve Emmerson
Ticket Details
===================
Ticket ID: DCN-100393
Department: Support LDM
Priority: Normal
Status: Closed