[LDM #HCO-891524]: Unidata Performance Question
- Subject: [LDM #HCO-891524]: Unidata Performance Question
- Date: Fri, 13 Mar 2015 10:15:17 -0600
Mike,
Yes, if you have to kill LDM processes like that then you should execute the
command "ldmadmin clean" *and* re-create the product-queue because it's likely
been corrupted.
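
The recovery steps above could be scripted roughly as follows. This is only a sketch under stated assumptions: `ldmadmin` is on the LDM user's PATH, the LDM is not running when the script is called, and the product-queue is re-created with `ldmadmin delqueue` followed by `ldmadmin mkqueue`. The names `RECOVERY_STEPS`, `rebuild_queue`, and the `runner` parameter are hypothetical and exist only for illustration.

```python
import subprocess

# Steps to run after LDM processes had to be killed. The LDM must not be
# running at this point. Re-creating the queue with delqueue + mkqueue is
# an assumption of this sketch.
RECOVERY_STEPS = (
    ("ldmadmin", "clean"),
    ("ldmadmin", "delqueue"),
    ("ldmadmin", "mkqueue"),
)


def rebuild_queue(runner=subprocess.run):
    """Run each recovery step in order; an exception stops the sequence.

    `runner` is injectable so the sequence can be exercised without an
    LDM installation (hypothetical convenience, not part of the LDM).
    """
    completed = []
    for step in RECOVERY_STEPS:
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit status, so later steps are skipped after a failure.
        runner(list(step), check=True)
        completed.append(" ".join(step))
    return completed
```

Once `rebuild_queue()` returns without raising, the LDM can be started again with `ldmadmin start`.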
> Steve, thanks for looking at this head-scratcher.
>
> I suspect I corrupted the product queue when I killed the pqinsert that had
> been running for over 24 hours. When I ran top, I saw a single process
> that had accumulated over 24 hours of CPU time. I tried sending a TERM
> signal to the process. I waited a couple of minutes, and when that didn’t
> take, I issued a KILL to shut it down. I hoped that the other inserts would
> then start running, but that never happened. I then tried to shut down
> ldmd through ldmadmin, but it never terminated. I then started to
> systematically kill all the pqinserts still waiting. Once I had killed them
> all, ldmd shut down successfully. After that I tried to run pqcheck,
> and that’s when I waited over 40 minutes for a check that never
> finished. In retrospect, reading the instructions again, I think I should
> have run the clean option through ldmadmin.
>
> The LDM system is run via a start command. Whether that happens while the
> Perl scripts execute is a good question. There is nothing to prevent an
> ldmadmin start from occurring while the scripts are running. The Perl
> scripts run from cron to pull data from external GPS receivers. I guess
> that is something to consider.
>
> In this instance the LDM was up and running before the cron jobs were
> started, and this product queue had been populating for over a week before
> we hit this snag.
>
> There shouldn’t be any problems with power on the servers. They’re all
> UPS-protected, with a generator as a secondary power source behind the
> regular utilities.
>
> It seemed really strange that the pqinsert got stuck trying to insert a
> single file. As I say, I don’t have any good theories on what may
> have occurred, other than to say I hope it’s a one-time cosmic pixie dust
> anomaly that never happens again.
>
> I still suspect it might be a file access error: that pqinsert was called
> before the file was fully written out. I’m looking at building a more robust
> way to confirm that the system is done with the file before it calls
> pqinsert. I’m looking at deeper system-level calls to see that the OS is
> done writing out the file, rather than simply monitoring the mod time of
> the file.
>
> Again, thanks for lending your expertise. At least I know I’m not missing
> something completely obvious.
>
> -Mike
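
Mike's idea of confirming that a file is completely written before calling pqinsert could be sketched as below. This is a heuristic, not a guarantee: it assumes that a file whose size and modification time stay unchanged across several consecutive polls is finished, which a writer that stalls for longer than the polling window would defeat. The function name `wait_until_stable` and its parameters are hypothetical. Where the producer can be changed, having it write to a temporary name and rename the file when done is usually more reliable than any polling scheme.

```python
import os
import time


def wait_until_stable(path, interval=1.0, checks=3, timeout=60.0):
    """Return True once size and mtime are unchanged for `checks` polls.

    Returns False if the file does not settle before `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    last = None      # (size, mtime_ns) seen on the previous poll
    stable = 0       # number of consecutive polls with no change
    while time.monotonic() < deadline:
        try:
            st = os.stat(path)
        except FileNotFoundError:
            # File not there (yet); start counting from scratch.
            last, stable = None, 0
        else:
            current = (st.st_size, st.st_mtime_ns)
            if current == last:
                stable += 1
                if stable >= checks:
                    return True
            else:
                last, stable = current, 0
        time.sleep(interval)
    return False
```

A calling script would run pqinsert on the file only when `wait_until_stable(path)` returns True, and would log and skip the file otherwise.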
Regards,
Steve Emmerson
Ticket Details
===================
Ticket ID: HCO-891524
Department: Support LDM
Priority: Normal
Status: Closed