This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Mike,

Yes, if you have to kill LDM processes like that, then you should execute the command "ldmadmin clean" *and* re-create the product-queue, because it has likely been corrupted.

> Steve, Thanks for looking at this head scratcher.
>
> I suspect I corrupted the product queue when I killed the pqinsert that had
> been running for over 24 hours. When I ran top, I saw the single process
> that had accumulated over 24 hours of CPU time. I tried sending a TERM
> signal to the process. I waited a couple of minutes, and when that didn’t
> take, I issued a KILL to shut it down. I hoped that after that the other
> inserts would start running, but it never happened. I then tried to shut
> down the ldmd through ldmadmin, but it never terminated. I then started to
> systematically kill all the pqinserts still waiting. Once I got them all
> killed, ldmd shut down successfully. After that I tried to run pqcheck,
> and that’s when I had to wait over 40 minutes for a check that never
> finished. In retrospect, reading the instructions again, I think I should
> have run the clean option through ldmadmin.
>
> The LDM system is running via a start command. Whether it starts while the
> Perl scripts execute is a good question. There is nothing to prevent the
> "ldmadmin start" from occurring while the scripts are running. The Perl
> scripts run from cron to pull data from external GPS receivers. I guess
> that is something to consider.
>
> In this instance, the LDM was up and running before the cron jobs were
> started, and this product queue had been populating for over a week before
> we hit this snag.
>
> There shouldn’t be any problems with power on the servers. They’re all
> UPS-protected, with a generator as a secondary power source behind the
> regular utility supply.
>
> It seemed really strange that the pqinsert got stuck trying to insert a
> single file. As I say, I don’t have any good theories on what may have
> occurred, other than to say I hope it’s a one-time cosmic pixie dust
> anomaly that never happens again.
>
> I still suspect it might be a file access error: that pqinsert was called
> before the file was fully written out. I’m looking at building a more
> robust way to confirm that the system is done with the file before it
> calls pqinsert, using deeper system-level calls to check that the OS has
> finished writing out the file rather than simply monitoring the file’s
> modification time.
>
> Again, thanks for lending your expertise. At least I know I’m not missing
> something completely obvious.
>
> -Mike

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: HCO-891524
Department: Support LDM
Priority: Normal
Status: Closed
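Mike's idea of confirming that the writer has finished with a file, instead of relying only on its modification time, might be sketched in Python as below. This is an illustration under stated assumptions, not an LDM-provided tool: it assumes a Linux host (the open-file check reads /proc/<pid>/fd), that pqinsert is on the PATH, and the function names, polling interval, and retry policy are all hypothetical. The /proc scan only sees processes the caller is permitted to inspect, so it is most useful when run as the same user as the writer, or as root.

```python
import os
import subprocess
import time


def wait_until_stable(path, interval=2.0, required_checks=3, timeout=300.0):
    """Poll os.stat() until the file's size and mtime stay unchanged for
    `required_checks` consecutive polls. Returns False if `timeout` expires."""
    deadline = time.monotonic() + timeout
    last = None
    stable = 0
    while time.monotonic() < deadline:
        try:
            st = os.stat(path)
        except FileNotFoundError:
            last, stable = None, 0  # file not there yet: start over
        else:
            current = (st.st_size, st.st_mtime_ns)
            if current == last:
                stable += 1
                if stable >= required_checks:
                    return True
            else:
                last, stable = current, 0  # still changing: reset the count
        time.sleep(interval)
    return False


def open_by_any_process(path):
    """Linux-only: True if any /proc/<pid>/fd entry we may read points at `path`.
    Processes we lack permission to inspect are silently skipped."""
    if not os.path.isdir("/proc"):
        return False  # no procfs: rely on wait_until_stable() alone
    target = os.path.realpath(path)
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        fd_dir = os.path.join("/proc", pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited or permission denied
            continue
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    return True
            except OSError:  # descriptor closed between listdir and readlink
                continue
    return False


def insert_when_ready(path, pqinsert_args=()):
    """Hypothetical wrapper: hand `path` to pqinsert only once it looks finished."""
    if not wait_until_stable(path) or open_by_any_process(path):
        return False  # not ready; let the next cron run retry
    subprocess.run(["pqinsert", *pqinsert_args, path], check=True)
    return True
```

A cron-driven script could run an equivalent check before invoking pqinsert and simply skip files that are not ready yet, picking them up on its next run. If the program that writes the data can be changed, having it write under a temporary name and rename the file into place when finished avoids the race altogether, because a rename within a single file system is atomic.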