This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
"Jennie L. Moody" wrote: > > Anne, > > I noticed that things were messed up again on windfall, > but you were on by the time I came into work and I > had to deal with some other stuff. I just looked at > the fact that you had restarted the ldm, and it looks > good so far. Seems that the issue was again related to > PennState dropping me, the system doing a failover, > then failing back to navier, and having problems. > Hi Jennie, Yes, I figured out the problem and made a temporary fix. When the failover occurs via cron the ldm is restarted in a different environment. In particular, it's not getting the PATH variable set correctly, so it's unable to find things it needs. The not so satisfactory fix I made is to include your entire path in the ldmfail script. The problem with this is that you'll need to modify the script every time you upgrade. Instead, I would prefer to change how cron invokes the script. That way, your upgrades will be straightforward. But, I'll have to do some research on how to accomplish this. > I should point out, I don't believe that automatic failovers > were ever working under the old account (ldma), in fact > I thought this was one of the new features of this version > of the ldm? > This is interesting! I was really trying to understand why it was working before - an assumption on my part. I wouldn't say ldmfail is a new feature of this version. It's been around since at least 5.0.9, and probably before. But, this version is different than the old one you have. The old one was written by Mitch, and the latest one was redone by Robb. > I found a few problems with files that get written by > some of my product scripts, these were gif files that > were still owned by ldma, and while they were in the > same group as ldma (I think it looks like you left ldma > in a group with ldm and mcidas), several files had > only read-permission for the group....so, ldm fired > batches that tried to write out gif files were not > completing successfully. 
Anyway, I changed all these > (they were all in our /home/mcuser/webpage directory). > This doesn't impact anything else, but it explains to > me why some products on our website were updating > while others were not. > Great! I'm glad you figured this out. It's difficult for me to fully understand what a user is trying to do with their data without spending lots of time. > I have to head home now, but I hope that whatever the > failover gremlin is, you don't have to work to hard to > find him. > I feel better about at least diagnosing the problem and having a patch, although it's not the greatest solution. I hope we're on the home stretch. > As always, I appreciate that you are taking note and > assisting! > > Jennie It is my pleasure to be able to be of assistance! Anne -- *************************************************** Anne Wilson UCAR Unidata Program address@hidden P.O. Box 3000 Boulder, CO 80307 ---------------------------------------------------- Unidata WWW server http://www.unidata.ucar.edu/ ****************************************************
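
The cron PATH problem described in the reply above can be illustrated with a small wrapper. This is only a minimal sketch, not the ldmfail script or the fix that was actually applied: the function names `build_env` and `run_with_path` and the directory `/usr/local/ldm/bin` are hypothetical, chosen for illustration. The idea is that cron starts jobs with a minimal environment, so a wrapper can prepend the needed directories to PATH before launching the real command, which keeps the path out of the script that changes on upgrade.

```python
import os
import subprocess
import sys


def build_env(extra_dirs, base_env=None):
    """Return a copy of the environment with extra_dirs prepended to PATH.

    base_env defaults to the current process environment, which under
    cron usually carries only a short default PATH.
    """
    env = dict(os.environ if base_env is None else base_env)
    current = env.get("PATH", "")
    # Keep the existing entries, dropping empties and duplicates of extra_dirs.
    kept = [p for p in current.split(os.pathsep) if p and p not in extra_dirs]
    env["PATH"] = os.pathsep.join(list(extra_dirs) + kept)
    return env


def run_with_path(command, extra_dirs, base_env=None):
    """Run command (a list of arguments) with the adjusted PATH.

    Returns the exit status of the command.
    """
    env = build_env(extra_dirs, base_env)
    return subprocess.run(command, env=env, check=False).returncode


# Hypothetical location of the LDM programs; adjust for the real install.
LDM_DIRS = ["/usr/local/ldm/bin"]
```

A cron entry could then call a short script that runs `run_with_path([...], LDM_DIRS)` with the real failover command, so only the wrapper needs to know where the LDM programs live.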