[LDM #USJ-914724]: LDM on Snow Leopard
- Subject: [LDM #USJ-914724]: LDM on Snow Leopard
- Date: Mon, 21 Sep 2009 08:54:23 -0600
Dave,
> Yeah, I might have been the one who originally encountered this
> problem and asked y'all for help with it.
You are (now that you reminded me -- it's truly been a while).
> Apple's lack of response
> has been disappointing.
To say the least.
> Maybe they'll be more responsive during this
> period immediately following Snow Leopard's release, while they try to
> get the bugs out of it. (Either that, or they'll be more overwhelmed
> than usual with bug reports.)
From your fingers to their eyes. :-)
I suspect that there just aren't that many programs running on Mac OS X 10.4
(or higher) systems that use fcntl() file-locking and mmap() memory-mapping as
much as the LDM.
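For readers unfamiliar with that combination, here is a minimal sketch of what using fcntl() locking together with mmap() looks like. It is not LDM code: the function names, the scratch file, and the offsets are hypothetical, and it assumes a POSIX system with Python 3.

```python
import fcntl
import mmap
import os
import tempfile


def write_locked(path, offset, data):
    """Write bytes into a memory-mapped file while holding an fcntl() lock."""
    with open(path, "r+b") as f:
        # Advisory exclusive lock on just the byte range being modified.
        fcntl.lockf(f, fcntl.LOCK_EX, len(data), offset)
        try:
            # Length 0 maps the whole file.
            with mmap.mmap(f.fileno(), 0) as region:
                region[offset:offset + len(data)] = data
                region.flush()
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN, len(data), offset)


def demo():
    """Create a scratch file, write to it under a lock, and read it back."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, 4096)  # mmap() needs a non-empty file
        os.close(fd)
        write_locked(path, 16, b"product")
        with open(path, "rb") as f:
            f.seek(16)
            return f.read(7)
    finally:
        os.unlink(path)
```

Calling demo() exercises both calls on a temporary file. A program like the LDM would do this from several processes at once against a shared file, which is presumably where the operating system's handling matters.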
> Unidata's "known bug" entry for this problem notes that there is no
> workaround. That's true in a sense, but if it were strictly true then
> I'd never be able to stop the LDM at all, even when processes hang
> (which eventually some of them do on a semi-regular basis).
I think a hung downstream LDM will, nevertheless, terminate upon reception of a
SIGTERM, which is what the top-level LDM server sends all child processes when
it's told to terminate. I could be wrong, however. One thing I have noticed
is that attaching to the hung process with gdb(1) and then exiting gdb(1) will
free the process from its hung state. I'm at a loss to understand how that
happens without intervention by the operating system.
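As a generic illustration of that shutdown pattern (a parent sending SIGTERM to each child and collecting the exit statuses), here is a small sketch. It is not the LDM server's code: the helper names and the sleeping stand-in children are hypothetical, and it assumes a POSIX system with Python 3.

```python
import signal
import subprocess
import sys


def terminate_children(children, timeout=5.0):
    """Send SIGTERM to every child, then wait for each one to exit."""
    for child in children:
        child.send_signal(signal.SIGTERM)
    # On POSIX, a child killed by a signal reports the negated signal number.
    return [child.wait(timeout=timeout) for child in children]


def demo():
    """Start two idle stand-in children and terminate them."""
    children = [
        subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
        for _ in range(2)
    ]
    return terminate_children(children)
```

A child that is truly hung in the kernel would not return from wait() within the timeout, and subprocess.TimeoutExpired would be raised. That is the case in which a follow-up "kill -9" gets considered.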
> I've
> written scripts that try to deal with the inability to run "ldmadmin
> stop" to stop the LDM; maybe you could comment on whether or not I've
> got the bases covered acceptably:
>
> (1) Run "ldmadmin stop", redirecting the output to a file.
>
> (2) Check that file for the word "isn't" (as in "the LDM isn't
> running", or something like that).
>
> (3a) If the LDM isn't running, check to see if there's a ldmd.pid
> file.
> -- If there's no ldmd.pid file, run pqcat & pqcheck, then run
> "ldmadmin clean".
>
> (3b) If the LDM is running, wait 30 seconds to give it a chance to
> shut down. (This is typically doomed, at least for some rpc processes.)
Maybe you should give it a minute.
> (4) Get a list of pids for rpc processes owned by the ldm account.
>
> (5) If this list isn't empty:
> -- Run "kill -9" on them all.
> -- Run "ldmadmin clean".
> -- Run pqcat and pqcheck. If this doesn't produce a "tallied
> consistent" message, run "ldmadmin delqueue" and "ldmadmin mkqueue".
>
> (6) Run "ldmadmin start".
This procedure should result in a restarted LDM. I just wish it weren't
necessary.
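For reference, the numbered steps above can be sketched roughly as follows. This is not Dave's attached script. The ldmd.pid location, the "ps" options, matching on "rpc" in the command name, and the bare pqcat and pqcheck invocations are all assumptions, since the message gives no paths or flags. The wait defaults to the one minute suggested above.

```python
import os
import signal
import subprocess
import time

PID_FILE = "ldmd.pid"  # hypothetical location; the message gives no path


def run(cmd):
    """Run a command and return its combined stdout/stderr as text."""
    done = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True)
    return done.stdout


def rpc_pids(user="ldm"):
    """Step 4: PIDs of the user's processes whose command contains 'rpc'."""
    pids = []
    listing = run(["ps", "-U", user, "-o", "pid=", "-o", "comm="])
    for line in listing.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[0].isdigit() and "rpc" in fields[1]:
            pids.append(int(fields[0]))
    return pids


def check_queue(run):
    """Run pqcat and pqcheck; True if the output says 'tallied consistent'."""
    # Assumption: bare invocations. A real script would likely add flags
    # and discard pqcat's product output rather than capture it.
    output = run(["pqcat"]) + run(["pqcheck"])
    return "tallied consistent" in output


def restart_ldm(run=run, pid_file_exists=lambda: os.path.exists(PID_FILE),
                rpc_pids=rpc_pids, kill=os.kill, sleep=time.sleep, wait=60):
    """Follow steps 1-6 of the procedure to get a restarted LDM."""
    stopped = "isn't" in run(["ldmadmin", "stop"])   # steps 1 and 2
    if stopped:
        if not pid_file_exists():                    # step 3a
            check_queue(run)
            run(["ldmadmin", "clean"])
    else:
        sleep(wait)                                  # step 3b
    pids = rpc_pids()                                # step 4
    if pids:                                         # step 5
        for pid in pids:
            try:
                kill(pid, signal.SIGKILL)            # same as "kill -9"
            except ProcessLookupError:
                pass                                 # already gone
        run(["ldmadmin", "clean"])
        if not check_queue(run):
            run(["ldmadmin", "delqueue"])
            run(["ldmadmin", "mkqueue"])
    run(["ldmadmin", "start"])                       # step 6
```

The commands and helpers are passed in as parameters so the decision logic can be exercised with stand-ins before it is pointed at a real LDM installation.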
> (Script attached.)
>
> -- Dave
Regards,
Steve Emmerson
Ticket Details
===================
Ticket ID: USJ-914724
Department: Support LDM
Priority: Normal
Status: Closed