Re: 20001214: LDM: out of per user processes

This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.

Subject: Re: 20001214: LDM: out of per user processes
Date: Wed, 03 Jan 2001 09:57:57 -0700

>Date:    Wed, 20 Dec 2000 12:35:25 -0500
>From:    Tom McDermott <address@hidden>
>To:      Russ Rew <address@hidden>
>Subject: Re: 20001214: LDM: out of per user processes

Tom,

Regarding the problem you reported with too many processes on an LDM
host: we recently had a similar occurrence of too many LDM processes.
The circumstances were a power outage, after which the LDM was
automatically restarted on reboot without the product queue being
remade.  The power outage may have resulted in a corrupted queue, and
we now have a theory for how a corrupt queue can cause extra processes
to be spawned.

After the LDM is restarted with a corrupted queue, a downstream site
requests a feed.  The LDM spawns a sender process to provide the feed,
and the sender process starts an hour back in the queue to send any
missed products.  The sender process gets stuck in a loop accessing
products by insertion time in the corrupted queue, perhaps because the
"next product by insertion time" function returns a previous product.
The downstream site notices it isn't getting any response to its
FEEDME request, so it sends another FEEDME request.  The LDM assumes the
unresponsive sender process has died, so it starts up another one.
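
To make the theorized sequence concrete, here is a toy model of how
repeated FEEDME requests could pile up sender processes until the
per-user process limit is hit.  It is only a sketch of the theory
above, not LDM code: the function name, its parameters, and the idea
of counting one spawn per retry are all assumptions made for
illustration.

```python
def simulate_respawns(retries, process_limit, sender_is_stuck=True):
    """Toy model of the theory: count sender processes after FEEDME retries.

    All names here are hypothetical; nothing below is LDM's real API.
    Returns (sender_count, failed_attempt), where failed_attempt is the
    1-based retry at which the process limit was reached, or None.
    """
    senders = 0
    for attempt in range(1, retries + 1):
        if senders >= process_limit:
            # The next spawn would exceed the per-user process limit.
            return senders, attempt
        if sender_is_stuck:
            # A stuck sender never exits, so each retry adds one more.
            senders += 1
        else:
            # A healthy sender answers the request; no duplicates pile up.
            senders = 1
    return senders, None


if __name__ == "__main__":
    print(simulate_respawns(retries=10, process_limit=4))
    print(simulate_respawns(retries=10, process_limit=4, sender_is_stuck=False))
```

In this model the stuck case grows by one sender per retry, while the
healthy case stays at a single sender no matter how many requests
arrive.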

It may be more complicated than this: a different process might get
a lock on a region of the queue and never give the lock up because it
is looping, deadlocking other processes, including a sender process.

If this is what is really happening, a possible fix would involve
making each queue scanner process notice when it isn't making any
progress through the queue, then return an error indication and
exit.  This may be a difficult bug to reproduce, because we would need
a queue in an inconsistent state that causes another process to loop
while accessing its products, but we'll see if we can reproduce it.
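
As a rough illustration of that kind of progress check, the sketch
below scans a queue by insertion time and gives up with an error as
soon as the "next product" lookup fails to move forward.  The names
(NoProgressError, make_next_after, scan_by_insertion_time) and the use
of plain numbers as insertion times are assumptions for the example;
the real queue code is C and is organized differently.

```python
from bisect import bisect_right


class NoProgressError(Exception):
    """Raised when a scan stops advancing through the queue (hypothetical)."""


def make_next_after(insertion_times):
    """Build a well-behaved 'next product by insertion time' lookup."""
    ordered = sorted(insertion_times)

    def next_after(cursor):
        i = bisect_right(ordered, cursor)
        return ordered[i] if i < len(ordered) else None

    return next_after


def scan_by_insertion_time(start, next_after):
    """Yield insertion times after `start`, in order.

    Raises NoProgressError if the lookup returns a time that is not
    strictly later than the current cursor, which is the symptom a
    corrupt queue is theorized to produce.
    """
    cursor = start
    while True:
        found = next_after(cursor)
        if found is None:
            return  # reached the end of the queue
        if found <= cursor:
            raise NoProgressError(
                f"lookup returned {found} after cursor {cursor}"
            )
        cursor = found
        yield found


if __name__ == "__main__":
    healthy = make_next_after([10, 20, 30])
    print(list(scan_by_insertion_time(0, healthy)))
```

With a lookup that hands back an earlier product, the scan raises
NoProgressError instead of looping, so the calling process can log the
problem and exit.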

--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden                     http://www.unidata.ucar.edu
