>To: address@hidden
>From: Paul Hamer <address@hidden>
>Subject: Slow Downstream Node Problem
>Organization: NOAA/FSL
>Keywords: 200105152257.f4FMvRp04890

Hi Paul,

> We've been experiencing the "pq_del_oldest: conflict" message
> problem, and reading the web page you have describing it got
> me thinking that there must be a better solution for the
> slow or flaky network connection element.
>
> The net result of the slow downstream feed is that the incoming
> data is delayed (possibly lost) due to the inability of ldm
> to make space in the product queue by deleting the oldest data.
> It seems to me that the downstream side of things should
> basically be told to disconnect and reconnect at the latest
> (newest) end of your product queue. Better for a customer
> of your ldm to lose data than for you to, certainly for us anyway.

I agree. I was surprised to discover that instead of jumping to the
newest end of the product queue, the downstream client just jumps
ahead one minute, so the problem quickly recurs. Currently, though, I
think the downstream node must determine that it has fallen behind
and jump ahead, rather than the upstream node telling it to do so (is
this right, Anne?).

In some cases the problem can be alleviated or eliminated by
increasing the size of the product queue to hold significantly more
than an hour's worth of data (or whatever time period the downstream
node is configured for), since in that case the downstream node would
not be locking the oldest product in the queue but an hour-old
product somewhere in the middle of the queue. When it got the
product, it would recognize it as too old, disconnect, and send a
RECLASS message asking for newer products, but products could still
be deleted to make room at the old end of the queue.

I think a better solution would be for the downstream node to jump to
the new end of the queue (or maybe halfway there, since the pq
library can access products quickly by time) instead of just a minute
ahead. But this still leaves the possibility that the downstream node
gets stuck and keeps a lock on an hour-old product until it really is
the oldest product in the queue, causing the upstream node to lose
data.

> I started looking at the code and realised that this might not
> be easy to do. After all, how do you know which connection has
> obtained the resource lock? Then I thought that maybe you don't
> have to know. If you signal the "pq_del_oldest: conflict" to the
> process group, i.e., let everyone know you've seen EAGAIN, then
> in handling the signal each process checks the following:
>
> 1. Does it have a lock? If not, continue.
> 2. If so, is the lock on the oldest queue member? If not, continue.
> 3. If it is the oldest, free the resource, reset the pq cursor, and
>    disconnect from the peer.
>
> I was thinking about trying to implement this but I don't have
> any time available, certainly not in the near future, so I was
> wondering if you've been considering this?

This seems like a good idea, and one we hadn't considered. The way we
had planned to fix the problem instead was to try to delete the next
oldest product if there is a lock on the oldest product:

http://www.unidata.ucar.edu/staff/russ/tmp/pq_del_oldest.html

but I think your solution may be better. We'll discuss this and see
if we can find the resources to try implementing your solution
instead.

--Russ

_____________________________________________________________________
Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu
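
[Editor's note for archive readers: below is a rough C sketch of the
three-step check Paul proposes above. It is not the LDM
implementation; the helper functions (holds_product_lock,
lock_is_on_oldest_product, release_lock_and_reset_cursor,
disconnect_from_peer), the choice of SIGUSR2 as the notification
signal, and the stubbed main() are illustrative assumptions only. In
the real ldm, this work would happen inside the product-queue and
connection-handling code.]

    /*
     * Sketch of the proposed scheme: the process that hits EAGAIN in
     * pq_del_oldest() signals its process group, e.g. kill(0, SIGUSR2),
     * and every process runs this check, giving up its lock only if
     * that lock is on the oldest product in the queue.
     */
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static volatile sig_atomic_t oldest_conflict = 0;

    /* Hypothetical stubs standing in for the real LDM routines. */
    static int  holds_product_lock(void)        { return 1; }
    static int  lock_is_on_oldest_product(void) { return 1; }
    static void release_lock_and_reset_cursor(void)
        { puts("lock released, pq cursor reset"); }
    static void disconnect_from_peer(void)
        { puts("disconnected from peer"); }

    /* Keep the handler async-signal-safe: just record the event. */
    static void on_conflict_signal(int sig)
    {
        (void)sig;
        oldest_conflict = 1;
    }

    /* Called from the main service loop when the flag is set. */
    static void handle_oldest_conflict(void)
    {
        oldest_conflict = 0;

        if (!holds_product_lock())          /* 1. no lock: carry on      */
            return;
        if (!lock_is_on_oldest_product())   /* 2. not the oldest: carry on */
            return;

        release_lock_and_reset_cursor();    /* 3. free it and disconnect */
        disconnect_from_peer();
    }

    int main(void)
    {
        struct sigaction act;
        memset(&act, 0, sizeof(act));
        act.sa_handler = on_conflict_signal;
        sigemptyset(&act.sa_mask);
        if (sigaction(SIGUSR2, &act, NULL) != 0) {
            perror("sigaction");
            return EXIT_FAILURE;
        }

        /* The process that saw EAGAIN would notify the whole process
         * group with kill(0, SIGUSR2); simulate that here. */
        raise(SIGUSR2);

        if (oldest_conflict)
            handle_oldest_conflict();

        return EXIT_SUCCESS;
    }

The handler only sets a flag because freeing the lock and tearing down
the connection are not async-signal-safe operations; the actual work
is deferred to the process's normal service loop.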