This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
On Fri, 1 Jun 2001, anne wrote:

> "Arthur A. Person" wrote:
> >
> > > When you say "it still thrashes", do you mean that products aren't being
> > > received in a timely manner? Right now products on ldm.meteo appear to
> > > be arriving pretty quickly. And, 'top' is showing a low load average,
> > > the machine appears to be responsive, and there's a reasonable number of
> > > rpc.ldmds... Is this all with your 600MB queue?
> >
> > By thrashing, I mean that the disk I/O light is mostly on and only
> > occasionally blinks off, the system responds very slowly, and IDD
> > reception lags at the reclass time limit, yet "top" shows only a few
> > percent of CPU usage. The IDD seems fine on ldm right now because I
> > restarted it last night and also remade the queue at 600MB. This doesn't
> > tell us anything about the cause, but I'm beginning to suspect that it
> > has something to do with using a large queue. I'm going to run with the
> > queue at 600MB until I leave for vacation next Friday. If it makes it
> > that long without a problem, I'll conclude it's queue-size related and we
> > can resume working on this when we both get back from vacation.
> >
> > I still have my WSI data coming in, so if I don't see problems in the next
> > week, I'll probably assume the WSI RPCs are a symptom rather than a
> > cause, although they should still shut down when a connection is lost.
>
> Art,
>
> FYI, Charlie O'Brian at WSI agreed to feed our 7.1 machine temporarily
> starting Monday. I'll request the WSI data then, and try it with
> various queue sizes.

Okay... that will be another test, although I'm feeling like the WSI issue
is more a symptom than a cause.

> Also, he said:
>
> > Unless there is a problem (i.e., internet congestion, system crash,
> > client LDM stopping, etc.), our program should never have to reconnect.
> > Our processes check every 5 minutes to make sure the client is
> > connected. I noticed that we did a lot of restarting through 5Z this
> > morning. I would hazard a guess they are fine now.
>
> Yesterday, from the piece of the log I ftp'ed from your site, there were
> 155 connections in about 12 hours. (And only 106 disconnects, as I
> recall.) Could connectivity be a factor? And yet, I'm assuming you had
> no similar problems when you were using navier, is that right?

I've been having on-and-off problems with connectivity from WSI to navier,
but I haven't pushed the issue because navier has been overloaded and I
could never be sure what the real problem was. There could be network
delays between WSI and ldm.meteo.psu.edu, but as I mentioned, my current
thinking is that this is not the primary problem.

> You could try going back to the 2GB queue and see if the problem
> returns...

I have run the 600MB queue over the weekend (since about last Thursday) and
have seen no problems. I'm going to coast into my vacation period this way,
and when I get back I will try the large queue again. I fully expect it to
fail again as before, for whatever reason... we'll see.

Interesting problem... Thanks for your help thus far...

Art.

> Anne
> --
> ***************************************************
> Anne Wilson                    UCAR Unidata Program
> address@hidden                 P.O. Box 3000
>                                Boulder, CO 80307
> ----------------------------------------------------
> Unidata WWW server       http://www.unidata.ucar.edu/
> ****************************************************

Arthur A. Person
Research Assistant, System Administrator
Penn State Department of Meteorology
email: address@hidden, phone: 814-863-1563