This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Hi all,

The uptime.log is even weirder today. Look at this snippet:

12:06pm  up 1 day(s), 21:17,  6 users,  load average: 9.14, 10.77, 11.99
12:07pm  up 1 day(s), 21:18,  6 users,  load average: 7.82, 10.04, 11.64
12:08pm  up 1 day(s), 21:19,  6 users,  load average: 5.48, 8.91, 11.14
12:09pm  up 1 day(s), 21:20,  6 users,  load average: 4.11, 7.91, 10.64
12:10pm  up 1 day(s), 21:21,  6 users,  load average: 3.32, 7.02, 10.14
12:11pm  up 1 day(s), 21:22,  6 users,  load average: 4.17, 6.61, 9.80
12:12pm  up 1 day(s), 21:23,  6 users,  load average: 4.37, 6.23, 9.45
12:13pm  up 1 day(s), 21:24,  6 users,  load average: 4.88, 6.07, 9.20
12:14pm  up 1 day(s), 21:25,  6 users,  load average: 3.68, 5.58, 8.82
12:15pm  up 1 day(s), 21:26,  6 users,  load average: 4.66, 5.47, 8.57
12:16pm  up 1 day(s), 21:27,  6 users,  load average: 4.48, 5.29, 8.31
12:17pm  up 1 day(s), 21:28,  6 users,  load average: 3.37, 4.86, 7.96
12:18pm  up 1 day(s), 21:29,  6 users,  load average: 4.57, 4.86, 7.75
12:19pm  up 1 day(s), 21:30,  6 users,  load average: 5.41, 5.16, 7.68
12:20pm  up 1 day(s), 21:31,  6 users,  load average: 3.70, 4.70, 7.36
12:21pm  up 1 day(s), 21:32,  6 users,  load average: 3.75, 4.52, 7.12
12:22pm  up 1 day(s), 21:33,  6 users,  load average: 2.58, 4.08, 6.80
12:23pm  up 1 day(s), 21:34,  6 users,  load average: 12.65, 6.56, 7.50
12:24pm  up 1 day(s), 21:35,  6 users,  load average: 15.59, 8.51, 8.13
12:25pm  up 1 day(s), 21:36,  6 users,  load average: 17.77, 10.44, 8.84
12:26pm  up 1 day(s), 21:37,  6 users,  load average: 18.57, 11.98, 9.50

I can't correlate that 12:23 moment with anything in the LDM logs or the system logs. (/var/adm/messages is practically empty.)

And here's a traceroute from thelma to Penn State:

/local/ldm% traceroute ldm.meteo.psu.edu
traceroute: Warning: Multiple interfaces found; using 192.52.106.21 @ ge0
traceroute to ldm.meteo.psu.edu (128.118.28.12), 30 hops max, 40 byte packets
 1  vbnsr-dmzfnet (192.52.106.10)  0.698 ms  0.690 ms  0.434 ms
 2  mlra-n2 (128.117.2.253)  0.382 ms  0.375 ms  0.594 ms
 3  gin-n243-72 (128.117.243.73)  0.849 ms  0.735 ms  0.565 ms
 4  frgp-gw-1 (128.117.243.34)  1.543 ms  2.415 ms  1.700 ms
 5  198.32.11.105 (198.32.11.105)  2.239 ms  1.709 ms  1.509 ms
 6  kscy-dnvr.abilene.ucaid.edu (198.32.8.14)  12.183 ms  12.184 ms  12.815 ms
 7  ipls-kscy.abilene.ucaid.edu (198.32.8.6)  22.066 ms  21.362 ms  21.394 ms
 8  clev-ipls.abilene.ucaid.edu (198.32.8.26)  27.925 ms  27.939 ms  27.706 ms
 9  abilene.psc.net (192.88.115.122)  31.138 ms  30.860 ms  31.129 ms
10  bar-beast.psc.net (192.88.115.17)  31.111 ms  30.987 ms  31.156 ms
11  psu-i2.psc.net (192.88.115.98)  57.862 ms  42.568 ms  73.063 ms
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *

From the LDM log, they're definitely losing CONDUIT products.

It probably would be helpful to get 5.2.1 in place on thelma, and to get rtstats from Harry and Art. I think I'll put 5.2.1 on milton this weekend and let it run a bit to try to ensure it's in a usable state.

Anne
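As an illustration of the kind of log comparison being attempted here, the minutes around the 12:23 spike can be pulled from both logs for a side-by-side look. This is a minimal sketch only: the paths ~ldm/logs/uptime.log and ~ldm/logs/ldmd.log, and the syslog-style "MMM DD HH:MM:SS" timestamp on LDM log lines, are assumptions to adjust for the installation at hand.

  # Minutes 12:20-12:29 from the per-minute uptime log; each line
  # carries a time-of-day field such as "12:23pm" (assumed path):
  grep -E '12:2[0-9]pm' ~ldm/logs/uptime.log

  # The same window from the LDM log, matching on the HH:MM portion
  # of an assumed syslog-style timestamp:
  grep -E ' 12:2[0-9]:' ~ldm/logs/ldmd.log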
Tom Yoksas wrote:
>
> > From: anne <address@hidden>
> > Organization: UCAR/Unidata
> > Keywords: 200209070333.g873XUj09291
>
> Anne and Jeff,
>
> > While thelma looked pretty good about 6:30 today, with a load average
> > around 5, now it's not looking so good. The load average was about 14,
> > and it was sluggish in responding.
>
> Nuts.
>
> > There are only 71 rpc.ldmds at the moment, less than the 72 that I
> > thought we were able to handle easily before the reboot. There are lots
> > of reclasses to atm, plus some to sunset.aos.wisc.edu.
> > (What's 'aos'?)
>
> This appears to be f5.aos.wisc.edu. They are reporting realtime stats,
> and their latencies don't look good. Seems to me that they should
> be feeding from SSEC, no?
>
> > And connections are being dropped.
>
> So, when the load average goes above some level, data stops getting
> delivered reliably and reclass messages ensue.
>
> > I started a cron job to run uptime every minute to track the load
> > average. The resulting log is in ~logs/uptime.log.
>
> The contents of this file are very interesting. The load average comes
> and goes. We now need to correlate that with CONDUIT data volume (or
> anything else).
>
> It seems to me that we need to jump on getting 5.2.1 ready so we can
> get both Washington and Penn State to upgrade to it and run rtstats.
> This should help us understand what is happening at these sites.
>
> The overnight rtstats from atm and f5.aos are really interesting.
> atm looks OK except for NNEXRAD, and f5 looks bad. I don't know
> what to make of this!
>
> Tom
>
> --
> Tom Yoksas                        UCAR Unidata Program
> (303) 497-8642 (last resort)      P.O. Box 3000
> address@hidden                    Boulder, CO 80307
> Unidata WWW Service               http://www.unidata.ucar.edu/

--
Anne Wilson                         UCAR Unidata Program
address@hidden                      P.O. Box 3000
                                    Boulder, CO 80307
Unidata WWW server                  http://www.unidata.ucar.edu/
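For sites setting up the same monitoring, both pieces discussed in this thread amount to one-line configurations. The lines below are a sketch only: the log path, the crontab user, and the exact rtstats invocation for the 5.2.1 release are assumptions (the rtstats line follows the form documented for later LDM releases) and should be checked against the release notes.

  # crontab entry for the LDM user: append a load-average sample
  # every minute to an assumed log location
  * * * * * uptime >> /usr/local/ldm/logs/uptime.log

  # etc/ldmd.conf entry to report real-time statistics to Unidata,
  # where they appear on the rtstats pages
  EXEC "rtstats -h rtstats.unidata.ucar.edu"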