[Fwd: [Fwd: aeolus problems - LDM dying]]
- Subject: [Fwd: [Fwd: aeolus problems - LDM dying]]
- Date: Tue, 05 Feb 2002 20:24:25 -0700
anne wrote:
>
> anne wrote:
> >
> > Hi Russ and Mike,
> >
> > Larry Riddle's LDM on aeolus, an OSF1 Alpha, is having problems. It
> > keeps shutting down with the same error message in the log:
> >
> > ldmd.log.1:Feb 05 22:51:59 aeolus motherlode[4249]: run_requester: Starting Up: motherlode.ucar.edu
> > ldmd.log.1:Feb 05 22:59:29 aeolus motherlode[4249]: run_requester: 20020205215159.112 TS_ENDT {{FSL2|UNIDATA, ".*"},{NNEXRAD, ".*"},{DIFAX, ".*"}}
> > ldmd.log.1:Feb 05 22:59:30 aeolus motherlode[4249]: FEEDME(motherlode.ucar.edu): OK
> > ldmd.log.1:Feb 05 22:59:31 aeolus motherlode[4249]: RECLASS: 20020205215931.091 TS_ENDT {{FSL2|UNIDATA, ".*"},{NNEXRAD, ".*"},{DIFAX, ".*"}}
> > ldmd.log.1:Feb 05 22:59:31 aeolus motherlode[4249]: skipped: 20020205215159.267 (451.825 seconds)
> > ldmd.log.1:Feb 05 22:59:32 aeolus motherlode[4249]: assertion "n > 0" failed: file "pq.c", line 2172
> > -----
> > ldmd.log.2:Feb 05 22:15:54 aeolus motherlode[3932]: run_requester: Starting Up: motherlode.ucar.edu
> > ldmd.log.2:Feb 05 22:23:23 aeolus motherlode[3932]: run_requester: 20020205211554.650 TS_ENDT {{FSL2|UNIDATA, ".*"},{NNEXRAD, ".*"},{DIFAX, ".*"}}
> > ldmd.log.2:Feb 05 22:23:23 aeolus motherlode[3932]: FEEDME(motherlode.ucar.edu): OK
> > ldmd.log.2:Feb 05 22:23:24 aeolus motherlode[3932]: RECLASS: 20020205212324.304 TS_ENDT {{FSL2|UNIDATA, ".*"},{NNEXRAD, ".*"},{DIFAX, ".*"}}
> > ldmd.log.2:Feb 05 22:23:24 aeolus motherlode[3932]: skipped: 20020205211554.685 (449.618 seconds)
> > ldmd.log.2:Feb 05 22:23:25 aeolus motherlode[3932]: assertion "n > 0" failed: file "pq.c", line 2172
> > -----
> > ldmd.log.3:Feb 05 17:00:29 aeolus motherlode[1329]: run_requester: Starting Up: motherlode.ucar.edu
> > ldmd.log.3:Feb 05 17:00:29 aeolus motherlode[1329]: run_requester: 20020205160029.865 TS_ENDT {{FSL2|UNIDATA, ".*"},{NNEXRAD, ".*"},{DIFAX, ".*"}}
> > ldmd.log.3:Feb 05 17:00:30 aeolus motherlode[1329]: FEEDME(motherlode.ucar.edu): OK
> > ldmd.log.3:Feb 05 17:41:04 aeolus motherlode[1329]: RECLASS: 20020205164104.746 TS_ENDT {{FSL2|UNIDATA, ".*"},{NNEXRAD, ".*"},{DIFAX, ".*"}}
> > ldmd.log.3:Feb 05 17:41:04 aeolus motherlode[1329]: skipped: 20020205160304.032 (2280.714 seconds)
> > ldmd.log.3:Feb 05 18:03:47 aeolus motherlode[1329]: RECLASS: 20020205170346.979 TS_ENDT {{FSL2|UNIDATA, ".*"},{NNEXRAD, ".*"},{DIFAX, ".*"}}
> > ldmd.log.3:Feb 05 18:03:47 aeolus motherlode[1329]: skipped: 20020205164524.036 (1102.943 seconds)
> > ldmd.log.3:Feb 05 20:59:38 aeolus motherlode[1329]: assertion "n > 0" failed: file "pq.c", line 2172
> >
> > The function that is failing is this:
> > /*
> >  * Hash function for signature.
> >  */
> > static size_t
> > sx_hash(size_t nchains, const signaturet sig)
> > {
> >     size_t h;
> >     int i;
> >     unsigned int n;
> >
> >     n = 0;
> >     for(i=0; i<4; i++)
> >         n = 256*n + sig[i];
> >     assert(n > 0);
> >     h = n % nchains;
> >     return h;
> > }
> >
> > Perhaps the signatures are being corrupted?
> >
> > It's interesting that the latencies on these skipped products are
> > terrible: between about 450 and 2280 seconds. ldmping times from
> > motherlode to aeolus aren't very good either; most are around 40 ms,
> > but some exceed 100 ms:
> >
> > motherlode.ucar.edu% ldmping -i2 aeolus.ucsd.edu
> > Feb 06 01:08:44 State Elapsed Port Remote_Host rpc_stat
> > ... (aeolus LDM started here)
> > Feb 06 01:09:40 RESPONDING 0.092502 388 aeolus.ucsd.edu
> > Feb 06 01:09:42 RESPONDING 0.065875 388 aeolus.ucsd.edu
> > Feb 06 01:09:44 RESPONDING 0.038995 388 aeolus.ucsd.edu
> > Feb 06 01:09:46 RESPONDING 0.039381 388 aeolus.ucsd.edu
> > Feb 06 01:09:48 RESPONDING 0.038904 388 aeolus.ucsd.edu
> > Feb 06 01:09:51 RESPONDING 0.039140 388 aeolus.ucsd.edu
> > Feb 06 01:09:53 RESPONDING 0.047059 388 aeolus.ucsd.edu
> > Feb 06 01:09:55 RESPONDING 0.039036 388 aeolus.ucsd.edu
> > Feb 06 01:09:57 RESPONDING 0.039950 388 aeolus.ucsd.edu
> > Feb 06 01:09:59 RESPONDING 0.040719 388 aeolus.ucsd.edu
> > Feb 06 01:10:01 RESPONDING 0.104465 388 aeolus.ucsd.edu
> > Feb 06 01:10:03 RESPONDING 0.050099 388 aeolus.ucsd.edu
> > Feb 06 01:10:05 RESPONDING 0.118380 388 aeolus.ucsd.edu
> > Feb 06 01:10:07 RESPONDING 0.039413 388 aeolus.ucsd.edu
> > Feb 06 01:10:09 RESPONDING 0.050446 388 aeolus.ucsd.edu
> > Feb 06 01:10:11 RESPONDING 0.044901 388 aeolus.ucsd.edu
> > Feb 06 01:10:13 RESPONDING 0.041743 388 aeolus.ucsd.edu
> > Feb 06 01:10:15 RESPONDING 0.039329 388 aeolus.ucsd.edu
> > Feb 06 01:10:17 RESPONDING 0.044745 388 aeolus.ucsd.edu
> > Feb 06 01:10:19 RESPONDING 0.040108 388 aeolus.ucsd.edu
> > Feb 06 01:10:21 RESPONDING 0.050392 388 aeolus.ucsd.edu
> > Feb 06 01:10:23 RESPONDING 0.040905 388 aeolus.ucsd.edu
> > Feb 06 01:10:25 RESPONDING 0.039391 388 aeolus.ucsd.edu
> > Feb 06 01:10:27 RESPONDING 0.058450 388 aeolus.ucsd.edu
> >
> > The queue seems ok:
> >
> > aeolus.ucsd.edu> pqmon -i2
> > Feb 06 01:19:37 pqmon: Starting Up (5892)
> > Feb 06 01:19:37 pqmon: nprods nfree nempty nbytes maxprods maxfree minempty maxext age
> > Feb 06 01:19:37 pqmon: 108327 1 74777 749998248 158714 12 24390 3928 20276
> > Feb 06 01:19:39 pqmon: 108321 1 74783 749993144 158714 12 24390 9032 20271
> > Feb 06 01:19:41 pqmon: 108318 1 74786 750001632 158714 12 24390 544 20267
> > Feb 06 01:19:43 pqmon: 108321 1 74783 749998984 158714 12 24390 3192 20267
> > Feb 06 01:19:45 pqmon: 108328 1 74776 749999056 158714 12 24390 3120 20265
> > Feb 06 01:19:47 pqmon: 108334 1 74770 749997760 158714 12 24390 4416 20266
> > Feb 06 01:19:49 pqmon: 108360 1 74744 750001048 158714 12 24390 1128 20265
> > Feb 06 01:19:51 pqmon: 108372 1 74732 749995496 158714 12 24390 6680 20265
> > Feb 06 01:19:53 pqmon: 108383 1 74721 749997800 158714 12 24390 4376 20262
> > Feb 06 01:19:55 pqmon: 108415 1 74689 749996816 158714 12 24390 5360 20263
> > Feb 06 01:19:55 pqmon: Interrupt
> > Feb 06 01:19:55 pqmon: Exiting
> >
> > I do see some messages in the system log that make me suspicious; these
> > are for Mike:
> >
> > Feb 4 11:40:03 aeolus vmunix: RFS3_WRITE, client address = 132.239.94.91, errno 22
> > Feb 5 07:56:31 aeolus vmunix: panic (cpu 0): vm_page_activate: already active
> > Feb 5 07:56:31 aeolus vmunix: syncing disks... 237 122 30 done
> > Feb 5 07:56:31 aeolus vmunix: DUMP.prom: dev SCSI 0 6 0 0 300 0 FLAMG-IO, block 722079
> > Feb 5 07:56:31 aeolus vmunix: DUMP.prom: dev SCSI 0 6 0 0 300 0 FLAMG-IO, block 722079
> > Feb 5 07:56:31 aeolus vmunix: Alpha boot: available memory from 0xbc4000 to 0xe000000
> > Feb 5 07:56:31 aeolus vmunix: Compaq Tru64 UNIX V5.0A (Rev. 1094); Thu Nov 29 07:51:09 PST 2001
> > ...
> > Feb 5 07:57:58 aeolus vmunix: fta0: Link Unavailable.
> > Feb 5 07:58:51 aeolus vmunix: Mouse/Tablet has failed to reset.
> > Feb 5 07:59:19 aeolus last message repeated 2 times
> > Feb 5 08:59:16 aeolus vmunix: Memory error corrected by system
> > Feb 5 08:59:16 aeolus vmunix: biu_stat = 0000000000000240
> > Feb 5 08:59:16 aeolus vmunix: biu_addr = 00000001d4000018
> > Feb 5 08:59:16 aeolus vmunix: dc_stat = 0000000000000007
> > Feb 5 08:59:16 aeolus vmunix: fill_syndrome = 0000000000000000
> > Feb 5 08:59:16 aeolus vmunix: fill_addr = 0000000000065350
> > Feb 5 08:59:16 aeolus vmunix: bc_tag = 003c090000005428
> > Feb 5 08:59:16 aeolus vmunix: ident = 0
> >
> > Do you have any ideas about this?
> >
> > My next step will be to rebuild the queue. I'll save the old queue in
> > case it proves useful.
> >
> > Anne
>
> --
> ***************************************************
> Anne Wilson UCAR Unidata Program
> address@hidden P.O. Box 3000
> Boulder, CO 80307
> ----------------------------------------------------
> Unidata WWW server http://www.unidata.ucar.edu/
> ****************************************************