This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Larry Riddle wrote:
>
> I don't know if it was the change to thelma or the fact that aeolus is not
> running in debug ("Heisenberg") mode, but the ldm on aeolus.ucsd.edu has
> been shutting itself down, two or three times a day, all weekend long. I
> haven't touched any of the log files, there may be some useful info there.
>
> For the next time it shuts down, can someone tell me what needs to be done
> to start it up again in debug mode? When the ghost of Heisenberg is
> watching aeolus, we don't seem to have any trouble.
>
> Larry
>
> ---===---=-=-=-=-=-=-=-=-=-=-=====[\/]=====-=-=-=-=-=-=-=-=-=-=---===---
> -----===(* Climate's what we expect, but weather's what we get. *)===-----
> Larry Riddle : Climate Research Division : Scripps Institution of Oceanography
> University of California, San Diego : La Jolla, California 92093-0224
> Phone: (858) 534-1869 : Fax: (858) 534-8561 : E-Mail: address@hidden

Hi Larry,

I'm sorry to hear about these problems this weekend! I looked around aeolus and found no messages reported in the ldm logs. However, there were problems in the system log, /var/adm/messages.
Here's the most recent:

Apr 8 01:42:38 aeolus vmunix: trap: invalid memory write access from kernel mode
Apr 8 01:42:38 aeolus vmunix:
Apr 8 01:42:38 aeolus vmunix: faulting virtual address: 0x0000000000000018
Apr 8 01:42:38 aeolus vmunix: pc of faulting instruction: 0xfffffc00003e28e0
Apr 8 01:42:38 aeolus vmunix: ra contents at time of fault: 0xfffffc00003e2898
Apr 8 01:42:38 aeolus vmunix: sp contents at time of fault: 0xffffffff930bf900
Apr 8 01:42:38 aeolus vmunix:
Apr 8 01:42:38 aeolus vmunix: panic (cpu 0): kernel memory fault

Over the past four days there are several panic messages:

# grep panic messages
Jan 3 14:19:51 aeolus vmunix: panic (cpu 0): kernel memory fault
Jan 7 09:39:45 aeolus vmunix: panic (cpu 0): kernel memory fault
Jan 19 14:55:42 aeolus vmunix: panic (cpu 0): vm_page_activate: already active
Feb 5 07:56:31 aeolus vmunix: panic (cpu 0): vm_page_activate: already active
Mar 12 14:18:00 aeolus vmunix: panic (cpu 0): ialloc: dup alloc
Apr 4 18:48:46 aeolus vmunix: panic (cpu 0): vm_page_activate: already active
Apr 5 19:31:38 aeolus vmunix: panic (cpu 0): vm_page_activate: already active
Apr 6 07:34:42 aeolus vmunix: panic (cpu 0): kernel memory fault
Apr 7 20:34:16 aeolus vmunix: panic (cpu 0): vm_page_activate: already active
Apr 8 01:42:38 aeolus vmunix: panic (cpu 0): kernel memory fault

Mike says this indicates a memory problem, which might also explain the assertion errors you experienced earlier. Indeed, the last two 'panic' messages are each within 6 minutes of the last messages of an active ldm process that subsequently died.

Mike advised that you remove the memory chips and reseat them to see if the problem goes away - it could just be a bad connection. If the problem reoccurs, apparently the next step is to reorder the chips and see if the problem stays in the same location or moves with the chips. If it stays in the same location the problem is in the slot, not the chip, although generally the problem is in the chip.
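As a rough illustration of how the panic lines in the grep output above could be grouped by cause, here is a minimal Python sketch. It is not part of the LDM or of any Unidata tool. The function name `tally_panics`, the regular expression, and the embedded sample (copied from the grep output above) are assumptions made for this example, and the pattern only matches the `vmunix: panic (cpu N): reason` layout seen in this particular log.

```python
import re
from collections import Counter

# Sample lines copied from the grep output quoted in this message.
SAMPLE = """\
Jan 3 14:19:51 aeolus vmunix: panic (cpu 0): kernel memory fault
Jan 7 09:39:45 aeolus vmunix: panic (cpu 0): kernel memory fault
Jan 19 14:55:42 aeolus vmunix: panic (cpu 0): vm_page_activate: already active
Feb 5 07:56:31 aeolus vmunix: panic (cpu 0): vm_page_activate: already active
Mar 12 14:18:00 aeolus vmunix: panic (cpu 0): ialloc: dup alloc
Apr 4 18:48:46 aeolus vmunix: panic (cpu 0): vm_page_activate: already active
Apr 5 19:31:38 aeolus vmunix: panic (cpu 0): vm_page_activate: already active
Apr 6 07:34:42 aeolus vmunix: panic (cpu 0): kernel memory fault
Apr 7 20:34:16 aeolus vmunix: panic (cpu 0): vm_page_activate: already active
Apr 8 01:42:38 aeolus vmunix: panic (cpu 0): kernel memory fault
"""

# Hypothetical pattern: month, day, time, host, then the panic reason.
# It assumes the exact "vmunix: panic (cpu N): reason" layout shown above.
PANIC_RE = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s+(\S+)\s+"
    r"vmunix: panic \(cpu \d+\): (.+)$"
)


def tally_panics(lines):
    """Count panic messages by reason; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = PANIC_RE.match(line.strip())
        if match:
            counts[match.group(5)] += 1
    return counts


if __name__ == "__main__":
    for reason, count in tally_panics(SAMPLE.splitlines()).most_common():
        print(f"{count:2d}  {reason}")
```

On the sample this groups the ten panics into three causes. On a live system the lines would be read from /var/adm/messages rather than from the embedded string.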
I would show this to your system administrator. Because I don't think this is an ldm problem, I did not put the ldm in debug mode. If you would still like to know how to do this, let me know and I'll send it in a separate email.

Anne
--
***************************************************
Anne Wilson                    UCAR Unidata Program
address@hidden                 P.O. Box 3000
                               Boulder, CO 80307
----------------------------------------------------
Unidata WWW server       http://www.unidata.ucar.edu/
****************************************************