This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Russ Rew wrote:
>
> After thinking about it more, it turns out if the signature (an array
> of 4 ints) was all zeros, n would be zero and the assertion "n>0" would
> be violated, even though n was an unsigned int.
>
> Although it's not supposed to be possible to get an all-zero signature
> (it's the result of an MD5 digest of a product), it also seems likely
> that a memory failure might be manifested as reading all zeros for a
> memory fetch, or that a disk corruption might have the symptom of
> zeroing out some bytes on the disk where signatures were stored.
>

So, this leaves us in an unresolved state. From the system logs we saw that aeolus had a CPU panic and rebooted itself at 07:56 local time. An hour later it corrected a memory error. But the assertion violations reported in the ldm logs that caused the crashes occurred hours later.

I also can't explain the bad latencies that were logged for only a few products:

ldmd.log.3:Feb 05 17:41:04 aeolus motherlode[1329]: skipped: 20020205160304.032 (2280.714 seconds)
ldmd.log.3:Feb 05 18:03:47 aeolus motherlode[1329]: skipped: 20020205164524.036 (1102.943 seconds)
ldmd.log.2:Feb 05 22:23:24 aeolus motherlode[3932]: skipped: 20020205211554.685 (449.618 seconds)
ldmd.log.1:Feb 05 22:59:31 aeolus motherlode[4249]: skipped: 20020205215159.267 (451.825 seconds)

In two of the four crashes that I am aware of, these skipped products occurred immediately before the assertion failure. In a third crash, two products were skipped well before the crash, and in the fourth crash there were no such skips. I guess the bad latencies are unrelated to the crashes and must just reflect some problem in the connection during that 5+ hour period, although it seems odd that just a few products would have such bad latencies.

So, we can't say for sure what went wrong.
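For anyone wanting to tabulate entries like the four "skipped" lines quoted above, here is a minimal Python sketch. It is not part of the LDM: the function name `skipped_latencies` and the regular expression are assumptions made for illustration, and the pattern is based only on the four log lines shown in this message.

```python
import re

# Assumed pattern, derived only from the four "skipped" lines quoted above:
# a timestamp-like field, then the latency in seconds inside parentheses.
SKIPPED = re.compile(r"skipped: (\d{14}\.\d{3}) \((\d+\.\d+) seconds\)")


def skipped_latencies(lines):
    """Return (timestamp_field, latency_seconds) for each 'skipped' line."""
    results = []
    for line in lines:
        match = SKIPPED.search(line)
        if match:
            results.append((match.group(1), float(match.group(2))))
    return results


# The four entries from the message, copied verbatim.
log_lines = [
    "ldmd.log.3:Feb 05 17:41:04 aeolus motherlode[1329]: skipped: 20020205160304.032 (2280.714 seconds)",
    "ldmd.log.3:Feb 05 18:03:47 aeolus motherlode[1329]: skipped: 20020205164524.036 (1102.943 seconds)",
    "ldmd.log.2:Feb 05 22:23:24 aeolus motherlode[3932]: skipped: 20020205211554.685 (449.618 seconds)",
    "ldmd.log.1:Feb 05 22:59:31 aeolus motherlode[4249]: skipped: 20020205215159.267 (451.825 seconds)",
]

for stamp, latency in skipped_latencies(log_lines):
    # Print each entry with its latency converted to minutes for readability.
    print(f"{stamp}  {latency:9.3f} s  ({latency / 60:.1f} min)")
```

Run against complete ldmd.log files, the same function would show whether large latencies really were limited to a handful of products.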
I suggest we watch aeolus for the rest of the day and, if it behaves properly, then send out a note to the effect that downstream sites could reconnect, although perhaps with a caveat...

Anne
--
***************************************************
Anne Wilson                    UCAR Unidata Program
address@hidden                 P.O. Box 3000
                               Boulder, CO 80307
----------------------------------------------------
Unidata WWW server       http://www.unidata.ucar.edu/
****************************************************