[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
20050214: LDM product queue corruption
- Subject: 20050214: LDM product queue corruption
- Date: Mon, 14 Feb 2005 10:38:20 -0700
Gabriel,
I'm investigating your problem and need your help.
The logfile that you attached contained a lot of distracting entries, so
I applied the following filter to it:
egrep -v 'to LDM 6 on striker|striker\[8872\]: Desired|eldm\[8873\]:
Desired|pnga2area|nullproc_6 failure to eldm' log.bak
because
1. Striker isn't allowing your LDM to connect.
2. The LDM on host Eldm has problems.
3. The log message from pnga2area(1) are irrelevant.
This resulted in the following:
Jan 31 14:43:17 twister flood[8875]: Starting Up(6.1.0):
flood.atmos.uiuc.edu: TS_ZERO TS_ENDT {{FNEXRAD|FSL2|UNIWISC, ".*"}}
Jan 31 14:43:17 twister flood[8876]: Starting Up(6.1.0):
flood.atmos.uiuc.edu: TS_ZERO TS_ENDT {{IDS|DDPLUS, ".*"}}
Jan 31 14:43:17 twister flood[8877]: Starting Up(6.1.0):
flood.atmos.uiuc.edu: TS_ZERO TS_ENDT {{NNEXRAD, "^SDUS51 .* /p(NOR|NOV)"}
}
Jan 31 14:43:17 twister papagayo[8878]: Starting Up(6.1.0):
papagayo.unl.edu: TS_ZERO TS_ENDT {{HDS, ".*"}}
Jan 31 14:43:17 twister papagayo[8879]: Starting Up(6.1.0):
papagayo.unl.edu: TS_ZERO TS_ENDT {{FNEXRAD|FSL2|UNIWISC, ".*"}}
Jan 31 14:43:17 twister papagayo[8880]: Starting Up(6.1.0):
papagayo.unl.edu: TS_ZERO TS_ENDT {{IDS|DDPLUS, ".*"}}
Jan 31 14:43:17 twister papagayo[8881]: Starting Up(6.1.0):
papagayo.unl.edu: TS_ZERO TS_ENDT {{NNEXRAD, "^SDUS51 .* /p(NOR|NOV)"}}
Jan 31 14:43:17 twister pluto[8883]: Starting Up(6.1.0): pluto.met.fsu.edu:
TS_ZERO TS_ENDT {{UNIWISC, "Q[01] (C[^GH]|U[^78DE])"}}
Jan 31 14:43:17 twister flood[8884]: Starting Up(6.1.0):
flood.atmos.uiuc.edu: TS_ZERO TS_ENDT {{UNIWISC, "Q[01] (C[^GH]|U[^78DE])"
}}
Jan 31 14:43:17 twister flood[8885]: Starting Up(6.1.0):
flood.atmos.uiuc.edu: TS_ZERO TS_ENDT {{NNEXRAD, "/p... (ILN|CLE|IND)"}}
Jan 31 14:43:17 twister rtstats[8870]: Starting Up (8867)
Jan 31 14:43:17 twister papagayo[8880]: Desired product class:
20050131134317.817 TS_ENDT {{IDS|DDPLUS, ".*"}}
Jan 31 14:43:17 twister flood[8885]: Desired product class:
20050131134317.817 TS_ENDT {{NNEXRAD, "/p... (ILN|CLE|IND)"}}
Jan 31 14:43:17 twister pluto[8883]: Desired product class:
20050131134317.817 TS_ENDT {{UNIWISC, "Q[01] (C[^GH]|U[^78DE])"}}
Jan 31 14:43:17 twister papagayo[8881]: Desired product class:
20050131134317.817 TS_ENDT {{NNEXRAD, "^SDUS51 .* /p(NOR|NOV)"}}
Jan 31 14:43:17 twister flood[8884]: Desired product class:
20050131134317.817 TS_ENDT {{UNIWISC, "Q[01] (C[^GH]|U[^78DE])"}}
Jan 31 14:43:17 twister flood[8875]: Desired product class:
20050131134317.817 TS_ENDT {{FNEXRAD|FSL2|UNIWISC, ".*"}}
Jan 31 14:43:17 twister flood[8876]: Desired product class:
20050131134317.817 TS_ENDT {{IDS|DDPLUS, ".*"}}
Jan 31 14:43:17 twister flood[8877]: Desired product class:
20050131134317.817 TS_ENDT {{NNEXRAD, "^SDUS51 .* /p(NOR|NOV)"}}
Jan 31 14:43:17 twister papagayo[8878]: Desired product class:
20050131134317.817 TS_ENDT {{HDS, ".*"}}
Jan 31 14:43:17 twister papagayo[8879]: Desired product class:
20050131134317.817 TS_ENDT {{FNEXRAD|FSL2|UNIWISC, ".*"}}
Jan 31 14:43:17 twister stokes[8874]: Desired product class:
20050131134317.817 TS_ENDT {{HDS, ".*"}}
Jan 31 14:43:17 twister papagayo[8880]: Connected to upstream LDM-6
Jan 31 14:43:17 twister flood[8885]: Connected to upstream LDM-6
Jan 31 14:43:17 twister papagayo[8881]: Connected to upstream LDM-6
Jan 31 14:43:18 twister flood[8875]: Connected to upstream LDM-6
Jan 31 14:43:18 twister flood[8877]: Connected to upstream LDM-6
Jan 31 14:43:18 twister flood[8876]: Connected to upstream LDM-6
Jan 31 14:43:18 twister papagayo[8878]: Connected to upstream LDM-6
Jan 31 14:43:18 twister papagayo[8879]: Connected to upstream LDM-6
Jan 31 14:43:18 twister stokes[8874]: Connected to upstream LDM-6
Jan 31 14:43:18 twister papagayo[8880]: Upstream LDM is willing to feed
Jan 31 14:43:18 twister flood[8884]: Connected to upstream LDM-6
Jan 31 14:43:18 twister papagayo[8881]: Upstream LDM is willing to feed
Jan 31 14:43:18 twister papagayo[8878]: Upstream LDM is willing to feed
Jan 31 14:43:18 twister papagayo[8879]: Upstream LDM is willing to feed
Jan 31 14:43:18 twister flood[8885]: Upstream LDM is willing to feed
Jan 31 14:43:18 twister flood[8884]: Upstream LDM is willing to feed
Jan 31 14:43:18 twister flood[8876]: Upstream LDM is willing to feed
Jan 31 14:43:18 twister flood[8877]: Upstream LDM is willing to feed
Jan 31 14:43:18 twister flood[8875]: Product reclassification by upstream
LDM: 20050131134317.817 TS_ENDT {{FNEXRAD|FSL2|UNIWISC, "
.*"}} -> 20050131134317.817 TS_ENDT {{FNEXRAD|UNIWISC, ".*"}}
Jan 31 14:43:18 twister stokes[8874]: Upstream LDM is willing to feed
Jan 31 14:43:19 twister flood[8875]: Upstream LDM is willing to feed
Jan 31 14:43:22 twister pluto[8883]: Connected to upstream LDM-6
Jan 31 14:43:22 twister pluto[8883]: Upstream LDM is willing to feed
Jan 31 20:45:34 twister ldmping[8756]: SVC_UNAVAIL 0.000639 0
localhost RPC: Program not registered
Jan 31 20:45:34 twister ldmping[8836]: SVC_UNAVAIL 0.000626 0
localhost RPC: Program not registered
Jan 31 20:45:34 twister pqcheck[8840]: Starting Up (6554)
Jan 31 20:45:34 twister pqcheck[8840]: The writer-counter of the
product-queue is 10
Jan 31 20:45:34 twister pqcheck[8840]: Exiting
Feb 01 01:59:50 twister ldmping[8761]: SVC_UNAVAIL 0.000669 0
localhost RPC: Program not registered
Feb 01 01:59:50 twister ldmping[8841]: SVC_UNAVAIL 0.000624 0
localhost RPC: Program not registered
Feb 01 01:59:50 twister pqcheck[8845]: Starting Up (6835)
Feb 01 01:59:50 twister pqcheck[8845]: The writer-counter of the
product-queue is 10
Feb 01 01:59:50 twister pqcheck[8845]: Exiting
Feb 01 02:28:01 twister ldmping[18154]: SVC_UNAVAIL 0.000665 0
localhost RPC: Program not registered
Feb 01 02:28:06 twister ldmping[18160]: SVC_UNAVAIL 0.000718 0
localhost RPC: Program not registered
Feb 01 02:28:06 twister pqcheck[18164]: Starting Up (18155)
Feb 01 02:28:06 twister pqcheck[18164]: The writer-counter of the
product-queue is 10
Feb 01 02:28:06 twister pqcheck[18164]: Exiting
Feb 01 02:28:15 twister ldmping[18175]: SVC_UNAVAIL 0.000690 0
localhost RPC: Program not registered
Feb 01 02:28:15 twister pqcheck[18179]: Starting Up (18170)
Feb 01 02:28:15 twister pqcheck[18179]: The writer-counter of the
product-queue is 10
Feb 01 02:28:15 twister pqcheck[18179]: Exiting
Feb 01 02:28:28 twister ldmping[18185]: SVC_UNAVAIL 0.000723 0
localhost RPC: Program not registered
Feb 01 02:28:30 twister ldmping[18193]: SVC_UNAVAIL 0.000696 0
localhost RPC: Program not registered
Feb 01 02:28:54 twister ldmping[18200]: SVC_UNAVAIL 0.000656 0
localhost RPC: Program not registered
Feb 01 02:28:54 twister pqcheck[18204]: Starting Up (18195)
Feb 01 02:28:54 twister pqcheck[18204]: The writer-counter of the
product-queue is 0
Feb 01 02:28:54 twister pqcheck[18204]: Exiting
The interesting thing here is that from 2005-01-31T14:43:22 to
2005-01-31T20:45:34 UTC the LDM was, apparently, working OK. At that
later time, however, an ldmping(1) process was started and determined
that the LDM wasn't available. A pqcheck(1) process that was started at
the same time, however, indicated that the LDM product-queue had ten
writers into it.
Who started that ldmping(1) process and why? Was it a part of a larger
script (started by crontab(1) perhaps)? Could that script also do bad
things to the product-queue?
The logfile entry just prior to that ldmping(1) entry is
Jan 31 20:43:28 twister eldm[8873]: ERROR: requester6.c:455;
ldm_clnt.c:286: nullproc_6 failure to eldm.fsl.noaa.gov; ldm_clnt.c:142: RPC:
Unable to receive; errno = Connection reset by peer
which is 2 minutes 4 seconds earlier than the ldmping(1) entry. So the
LDM system appears to have been working up to that time -- with no
indication of a problem other than those unreachable hosts.
When the LDM system is running, can you ldmping(1) it?
Regards,
Steve Emmerson