20001218: write errors to mcidas directory after 17:52 today
- Subject: 20001218: write errors to mcidas directory after 17:52 today
- Date: Mon, 18 Dec 2000 18:48:54 -0700
>From: Robert Mullenax <address@hidden>
>Organization: UCAR/Unidata
>Keywords: 200012190130.eBJ1Uqo23841
Robert,
>Even though I have data flowing and the GEMPAK decoders are writing
>output, I am getting constant write errors from the McIDAS
>XCD DDS and HRS decoders.
Where were you seeing the errors?
>Data stopped being decoded at 17:52
>today (I know there was a McIDAS feed problem)
Yes, but that was earlier, and the XCD decoders do not work with the
Unidata-Wisconsin image products; the ldm-mcidas decoders do.
>and now
>I have deleted the queue, made a new one, checked for permission
>problems, stopped and restarted the LDM, but no dice..still continuous
>write errors.
It would have been nice to see a sample of those errors.
>This is on the same disk that the GEMPAK data is being
>written to and there were no changes at all to the system.
Weird.
>It just stopped working. This is on our Sparc system
>psnldm.nsbf.nasa.gov.
>
>Help!!?
More below.
>From address@hidden Mon Dec 18 18:38:51 2000
>Okay, after doing an ldm stop again (third time) and an ldm clean,
>it is working now. The question is what happened in the first place.
That was what I was going to ask.
>I saw this the other day on the x86 system in New Mexico. I remade the
>queue and that fixed it. The SPARC is running McIDAS-X 7.6/ldm-5.1.2/
>Solaris 7 and the x86 Solaris 8 with the same Unidata versions.
The XCD decoders should not care about the LDM queue. The sequence of
events is:
  o LDM gets products from upstream sites
  o pqact sends products to either ingebin.k or ingetext.k depending on
    what kind of products we are talking about (binary/HRS or textual/DDS)
  o ingebin.k and ingetext.k write the products they get from pqact to
    a spool: ingetext.k to the daily .XCD file; ingebin.k to HRS.SPL
  o the XCD data monitors work their way through the spool to decode
    products into McIDAS files
The write error would have to come from ingetext.k and/or ingebin.k
having execute problems or not being able to write to their respective
spool files. Are you sure that no changes were made to the McIDAS
binaries during this process?
Tom
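The two failure modes named above (an ingester that cannot be executed, or a spool that cannot be written) can be checked directly. The following is only a minimal sketch: the function name and the paths in the usage note are hypothetical, and it assumes the check is run as the same user that runs the LDM.

```python
import os

def check_ingester(binary_path, spool_path):
    """Return a list of problems found; an empty list means none were found.

    binary_path and spool_path are supplied by the caller; nothing here
    knows where McIDAS is actually installed.
    """
    problems = []
    if not os.path.isfile(binary_path):
        problems.append(f"{binary_path}: not found")
    elif not os.access(binary_path, os.X_OK):
        problems.append(f"{binary_path}: not executable by this user")

    # If the spool does not exist yet, its directory must be writable instead.
    if os.path.exists(spool_path):
        target = spool_path
    else:
        target = os.path.dirname(spool_path) or "."
    if not os.access(target, os.W_OK):
        problems.append(f"{target}: not writable by this user")
    return problems
```

A call such as check_ingester("/path/to/ingebin.k", "/path/to/HRS.SPL") would then report whichever of the two conditions fails; both paths are placeholders, not the real locations on psnldm.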
>From address@hidden Mon Dec 18 19:38:37 2000
Sorry, Tom, I did not give you much to work on. I am in a slight
panic mode trying to get things ready for Australia (this working
two jobs thing can get hectic). Here is what I saw this evening
in ldmd.log after I noticed the errors, remade the queue, and
stopped and started the LDM. Later on I got HDS errors as well:
Dec 19 01:00:00 psnldm 140.172.240.73[3163]: run_requester: 20001218233000.988 TS_ENDT {{HDS, ".*"}}
Dec 19 01:00:00 psnldm cirrus[3165]: run_requester: Starting Up: cirrus.al.noaa.gov
Dec 19 01:00:00 psnldm cirrus[3165]: run_requester: 20001218233000.999 TS_ENDT {{FSL2|IDS|DDPLUS, ".*"},{MCIDAS, "^pnga2area Q[01]"
Dec 19 01:00:01 psnldm pqact[3159]: child 3164 exited with status 127
Dec 19 01:00:01 psnldm pqact[3159]: child 3161 exited with status 127
Dec 19 01:00:01 psnldm 140.172.240.73[3163]: FEEDME(140.172.240.73): OK
Dec 19 01:00:01 psnldm cirrus[3165]: FEEDME(cirrus.al.noaa.gov): OK
Dec 19 01:00:01 psnldm pqact[3159]: child 3167 exited with status 127
Dec 19 01:00:01 psnldm pqact[3159]: child 3169 exited with status 127
Dec 19 01:00:01 psnldm pqact[3159]: child 3171 exited with status 127
Dec 19 01:00:01 psnldm pqact[3159]: child 3173 exited with status 127
Dec 19 01:00:01 psnldm pqact[3159]: child 3175 exited with status 127
Dec 19 01:00:01 psnldm pqact[3159]: child 3177 exited with status 127
Dec 19 01:00:01 psnldm pqact[3159]: child 3180 exited with status 127
Dec 19 01:00:01 psnldm pqact[3159]: child 3183 exited with status 127
Dec 19 01:00:01 psnldm pqact[3159]: child 3185 exited with status 127
Dec 19 01:00:02 psnldm pqact[3159]: pbuf_flush (4) write: Broken pipe
Dec 19 01:00:02 psnldm pqact[3159]: pipe_dbufput: xcd_runDDS write error
Dec 19 01:00:02 psnldm pqact[3159]: pipe_prodput: trying again
Dec 19 01:00:02 psnldm pqact[3159]: child 3187 exited with status 127
Dec 19 01:00:02 psnldm pqact[3159]: child 3189 exited with status 127
Dec 19 01:00:02 psnldm pqact[3159]: child 3191 exited with status 127
Dec 19 01:00:02 psnldm pqact[3159]: child 3193 exited with status 127
Dec 19 01:00:02 psnldm pqact[3159]: child 3195 exited with status 127
--More--(0%)
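An excerpt like the one above is easier to reason about when the repeated messages are tallied. The sketch below counts pqact child exit statuses and pipe write errors per action; the regular expressions are assumptions based only on the message layout visible in this excerpt, not on the LDM source. As a general convention, exit status 127 is what a shell reports when a command could not be found or executed, which may be worth checking here.

```python
import re
from collections import Counter

# Patterns inferred from the log lines in this thread (an assumption).
CHILD_EXIT = re.compile(r"pqact\[\d+\]: child (\d+) exited with status (\d+)")
PIPE_ERROR = re.compile(r"pqact\[\d+\]: pipe_dbufput: (\S+) write error")

def summarize(log_lines):
    """Return (Counter of exit statuses, Counter of actions with write errors)."""
    exit_statuses = Counter()
    pipe_errors = Counter()
    for line in log_lines:
        m = CHILD_EXIT.search(line)
        if m:
            exit_statuses[int(m.group(2))] += 1
            continue
        m = PIPE_ERROR.search(line)
        if m:
            pipe_errors[m.group(1)] += 1
    return exit_statuses, pipe_errors

sample = [
    "Dec 19 01:00:01 psnldm pqact[3159]: child 3164 exited with status 127",
    "Dec 19 01:00:01 psnldm pqact[3159]: child 3161 exited with status 127",
    "Dec 19 01:00:02 psnldm pqact[3159]: pipe_dbufput: xcd_runDDS write error",
]
statuses, errors = summarize(sample)
print(dict(statuses), dict(errors))
```

Feeding the whole of ldmd.log through summarize() would show whether every failing child exits with the same status and which pqact actions are affected.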
Shortly after this I did an ldmadmin stop, clean, start and
all is well now (and continues to be fine). The strange
thing is that even though DDS is running (a ps -eaf shows
ingetext.k DDS and I have new obs) the XCD_START.LOG
in ~mcidas/workdata only shows the HRS starting. I am
sure no changes were made to the system.. It just started
spewing errors and then stopped after doing an ldmadmin clean
after having stopped and started a couple of times. I checked
the inge*.k binaries and they were from April 28, 2000 and
have not been messed with. So I am really stumped.
Robert
>From address@hidden Mon Dec 18 20:21:40 2000
Tom,
I went over to wxmcidas which was fine and found it was doing the
same thing now, after I switched its feed to a host other
than psnldm. I have found the problem. After doing ldmadmin
stop a couple of times and clean, I did an ldmadmin ps which
said no ldm running, etc.. However look at this:
/usr/local/ldm/logs% ps -lu ldm
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
8 S 1002 1671 1670 0 99 20 e17347d8 652 e1734844 ? 0:00 startxcd
8 S 1002 1501 1498 0 40 20 e172f7e0 87231 e172fa0c ? 0:03 pqbinsta
8 S 1002 1682 1671 0 40 20 e1b34140 652 e15b4bf6 ? 0:13 startxcd
8 S 1002 1673 1670 0 40 20 e17a47f8 87252 e17a4a24 ? 12:39 pqbinsta
8 S 1002 1687 1682 0 40 20 e17230a8 4669 e15a00f6 ? 782:51 dmgrid.k
8 R 1002 24678 14922 0 51 20 e19e6860 484 pts/2 0:00 tcsh
8 S 1002 18637 1682 0 40 20 e1b81158 920 e19aa6f6 ? 0:02 dmmisc.k
8 S 1002 17789 1682 0 40 20 e1eb4860 918 e126d676 ? 0:15 dmsfc.k
8 S 1002 18635 1682 0 41 20 e1b52840 902 e195fb96 ? 0:09 dmsyn.k
8 S 1002 22471 1682 0 40 20 e0dd7760 854 e19f93d6 ? 0:02 dmraob.k
I can't kill these off except by killing them one by one.
I have had trouble getting ldm-5.1.2 to stop, but have
not seen this before.
Robert
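Leftover processes like these can be picked out of a saved `ps -lu ldm` listing with a few lines of code rather than by eye. This is a sketch only: it assumes each process occupies a single line with the PID and PPID in the fourth and fifth columns and the command last, as in the listing in this thread, and the function names and the list of commands to ignore are hypothetical.

```python
def parse_ps_long(text):
    """Return a list of (pid, ppid, command) tuples from `ps -l` style output."""
    procs = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header row and anything too short to be a process row.
        if len(fields) < 6 or not fields[3].isdigit():
            continue
        procs.append((int(fields[3]), int(fields[4]), fields[-1]))
    return procs

def leftovers(procs, ignore=("tcsh", "ps")):
    """Drop the interactive shell and ps itself; keep everything else."""
    return [(pid, cmd) for pid, _ppid, cmd in procs if cmd not in ignore]

listing = """\
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
8 S 1002 1671 1670 0 99 20 e17347d8 652 e1734844 ? 0:00 startxcd
8 S 1002 1687 1682 0 40 20 e17230a8 4669 e15a00f6 ? 782:51 dmgrid.k
8 R 1002 24678 14922 0 51 20 e19e6860 484 pts/2 0:00 tcsh
"""
stale = leftovers(parse_ps_long(listing))
print(stale)
```

Each PID in the result could then be signalled individually, for example with os.kill(pid, signal.SIGTERM), once it is confirmed that the LDM really is meant to be stopped.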