This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
>From: alan anderson <address@hidden>
>Organization: St. Cloud State
>Keywords: 199911232244.PAA17363 LDM

Alan,

>We are having a problem with our ldm machine. I checked it today
>and found it in a state that looked to me like it had been shutdown
>and rebooted; a ps indicated no ldm processes running.
>The ldm does not start automatically upon a reboot.

Does not start, or did not start?

>I did an ldmadmin stop just to be sure, and then rebooted. System
>came back up with no problems or messages.
>
>Tried to start the ldm, but the start did not confirm; I recalled that
>queue often is corrupted, so deleted the queue, then a mkqueue, then
>restarted ldm, which was confirmed.

Did you make sure to become the user 'ldm' before trying to restart
the LDM? This is important: the LDM should never be run as 'root'.

>Log files (excerpt below) show that something is still wrong. My shallow
>memory about what this could be leaves me blank, so I am writing to you.

No problem.

>waldo is the place where ldm lives; I think you already know the rest
>if you want to look around. Otherwise, could I have some instructions?

I decided to log in and do some snooping. More below.
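
One way to avoid starting the LDM as 'root' by accident is to check the
user ID before running any LDM command. The following is a minimal sketch,
assuming a POSIX shell and an 'id' utility that supports '-u'; the function
name check_not_root and its message are hypothetical and are not part of
the LDM distribution.

```shell
#!/bin/sh
# Hypothetical guard (not part of the LDM): refuse to proceed when the
# user ID is 0, i.e. when the caller is root.
check_not_root() {
    # $1: numeric user ID to test; defaults to the current effective UID
    uid="${1:-$(id -u)}"
    if [ "$uid" -eq 0 ]; then
        echo "refusing: run LDM commands as user 'ldm', not root" >&2
        return 1
    fi
    return 0
}

# Example use, commented out so the sketch has no side effects:
# check_not_root && ldmadmin start
```

The optional argument exists only so the check can be exercised without
actually becoming root; in normal use the function would be called with
no arguments.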
>Nov 23 22:27:58 waldo pqact[1514]: pbuf_flush (4) write: Broken pipe
>Nov 23 22:27:58 waldo pqact[1514]: pipe_dbufput: xcd_runDDS write error
>Nov 23 22:27:58 waldo pqact[1514]: pipe_prodput: trying again
>Nov 23 22:27:58 waldo pqact[1514]: pbuf_flush (4) write: Broken pipe
>Nov 23 22:27:58 waldo pqact[1514]: pipe_dbufput: xcd_runDDS write error
>Nov 23 22:27:58 waldo pqact[1514]: child 5357 exited with status 127
>Nov 23 22:27:58 waldo pqact[1514]: child 5355 exited with status 127
>Nov 23 22:27:58 waldo pqact[1514]: pbuf_flush (4) write: Broken pipe
>Nov 23 22:27:58 waldo pqact[1514]: pipe_dbufput: xcd_runDDS write error
>Nov 23 22:27:58 waldo pqact[1514]: pipe_prodput: trying again
>Nov 23 22:27:58 waldo pqact[1514]: pbuf_flush (4) write: Broken pipe
>Nov 23 22:27:58 waldo pqact[1514]: pipe_dbufput: xcd_runDDS write error
>Nov 23 22:27:58 waldo pqact[1514]: child 5361 exited with status 127
>Nov 23 22:27:58 waldo pqact[1514]: child 5359 exited with status 127

The repeated start and failure of 'xcd_run DDS' is telling us that the
process that xcd_run is running (ingetext.k in this case) is exiting
without reading from the LDM. This is a big hint that the LDM was most
likely not started by the user 'ldm', since things were working correctly
before.

I took a quick look around and found that you must have started the LDM
as 'root', as a number of files were owned by root:

/usr/local/ldm% ls -al
total 10980
drwxr-xr-x 16 ldm  data     1024 Nov 23 22:07 ./
-rw-rw-r--  1 root other       5 Nov 23 22:07 ldmd.pid

waldo# ls -al
total 3854
drwxr-xr-x  2 ldm  other    1024 Nov 23 22:48 .
drwxr-xr-x  7 ldm  data      512 Nov 23 22:06 ..
-rw-rw-r--  1 ldm  data      446 Nov 21 19:03 1999112118.stats
-rw-rw-r--  1 ldm  data      446 Nov 21 20:03 1999112119.stats
-rw-rw-r--  1 ldm  data      446 Nov 21 21:04 1999112120.stats
-rw-rw-r--  1 ldm  data      446 Nov 21 22:09 1999112121.stats
-rw-rw-r--  1 ldm  data      446 Nov 21 23:03 1999112122.stats
-rw-rw-r--  1 ldm  data      446 Nov 22 00:03 1999112123.stats
-rw-rw-r--  1 ldm  data      446 Nov 22 01:03 1999112200.stats
-rw-rw-r--  1 ldm  data      446 Nov 22 02:04 1999112201.stats
-rw-rw-r--  1 ldm  data      446 Nov 22 03:05 1999112202.stats
-rw-rw-r--  1 ldm  data      446 Nov 22 04:08 1999112203.stats
-rw-rw-r--  1 ldm  data      110 Nov 22 04:45 1999112204.stats
-rw-rw-r--  1 ldm  data      446 Nov 22 06:27 1999112205.stats
-rw-rw-r--  1 ldm  data      446 Nov 22 07:03 1999112206.stats
-rw-rw-r--  1 ldm  data      446 Nov 22 08:22 1999112207.stats
-rw-rw-r--  1 ldm  data      446 Nov 22 09:03 1999112208.stats
-rw-rw-r--  1 ldm  data      446 Nov 22 10:03 1999112209.stats
-rw-rw-r--  1 ldm  data      446 Nov 22 11:03 1999112210.stats
-rw-rw-r--  1 ldm  data      446 Nov 22 12:03 1999112211.stats
-rw-rw-r--  1 ldm  data      446 Nov 22 13:03 1999112212.stats
-rw-rw-r--  1 ldm  data      446 Nov 22 14:03 1999112213.stats
-rw-rw-r--  1 ldm  data      446 Nov 22 15:59 1999112214.stats
-rw-rw-r--  1 ldm  data      446 Nov 22 16:28 1999112215.stats
-rw-rw-r--  1 root other     449 Nov 23 21:59 1999112320.stats
-rw-rw-r--  1 root other     559 Nov 23 22:49 1999112321.stats
-rw-rw-r--  1 root other     559 Nov 23 23:07 1999112322.stats
-rw-rw-r--  1 ldm  data      301 Apr 23  1999 f.log
-rw-r--r--  1 ldm  data     1146 Nov 23 22:35 ldmbinstats.upc
-rw-rw-r--  1 root other 1587076 Nov 23 23:07 ldmd.log
-rw-rw-r--  1 root other   92333 Nov 23 21:59 ldmd.log.1
-rw-r--r--  1 ldm  data     3938 Nov 23 21:49 ldmd.log.2
-rw-r--r--  1 ldm  data    87387 Nov 22 16:27 ldmd.log.3
-rw-r--r--  1 ldm  data   142790 Nov 21 23:58 ldmd.log.4
-rw-r--r--  1 ldm  data        0 Mar 29  1999 ldmfail
-rw-rw-r--  1 ldm  data     3591 Apr 23  1999 netcheck.log

I corrected this by becoming 'root' and changing the ownership of all
files owned by 'root' in the ~ldm directory tree.
This included ~ldm/data/ldmd.pq, the LDM product queue:

waldo# chown ldm *
waldo# chgrp data *
waldo# cd ~ldm/logs
waldo# chown ldm *
waldo# chgrp data *
waldo# cd ~ldm/data
waldo# chown ldm *
waldo# chgrp data *

Next, I tried starting the LDM as 'ldm', but I couldn't since the hidden
LDM lock file in /tmp was still owned by 'root'. So, I became root again
and stopped the LDM:

su -
<password>
exec csh
setenv PATH ~ldm/bin:$PATH
ldmadmin stop
exit

After this I was back to being the user 'ldm'. For good measure, I did
an 'ldmadmin stop' and then started the LDM:

ldmadmin stop
ldmadmin start
ldmadmin tail

/usr/local/ldm/logs% ldmadmin tail
Nov 23 23:17:04 waldo chinook[17562]: run_requester: 19991123222238.441 TS_ENDT {{FSL2|MCIDAS|IDS|DDPLUS, ".*"}}
Nov 23 23:17:04 waldo chinook[17562]: FEEDME(chinook.unl.edu): OK
Nov 23 23:17:05 waldo udp.ldmd[17566]: Starting Up
Nov 23 23:17:06 waldo localhost[17590]: Connection from localhost
Nov 23 23:17:06 waldo localhost[17590]: Connection reset by peer
Nov 23 23:17:06 waldo localhost[17590]: Exiting
Nov 23 23:17:45 waldo proftomd[17596]: Starting up
Nov 23 23:17:46 waldo proftomd[17596]: Making /var/data/mcidas/MDXX0097; may take some time...
Nov 23 23:17:49 waldo proftomd[17596]: Decoding 1999327.2212 data into /var/data/mcidas/MDXX0097
Nov 23 23:17:49 waldo proftomd[17596]: Exiting
Nov 23 23:21:00 waldo lwtoa3[17606]: PRODUCT CODE=UX 99327 223019
Nov 23 23:21:00 waldo lwtoa3[17606]: Done -- AREA= 109
Nov 23 23:21:06 waldo pqact[17558]: pbuf_flush (6) write: Broken pipe
Nov 23 23:21:06 waldo pqact[17558]: pbuf_flush 6: time elapsed 5.351715
Nov 23 23:21:06 waldo pqact[17558]: pipe_dbufput: -closelwtoa3-d/var/data/mcidas write error
Nov 23 23:21:06 waldo pqact[17558]: pipe_prodput: trying again
Nov 23 23:21:06 waldo lwtoa3[17622]: PRODUCT CODE=UX 99327 223019
Nov 23 23:21:06 waldo lwtoa3[17622]: Done -- AREA= 100
Nov 23 23:21:10 waldo pqact[17558]: pbuf_flush (6) write: Broken pipe
Nov 23 23:21:10 waldo pqact[17558]: pbuf_flush 6: time elapsed 4.002119
Nov 23 23:21:10 waldo pqact[17558]: pipe_dbufput: -closelwtoa3-d/var/data/mcidas write error
Nov 23 23:22:04 waldo pqexpire[17555]: > Recycled 27588.838 kb/hr ( 3567.163 prods per hour)
Nov 23 23:22:39 waldo lwtoa3[17642]: PRODUCT CODE=UA 99327 223134
Nov 23 23:22:41 waldo lwtoa3[17642]: Done -- AREA= 167
Nov 23 23:27:04 waldo pqexpire[17555]: > Recycled 19132.159 kb/hr ( 2987.324 prods per hour)

The pbuf_flush (6) write: Broken pipe error seemed to be telling me that
an lwtoa3 process was failing to write to an AREA file in
/var/data/mcidas, but I looked there and all files are owned by 'ldm'.
The success of AREA0167 further told me that things seemed to be working
correctly, so I decided to let things run and see what happens.

Please let me know if you see problems with ldm-mcidas or XCD data
decoding.

Tom

>From address@hidden Wed Nov 24 13:05:53 1999

Hi Tom

Just a short note to acknowledge your fix on waldo. I was not aware of
the problems created by having root perform any system maintenance on
the ldm. Our system seems to be working fine again.

Thanks
alan
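
The ownership check done by hand in the exchange above (listing
directories and looking for files owned by 'root') could also be done for
a whole tree in one pass with 'find'. This is only a sketch, assuming a
POSIX 'find' with the '-user' test; the helper name list_owned_by is
hypothetical and is not an LDM command.

```shell
#!/bin/sh
# Hypothetical helper (not part of the LDM): print every file and
# directory under a tree that is owned by the given user.
list_owned_by() {
    # $1: directory to search
    # $2: user name or numeric user ID to match
    find "$1" -user "$2" -print
}

# Example use, commented out so the sketch has no side effects:
# list_owned_by ~ldm root
```

Unlike 'chown ldm *' run in one directory at a time, this reports matches
at every depth, including dot files. It only lists them; changing
ownership would still be a separate step done as 'root'.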