This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
>From: "Anderson, Alan C. " <address@hidden> >Organization: St. Cloud State >Keywords: 200303181640.h2IGeXB2004042 LDM-6 McIDAS-XCD scour Alan, >Noticed that our ldm has stopped getting data from papagayo >as of about 10Z on 17 Mar. My log files seemed ok up to that >time, then data log stopped. I have checked with Clint, see >his response below. >Any suggestions. OK. The messages in Clint's log file confirm/demonstrate the inability of his LDM to send you data. >Have stopped and restarted my ldm this morning, but it is still >not ingesting. I logged on and was able to run notifyme to papagayo to verify that nothing has changed on Clint's side (allows, etc.): <as 'ldm'> notifyme -vxl- -f ANY -o 3600 -h papagayo.unl.edu Data lists came back immediately proving that Clint's machine is correctly setup to allow feeds from waldo. I then ran top and noticed that the load average on waldo was 44. Since this is extremely unusual, I decided to shutdown the LDM and run some checks on the queue. /usr/local/ldm% pqcat -s -q data/ldm.pq -l- Mar 18 18:28:36 pqcat: Starting Up (9152) Mar 18 18:28:36 pqcat: assertion "IsAlloc(rep)" failed: file "pq.c", line 1907 Abort (core dumped) This looked as though the queue was corrupted, so I decided to try and delete and remake it: /usr/local/ldm% ldmadmin delqueue /usr/local/ldm/data/ldm.pq: No such file or directory After verifying that there was still a link between /var/data/ldm and /usr/local/ldm/data, I looked for a queue: /usr/local/ldm% cd data /usr/local/ldm/data% ls -alt total 22 drwxr-xr-x 5 ldm data 512 Mar 18 18:28 ./ drwxr-xr-x 2 ldm data 6656 Mar 18 16:08 logs/ drwxrwxr-x 4 ldm data 512 Nov 6 21:01 gempak/ drwxrwxr-x 3 ldm data 512 Sep 25 01:00 surface/ drwxrwxr-x 4 ldm data 512 Nov 24 1999 ../ So, your problem was that your LDM queue somehow got deleted! I remade the queue and then restarted your LDM: /usr/local/ldm% ldmadmin mkqueue /usr/local/ldm% ldmadmin start Data is once again flowing into waldo. Now, the question is how the LDM queue got deleted!? While I was on waldo, I decided to move the scouring of McIDAS-XCD produced data files to the 'ldm' account: <as 'ldm'> cd util <- ~ldm/util is in the PATH for 'ldm' cp ~mcidas/workdata/mcscour.sh . <I looked at the contents of mcscour.sh to make sure that all the environment variables are set correctly, and they are> I changed the mcscour.sh logging from /home/mcidas/workdata/scour.log to ~ldm/logs/mcscour.log. This puts almost all of your LDM related log files into ~ldm/logs. The only one that I didn't move/change was /home/mcidas/workdata/ROUTEPP.LOG. This can easily be moved by editing the MCLOG setting in ~ldm/decoders/batch.k. Next, I moved McIDAS ADDE server logging from ~mcidas/workdata to ~ldm/logs. This required that I: o setup a McIDAS REDIRECTion for SERVER.* in the 'mcidas' account o change the permissions on /var/data/ldm/logs so that it was group writable (mcidas and mcadde are in the same group as ldm) o move ~mcidas/workdata/SERVER.LOG to ~ldm/logs and change its permission to be writable by mcadde o add a cron entry to 'ldm's crontab to rotate the SERVER.LOG* files Then, since the dostats action is commented out in 'ldm's crontab file, I edited ~ldm/etc/ldmd.conf to stop pqbinstats from running. This prevents the .stats files from being created in ~ldm/logs. This is necessary since the bin/ldmadmin dostats action normally run from cron is what scours the .stats files. The last thing I did was run ~ldm/util/mcscour.sh "by hand" as 'ldm' to make sure that it worked. 
The scour apparently worked: the March 16 .XCD file in /var/data/mcidas and
its associated .IDX files were scoured off.  This leaves that file system
with about 3.5 GB of free space:

% df -k
Filesystem            kbytes    used   avail capacity  Mounted on
/proc                      0       0       0     0%    /proc
/dev/dsk/c0d0s0      7396768 3681199 3641602    51%    /
fd                         0       0       0     0%    /dev/fd
swap                  802576     312  802264     1%    /tmp

Recap:

- the LDM was not receiving data since something had deleted the LDM queue
  even though the LDM was still running.  I remade the queue and restarted
  the LDM; data is being received and processed normally once again
- I moved the XCD scouring to an 'ldm' cron job and moved its log file to
  ~ldm/logs/mcscour.log
- I moved the McIDAS ADDE remote server logging to ~ldm/logs and set up a
  cron entry to rotate the log files
- I stopped pqbinstats from being run at LDM startup

We need to keep an eye on the McIDAS-XCD scouring done by mcscour.sh to
make sure that it continues to work.  Please let me know if you see
anything amiss on waldo.

Tom

>-----Original Message-----
>From: Clint Rowe [mailto:address@hidden]
>Sent: Tuesday, March 18, 2003 10:33 AM
>To: Anderson, Alan C.
>Subject: Re: ldm at papagayo
>
>Alan,
>I seem to have all the data and papagayo's been chugging along without any
>problems.  There are some errors regarding waldo in yesterday's log file:
>
>Mar 17 10:10:08 papagayo waldo(feed)[4767]: up6.c:168: HEREIS: RPC: Unable to send; errno = Broken pipe
>Mar 17 10:10:08 papagayo waldo(feed)[4767]: up6.c:369: Product send failure: I/O error
>Mar 17 10:10:16 papagayo rpc.ldmd[28230]: child 4767 exited with status 6
>
>...
>
>Mar 17 10:21:58 papagayo waldo(feed)[28849]: up6.c:168: HEREIS: RPC: Unable to send; errno = Broken pipe
>Mar 17 10:21:58 papagayo waldo(feed)[28849]: up6.c:369: Product send failure: I/O error
>Mar 17 10:22:06 papagayo rpc.ldmd[28230]: child 28849 exited with status 6
>
>...
>
>Mar 17 10:35:22 papagayo waldo(feed)[28847]: up6.c:168: HEREIS: RPC: Unable to send; errno = Broken pipe
>Mar 17 10:35:22 papagayo waldo(feed)[28847]: up6.c:369: Product send failure: I/O error
>Mar 17 10:35:30 papagayo rpc.ldmd[28230]: child 28847 exited with status 6
>
>I think the problem is at your end, as I'm getting data and nobody else has
>complained.
>
>Let me know if you can't get restarted.
>Clint
>
>>Hi Clint
>>
>>We stopped getting data from papagayo yesterday, Mar. 17, at about 10Z.
>>
>>Is there a problem at unl?
>>
>>Alan Anderson
>>St. Cloud State
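A few periodic spot checks, run as 'ldm' on waldo, can confirm that the
queue is intact, that products are arriving, and that mcscour.sh keeps
freeing space.  This is only a sketch: pqmon and 'ldmadmin watch' are
standard LDM utilities, and the paths follow the layout described above.

<as 'ldm'>
ls -l /usr/local/ldm/data/ldm.pq      <- the queue file should exist
pqmon                                 <- reports product-queue usage; fails if the queue is missing or corrupt
ldmadmin watch                        <- newly received products should scroll by
tail /usr/local/ldm/logs/mcscour.log  <- the latest scour run should show no errors
df -k /var/data                       <- free space should hold roughly steady once scouring is working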