This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.

David,

>Date: Wed, 30 Jun 2004 10:08:00 -0700
>From: David Ovens <address@hidden>
>Organization: University of Washington
>To: Steve Emmerson <address@hidden>
>Subject: Re: 20040630: 20040630: potential LDM/pqact problem on OSF/1
>Keywords: 200406241954.i5OJsCWb010248 LDM PIPE Perl

The above message contained the following:

> I am looking in ~ldm/logs/ldmd.log*, I hope these are the correct
> files.

They are.

> Anyhow, these files seem to be written to by two machines
> sunny and glacier.

TWO machines. That's rather odd.

> I am noticing the problems on glacier. Here are
> the glacier entries surrounding a failure that occurred with a 1554
> radar file for today:
>
> file sizes (.1 is from PERL, .2 is from Bourne-shell):
> -rw-r--r-- 1 ldm ldm 1088055 Jun 30 09:02 n0r_20040630_1554
> -rw-r--r-- 1 ldm ldm  262144 Jun 30 09:02 n0r_20040630_1554.1
> -rw-r--r-- 1 ldm ldm 1088055 Jun 30 09:02 n0r_20040630_1554.2

The following log entries don't seem to correspond to the above files because the log entries refer to time "1545" rather than the above time of "1554". The log entries do indicate a problem, however.

> ldmd.log entries:
> Jun 30 16:02:02 glacier pqact[523476]: pbuf_flush (4) write: Interrupted system call
> Jun 30 16:02:02 glacier pqact[523476]: pipe_dbufput:
> -close/home/disk/ldm/local/bin/gini/zlib2gif.pl/home/glacier/ldm/nport/IMAGE/NHEM-COMP/24km/VIS/VIS_20040630_1545satz/ch1/GOES-12/VIS/200406301545/NHEM-COMP/24km
> write error
> Jun 30 16:02:03 glacier pqact[523476]: pbuf_flush (5) write: Interrupted system call
> Jun 30 16:02:03 glacier pqact[523476]: pipe_dbufput:
> -close/home/disk/ldm/local/bin/gini/png2gif.pl/home/glacier/ldm/nport/RADAR/1km/n0r/n0r_20040630_1554
> write error
> Jun 30 16:02:05 glacier pqact[523476]: pbuf_flush (4) write: Broken pipe
> Jun 30 16:02:05 glacier pqact[523476]: pipe_dbufput:
> -close/home/disk/ldm/local/bin/gini/zlib2gif.pl/home/glacier/ldm/nport/IMAGE/PR-NATIONAL/8km/IR/IR_20040630_1545satz/ch1/GOES-12/IR/200406301545/PR-NATIONAL/8km
> write error
> Jun 30 16:02:05 glacier pqact[523476]: pipe_prodput: trying again
> Jun 30 16:02:05 glacier pqact[523476]: pbuf_flush (4) write: Broken pipe
> Jun 30 16:02:05 glacier pqact[523476]: pipe_dbufput:
> -close/home/disk/ldm/local/bin/gini/zlib2gif.pl/home/glacier/ldm/nport/IMAGE/PR-NATIONAL/8km/IR/IR_20040630_1545satz/ch1/GOES-12/IR/200406301545/PR-NATIONAL/8km
> write error
>
> Is that what you were looking for?

Yup.

Apparently, the pqact(1) process is receiving a signal while it's trying to write data to a pipe. The signal shouldn't be a SIGCONT, because that signal should be ignored at that time. The signal also shouldn't be a SIGALRM because that signal should cause a "pbuf_flush" log entry to be made. The signal also shouldn't be a SIGPIPE or SIGIO because they should cause a different system error message to be logged. I wonder what the signal is and who's sending it.

Can you put the pqact(1) process (pid 523476) into verbose logging mode by having the LDM user send it a SIGUSR2, e.g.,

    kill -USR2 523476

Send me anything that looks relevant.
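
The "Interrupted system call" and "Broken pipe" texts in these log entries are the usual strerror(3) messages for the errno values EINTR and EPIPE. When a signal handler interrupts a write(2) to a pipe before any data has been transferred, the call fails with EINTR. If some data had already been written, it returns a short count instead. The C sketch below shows the general POSIX idiom a writer can use to ride out such interruptions. It is not pqact's code, and `write_fully` is a hypothetical name chosen for illustration.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of LDM): write all `len` bytes of `buf`
 * to `fd`. It retries when write(2) fails with EINTR because a signal
 * handler interrupted it before any data was transferred. It also
 * continues after a short write caused by an interruption partway through.
 * Returns 0 on success. On any other error it returns -1 with errno set,
 * for example to EPIPE ("Broken pipe") when the reader of a pipe has gone
 * away and SIGPIPE is ignored or caught.
 */
static int write_fully(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: try again */
            return -1;          /* genuine error, e.g. EPIPE */
        }
        p += n;                 /* advance past the bytes actually written */
        len -= (size_t)n;
    }
    return 0;
}
```

Retrying on EINTR keeps a stray signal from aborting the transfer. Any other error, such as EPIPE after the reading process has exited, is still passed back to the caller. A writer only sees EPIPE if SIGPIPE is ignored or caught. Under the default action for SIGPIPE, the process is terminated instead.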

Alternatively, can I log onto your system as the LDM user? That would be great.

Sending the pqact(1) process another such signal will put it into debug logging mode, which will greatly increase the number of log messages and should be done with caution. A third such signal will put the pqact(1) process back into regular logging mode.

Regards,
Steve Emmerson
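
The SIGUSR2 behaviour described above amounts to a three-state cycle driven by a signal handler. Each SIGUSR2 moves the process from regular to verbose, from verbose to debug, and from debug back to regular. The following C sketch shows one way such a cycle could be implemented. It is an illustration under stated assumptions, not pqact's actual source. The names `enum log_mode`, `current_log_mode`, `on_sigusr2` and `install_log_mode_handler` are hypothetical.

```c
#define _XOPEN_SOURCE 700   /* expose sigaction() and SA_RESTART */
#include <signal.h>
#include <string.h>

/* Hypothetical logging modes; the names are illustrative, not LDM's. */
enum log_mode { LOG_MODE_REGULAR = 0, LOG_MODE_VERBOSE = 1, LOG_MODE_DEBUG = 2 };

/* Only a volatile sig_atomic_t may be safely written from a handler. */
static volatile sig_atomic_t current_log_mode = LOG_MODE_REGULAR;

/* Advance regular -> verbose -> debug -> regular on each SIGUSR2. */
static void on_sigusr2(int sig)
{
    (void)sig;
    current_log_mode = (current_log_mode + 1) % 3;
}

/* Install the handler. Returns 0 on success, -1 on failure. */
static int install_log_mode_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr2;
    sigemptyset(&sa.sa_mask);
    /* Ask the kernel to restart slow calls such as write(2) on a pipe
     * that this handler interrupts, rather than failing them with EINTR. */
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR2, &sa, NULL);
}
```

From a shell, `kill -USR2 <pid>` delivers the same signal to another process that `raise(SIGUSR2)` delivers to the calling process, so three such signals return this sketch to regular mode. The SA_RESTART flag is a design choice in the sketch. Without it, a handler that fires in the middle of a pipe write can make that write fail with EINTR.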