"pbuf_flush: time elapsed" problem (was: Problem with LDM 6.3.0)
- Subject: "pbuf_flush: time elapsed" problem (was: Problem with LDM 6.3.0)
- Date: Thu, 15 Sep 2005 11:14:37 -0600
Justin,
>Date: Thu, 15 Sep 2005 08:22:48 -0400
>From: Justin Cooke <address@hidden>
>Organization: NOAA
>To: address@hidden
>Subject: Problem with LDM 6.3.0
The above message contained the following:
> We have recently installed version 6.3.0 of LDM and are seeing
> occasional errors with two of our PIPE processes. I have included an
> excerpt from the ldmd.log of one of the errors:
>
> ---
> Sep 14 21:42:17 b2n1 eldm4[1171480]: 452967 20050914214215.998 PCWS
> 000 FSL.CompressedNetCDF.MADIS.acars.20050914_2100.gz
> Sep 14 21:42:17 b2n1 pqact[1511588]: 452967 20050914214215.998 PCWS
> 000 FSL.CompressedNetCDF.MADIS.acars.20050914_2100.gz
> Sep 14 21:42:17 b2n1 pqact[1511588]: pipe: -close
> /home/decdev/bin/run_dctamd.sh
> /dcomdev/us007003/ldmdata/obs/upperair/tamdar 20050914_2100.gz
> Sep 14 21:44:17 b2n1 pqact[1511588]: pbuf_flush 2: time elapsed 120.000054
> Sep 14 21:44:17 b2n1 pqact[1511588]: pbuf_flush (2) Timed out
> Sep 14 21:44:17 b2n1 pqact[1511588]: pipe_put:
> -close/home/decdev/bin/run_dctamd.sh/dcomdev/us007003/ldmdata/obs/upperair/tamdar20050914_2100.gz
>
> write error
The error messages above mean that the pqact(1) process was unable to
flush the pipe to the script /home/decdev/bin/run_dctamd.sh. The pipe
was open, but the script did not read from it within the allotted
timeout (120 seconds, according to the "time elapsed 120.000054" log
entry). The command in the script that reads from the pipe is
gzip -d > ${1}/$$.${2}
It's possible (though unlikely) that the gzip(1) process encountered a
problem with the data-product that caused it to terminate reading from
the standard input stream.
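One way to check this possibility by hand is to look at the exit status
of gzip(1) when it is given a damaged stream. The following is only a
sketch: the temporary directory, the file names good.gz and bad.gz, and
the sample text are made up for illustration and are not part of
run_dctamd.sh.

```shell
#!/bin/sh
# Hypothetical demonstration: compare gzip's exit status for an intact
# stream and for a truncated one.
workdir=$(mktemp -d)

# Build a small, valid gzip stream from made-up sample text.
printf 'sample data-product\n' | gzip > "$workdir/good.gz"

# Keep only the first 10 bytes to simulate a damaged data-product.
dd if="$workdir/good.gz" of="$workdir/bad.gz" bs=1 count=10 2>/dev/null

# Decompress both from standard input, as the script does.
gzip -d < "$workdir/good.gz" > "$workdir/good.out" 2> "$workdir/good.err"
good_status=$?
gzip -d < "$workdir/bad.gz" > "$workdir/bad.out" 2> "$workdir/bad.err"
bad_status=$?

echo "intact stream:    gzip -d exit status $good_status"
echo "truncated stream: gzip -d exit status $bad_status"

rm -rf "$workdir"
```

A non-zero status for the damaged stream is the kind of evidence that
the script currently discards, because nothing in it records the result
of the gzip(1) command.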
In any case, a definitive diagnosis is impossible unless a mechanism for
reporting errors is added to the script. I suggest adding the command
exec >> $HOME/logs/run_dctamd.log 2>&1
to the top of the script to help determine the cause of the problem.
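As a rough sketch of how that redirection behaves, the example below
builds a small stand-in script and feeds it input that is not gzip data.
The names demo.sh, LOGFILE, and "product", and the temporary paths, are
hypothetical; the real script would use the log path given above. Note
that an exec with only redirections changes where standard output and
standard error go and does not read from standard input, so the gzip
command should still see the data from the pipe.

```shell
#!/bin/sh
# Sketch only: demo.sh, LOGFILE and the temporary paths are stand-ins.
workdir=$(mktemp -d)
LOGFILE="$workdir/demo.log"
export LOGFILE

cat > "$workdir/demo.sh" <<'EOF'
#!/bin/sh
# From here on, append everything written to stdout and stderr to the log.
# Standard input (the pipe from the caller) is left untouched.
exec >> "$LOGFILE" 2>&1
gzip -d > "$1/$$.$2"
echo "gzip -d exit status: $?"
EOF

# Feed the stand-in script input that is not in gzip format.
printf 'not gzip data\n' | sh "$workdir/demo.sh" "$workdir" product

# Show what ended up in the log: gzip's complaint and its exit status.
log_text=$(cat "$LOGFILE")
echo "$log_text"

rm -rf "$workdir"
```

With the redirection in place, any diagnostic that gzip(1) or the
decoder writes to standard error should appear in the log file next to
the recorded exit status.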
Please contact me if you have any questions or discover something.
> Sep 14 21:44:17 b2n1 pqact[1511588]: file:
> /dcomdev/us007003/ldmdata/test/acars.20050914_2100.gz_214215
> ---
>
> Throughout the day we receive hundreds of these acars messages but only
> a couple will result in a time out and then the write error. After this
> error occurs the script that was acted on by LDM remains in the process
> table and has to be purged with a kill -9. We are also receiving this
> feed to a different system but we are not seeing these errors. On that
> system the only difference is the version of LDM, 6.0.15, the pqact.conf
> and script are the same for this datatype.
>
> We tried version 6.4.1 and the same errors occurred, we also recompiled
> 6.3.0 and increased DEFAULT_PIPE_TIMEO to 120 in pqact.c
>
> #define DEFAULT_PIPE_TIMEO 120
>
> again the errors still occurred.
>
> I have attached the /home/decdev/bin/run_dctamd.sh script, it basically
> unzips the stdin and puts the resulting data into a decoder.
>
> Any ideas?
>
> Thanks,
>
> Justin Cooke
> NCEP Central Operations
>
> #!/bin/sh -vx
>
> #
> # This script is EXECed directly by DBNet in order to run the
> # dctamd decoder on the data file given in the first argument.
> #
> # Usage: ./run_dctamd.sh <tamdar_filename>
> #
> # Once this is done, the data file itself is then compressed
> # within its native directory for more efficient short-term
> # storage.
> #
>
> # The gzip line must be the first, noncomment line in this script
> # so that stdin is processed correctly
>
> gzip -d > ${1}/$$.${2}
> madisfilename=${1}/`echo ${2} | cut -c1-13`
> hhmm=`date -u +%H%M`
> decoderfilename=${madisfilename}.${hhmm}
> mv ${1}/$$.${2} ${decoderfilename}
>
> . /ioddev/dbndev/.profile
>
> export MADIS_STATIC=$DCDROOT/lib/sorc/madis-2.5/static
> export MADIS_DATA=/dcomdev/us007003/ldmdata
>
> ln -sf ${decoderfilename} ${madisfilename}
>
> nice $DCDROOT/bin/decod_dctamd -v 2 \
> -d /dcomdev/us007003/decoder_logs/decod_dctamd.log \
> ${decoderfilename} /dcomdev/us007003/bufrtab.004
>
> rm -f ${madisfilename}
>
> #
> # Compress the decoder input file within its native directory,
> # in order to conserve disk space for these large files!
> #
>
> gzip ${decoderfilename}
>
> #
> # Explicitly set the script return code to 0, in order to prevent
> # the "compress" return code from becoming the script return code
> # (and thereby prevent DBNet from re-running the script for this
> # particular data file if there is a problem with the compress!)
> #
>
> exit 0
>
Regards,
Steve Emmerson