[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: "pbuf_flush: time elapsed" problem
- Subject: Re: "pbuf_flush: time elapsed" problem
- Date: Fri, 16 Sep 2005 10:03:59 -0600
Justin,
>Date: Fri, 16 Sep 2005 09:48:18 -0400
>From: Justin Cooke <address@hidden>
>Organization: NOAA
>To: address@hidden
>Subject: Re: "pbuf_flush: time elapsed" problem
The above message contained the following:
> I added the logging to the script that we are seeing the most
> "pbuf_flush Timed out" from. Yesterday we also broke apart the pqact
> processing, previously we only had one pqact for all incoming data. We
> created a second entry in ldmd.conf for just CRAFT data and the existing
> entry was modified to act on all data type except CRAFT but we still saw
> the same errors overnight. Below is another log entry:
>
> ---
> Sep 16 03:02:21 b2n1 eldm4[585824]: 7504142 20050916030208.060 FSL5
> 000 FSL.CompressedNetCDF.MADIS.mesonet4.20050916_0100.gz
> Sep 16 03:02:21 b2n1 pqact[909556]: 7504142 20050916030208.060 FSL5
> 000 FSL.CompressedNetCDF.MADIS.mesonet4.20050916_0100.gz
> Sep 16 03:02:21 b2n1 pqact[909556]: pipe: -close
> /usr/ldm/ncep_bin/run_dcmeso.sh
> /dcomdev/us007003/ldmdata/obs/surface/mesonet 20050916_0100.gz
> Sep 16 03:04:21 b2n1 pqact[909556]: pbuf_flush 2: time elapsed 120.000055
> Sep 16 03:04:21 b2n1 pqact[909556]: pbuf_flush (2) Timed out
> Sep 16 03:04:21 b2n1 pqact[909556]: pipe_put:
> -close/usr/ldm/ncep_bin/run_dcmeso.sh/dcomdev/us007003/ldmdata/obs/surface/mesonet20050916_0100.gz
>
> write error
> ---
>
> But there is nothing in the dcmeso.log file related to this time:
>
> ---
> END ---- PID= 1151078 ---- Thu Sep 15 23:02:08 EDT 2005
> ----------------------------
> exit 0
> + exit 0
> echo "START -- PID=" $$ " ----" `date` "----------------------------"
> + date
> + echo START -- PID= 684064 ---- Thu Sep 15 23:04:21 EDT 2005
> ----------------------------
> START -- PID= 684064 ---- Thu Sep 15 23:04:21 EDT 2005
> ----------------------------
>
>
>
>
> gzip -d > ${1}/$$.${2}
> + gzip -d
> + 1> /dcomdev/us007003/ldmdata/obs/surface/mesonet/684064.20050916_0200.gz
> madisfilename=${1}/`echo ${2} | cut -c1-13`
> ----
>
> The END time of the previous run_dcmeso.sh instance is 03:02:08Z and the
> next doesn't start until 03:04:21Z which is right after the "Timed out"
> error but the file being acted on is 20050916_0200.gz where the "Timed
> out" was for 20050916_0100.gz. The "START" line is never added when the
> the pipe: -close for 03:02:21 is run, it's as if the log is never
> touched, looking like the script never ran.
>
> any ideas?
Not yet. I'm going to ponder this today.
Incidentally, you might want to invoke the newlog(1) utility via
crontab(1) to keep the logfile "dcmeso.log" from growing indefinitely.
> We are running this version of LDM on AIX 5.2, is there any other system
> information that you would find useful?
I'll let you know.
> I have attached run_dcmeso.sh, it is slightly different from the script
> that I was talking about yesterday but it is doing the same major steps,
> unzipping a file and sending it off to a decoder.
>
> Thanks for the help,
>
> Justin
[snip]
> #!/bin/sh -vx
> #
> # This script is EXECed directly by DBNet in order to run the
> # dcmeso decoder on the data file given in the first argument.
> #
> # Usage: ./run_dcmeso.sh <mesonet_filename>
> #
> # Once this is done, the data file itself is then compressed
> # within its native directory for more efficient short-term
> # storage.
> #
>
> # The gzip line must be the first, noncomment line in this script
> # so that stdin is processed correctly
> exec >> /usr/ldm/logs/dcmeso.log 2>&1
> echo "START -- PID=" $$ " ----" `date` "----------------------------"
>
> gzip -d > ${1}/$$.${2}
> madisfilename=${1}/`echo ${2} | cut -c1-13`
> hhmm=`date -u +%H%M`
> decoderfilename=${madisfilename}.${hhmm}
> mv ${1}/$$.${2} ${decoderfilename}
>
> . /ioddev/dbndev/.profile
>
> #export MADIS_STATIC=/nwprod/lib/sorc/madis/static
> export MADIS_STATIC=$DCDROOT/lib/sorc/madis-2.5/static
> export MADIS_DATA=/dcomdev/us007003/ldmdata
>
> ln -sf ${decoderfilename} ${madisfilename}
>
> nice $DCDROOT/bin/decod_dcmeso -v 2 -d
> /dcomdev/us007003/decoder_logs/decod_dcmeso.log ${decoderfilename}
> /dcomdev/us007003/bufrtab.255
>
> rm -f ${madisfilename}
>
> #
> # Compress the decoder input file within its native directory,
> # in order to conserve disk space for these large files!
> #
>
> gzip ${decoderfilename}
>
> #
> # Explicitly set the script return code to 0, in order to prevent
> # the "compress" return code from becoming the script return code
> # (and thereby prevent DBNet from re-running the script for this
> # particular data file if there is a problem with the compress!)
> #
>
> echo "END ---- PID=" $$ " ----" `date` "----------------------------"
> exit 0
Regards,
Steve Emmerson