This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Justin,

> We are currently running LDM 6.7.0 on an IBM P6 cluster supercomputer at
> OS AIX 5.3 and are encountering errors in our LDM after it has been
> running for approx 24 - 30 hours. Here is clip from out ldmd.log:
>
> Apr 30 22:01:38 c1n5 local0:warn|warning pqact[450700] WARN:
> write(956,,32768) to decoder timed-out (60 s):
> /nwprod/exec/decod_dcmetr-v2-t300-d/dcom/us007003/decoder_logs/decod_dcmetr.log/nwprod/fix/bufrtab.000/nwprod/dictionaries/metar.tbl
> Apr 30 22:03:09 c1n5 local0:warn|warning pqact[450700] WARN:
> write(956,,32768) to decoder took 58 s:
> /nwprod/exec/decod_dcmetr-v2-t300-d/dcom/us007003/decoder_logs/decod_dcmetr.log/nwprod/fix/bufrtab.000/nwprod/dictionaries/metar.tbl
> Apr 30 22:04:09 c1n5 local0:warn|warning pqact[450700] WARN:
> write(956,,32768) to decoder timed-out (60 s):
> /nwprod/exec/decod_dcmetr-v2-t300-d/dcom/us007003/decoder_logs/decod_dcmetr.log/nwprod/fix/bufrtab.000/nwprod/dictionaries/metar.tbl
> Apr 30 22:05:56 c1n5 local0:warn|warning pqact[450700] WARN:
> write(963,,32768) to decoder timed-out (60 s):
> /nwprod/exec/decod_dcmetr-v2-t300-d/dcom/us007003/decoder_logs/decod_dcmetr.log/nwprod/fix/bufrtab.000/nwprod/dictionaries/metar.tbl
> Apr 30 22:06:24 c1n5 local0:warn|warning pqact[450700] WARN:
> write(943,,32768) to decoder took 25 s:
> /nwprod/exec/decod_dcacft-v2-t300-d/dcom/us007003/decoder_logs/decod_dcacft.log/nwprod/dictionaries/pirep.tbl/nwprod/dictionaries/airep.tbl/nwprod/fix/bufrtab.004
> Apr 30 22:07:27 c1n5 local0:warn|warning pqact[450700] WARN:
> write(943,,32768) to decoder timed-out (60 s):
> /nwprod/exec/decod_dcacft-v2-t300-d/dcom/us007003/decoder_logs/decod_dcacft.log/nwprod/dictionaries/pirep.tbl/nwprod/dictionaries/airep.tbl/nwprod/fix/bufrtab.004
> Apr 30 22:08:27 c1n5 local0:warn|warning pqact[450700] WARN:
> write(970,,32768) to decoder took 36 s:
> /nwprod/exec/decod_dcmetr-v2-t300-d/dcom/us007003/decoder_logs/decod_dcmetr.log/nwprod/fix/bufrtab.000/nwprod/dictionaries/metar.tbl

The warning messages indicate that the pqact(1) process took too long to
write to the given decoders. This could be because the decoders are too
slow or because the system is overloaded.

What version of GEMPAK are you using?

Does the top(1) (or similar) utility indicate anything amiss with the
load on the system when this happens?

Is there anything significant in the dcmetr log file
(/dcom/us007003/decoder_logs/decod_dcmetr.log)?

> Many other processes running on the node slowdown when these errors
> appear, stopping and letting the processes restart breaks free the
> slowdown until it builds up again in 24 - 30 hours. We are running two
> separate instances of LDM on two nodes of the supercomputer and both
> show these errors but at different times. IBM is investigating this
> issue as it has only recently started to occur in the last week even
> though we have had LDM running on these nodes for a couple of months. To
> assist in their troubleshooting I'm hoping that can give some insight
> into possible system problems that would cause LDM/pqact to have these
> errors. Immediately after stopping and letting the LDM restart all the
> data that was having problems being acted on is piped to the decoders fine.

You're getting the messages because the system is overloaded rather than
the messages causing the system to become overloaded.

> Thanks for any insight,
>
> Justin Cooke
> NCEP Central Operations

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: WPM-702818
Department: Support LDM
Priority: Normal
Status: Closed
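
When correlating the pqact warnings quoted in this ticket with system load, it may help to tally them per decoder. The following is a minimal Python sketch, not part of the LDM distribution. The regular expression is inferred only from the log lines quoted above, not from the LDM source. The names `WARN_RE`, `summarize`, and `SAMPLE` are hypothetical. It also assumes each warning occupies a single line in ldmd.log (the excerpt above was wrapped by the mail client).

```python
import re
from collections import defaultdict

# Assumed message shape, inferred from the warnings quoted in this ticket:
#   ... pqact[PID] WARN: write(FD,,NBYTES) to decoder timed-out (N s): COMMAND
#   ... pqact[PID] WARN: write(FD,,NBYTES) to decoder took N s: COMMAND
WARN_RE = re.compile(
    r"pqact\[(?P<pid>\d+)\]\s+WARN:\s+"
    r"write\((?P<fd>\d+),,(?P<nbytes>\d+)\)\s+to decoder\s+"
    r"(?:timed-out \((?P<timeout>\d+) s\)|took (?P<took>\d+) s):\s*"
    r"(?P<command>\S+)"
)


def summarize(lines):
    """Count timed-out and slow decoder writes per decoder command string."""
    stats = defaultdict(
        lambda: {"timeouts": 0, "slow_writes": 0, "max_took_s": 0}
    )
    for line in lines:
        match = WARN_RE.search(line)
        if match is None:
            continue  # not one of the decoder-write warnings
        entry = stats[match.group("command")]
        if match.group("timeout") is not None:
            entry["timeouts"] += 1
        else:
            entry["slow_writes"] += 1
            entry["max_took_s"] = max(
                entry["max_took_s"], int(match.group("took"))
            )
    return dict(stats)


# Command strings copied verbatim from the log excerpt in this ticket.
METR_CMD = (
    "/nwprod/exec/decod_dcmetr-v2-t300-d/dcom/us007003/decoder_logs/"
    "decod_dcmetr.log/nwprod/fix/bufrtab.000/nwprod/dictionaries/metar.tbl"
)
ACFT_CMD = (
    "/nwprod/exec/decod_dcacft-v2-t300-d/dcom/us007003/decoder_logs/"
    "decod_dcacft.log/nwprod/dictionaries/pirep.tbl/nwprod/dictionaries/"
    "airep.tbl/nwprod/fix/bufrtab.004"
)
PREFIX = "c1n5 local0:warn|warning pqact[450700] WARN: "

# Three of the quoted warnings, unwrapped to one line each.
SAMPLE = [
    "Apr 30 22:01:38 " + PREFIX
    + "write(956,,32768) to decoder timed-out (60 s): " + METR_CMD,
    "Apr 30 22:03:09 " + PREFIX
    + "write(956,,32768) to decoder took 58 s: " + METR_CMD,
    "Apr 30 22:06:24 " + PREFIX
    + "write(943,,32768) to decoder took 25 s: " + ACFT_CMD,
]

for command, counts in summarize(SAMPLE).items():
    print(counts, command)
```

To run it against a real log, pass an open file object to `summarize` in place of `SAMPLE`. Comparing the timestamps of the warnings against output from top(1) or a similar utility would then show whether they coincide with high system load.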