Re: 20040818: 20040818: Gempak decoder crashing problem
- Subject: Re: 20040818: 20040818: Gempak decoder crashing problem
- Date: Wed, 25 Aug 2004 12:34:02 -0400 (EDT)
On Wed, 18 Aug 2004, Unidata Support wrote:
The "message: table grib3.tbl" indicates the the modeling center
that the grid is being labeled is not in the $GEMTBL/grid/cntrgrib1.tbl
file.
I have scanned through the NOAAPORT ingest logs, and see that there are
a small number of products ^O[LMN]NC88 KWNB (Wave direction, height, period) that
are identifying themselves as center 161. For the time being, you might want to
duplicate the NCEP entry for #7 to 161 so that the ncepgrib3.tbl can be located.
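Roughly, that amounts to something like the sketch below (a sketch only; copy
the exact column layout from the existing center 7 line in your own
cntrgrib1.tbl rather than from here):

    cd $GEMTBL/grid
    egrep '^ *0*7 ' cntrgrib1.tbl   # locate the existing NCEP (center 7) entry
    # copy that line verbatim onto a new line, change the leading center number
    # to 161, and leave the remaining columns (name, table prefix) untouched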
Steve,
I followed your suggestion about duplicating the #7 table entry to 161,
and that eliminated the error messages in the logs. However, now I'm
seeing messages of this type in the dcgrib2_ocean.log:
[8539] 040823/1836 [DCGRIB 1] Grid navigation 235 incompatible with file
data/gempak/model/2004082300_ocn.gem
I'm also seeing similar log entries in the dcgrib2_NWW.log:
[4235] 040823/1651 [DCGRIB 1] Grid navigation 23 incompatible with file
data/gempak/model/2004082300_ocn.gem
and the dcgrib2_GFSthin.log:
[1486] 040824/1659 [DCGRIB 1] Grid navigation 43 incompatible with file
data/gempak/model/2004082400_ocn.gem
The last two are a little puzzling to me since I wouldn't think that those
dcgrib instances would be writing to the
data/gempak/model/YYYYMMDDHH_ocn.gem file.
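One check I can think of (a sketch only; the pqact.conf path is my guess at a
typical LDM layout) is to list the pqact actions that feed each dcgrib2
instance and see whether their product patterns overlap:

    # sketch only -- adjust the path to wherever pqact.conf actually lives
    grep -n 'dcgrib2' ~ldm/etc/pqact.conf
    # then compare the request patterns for the ocean, NWW, and GFSthin actions
    # to see whether more than one action can fire on the same products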
One other thing that I should mention is that in addition to crashing with
the segmentation violation, a decoder instance occasionally becomes a
rogue process which never exits and consumes a good deal of CPU time. I
think this results in pqact not getting enough time and starting to fall
behind a little. Previously under 5.6.k, this was happening quite
frequently and with a number of decoders, although mainly with dcgrib2.
(You may want to refer to my reply to you of 8/18 regarding compiler
optimization and how I compiled 5.6.k and 5.7.2p2. I'm assuming you
received it but it's not showing up in the email archive for some reason.)
Since going to 5.7.2p2, it has only occurred with dcgrib2 and only with an
instance decoding the ocean grids. Here is an example from today:
vortex# top
last pid: 16925; load averages: 2.98, 2.88, 2.77    11:38:22
112 processes: 107 sleeping, 3 running, 1 zombie, 1 on cpu
CPU states: % idle, % user, % kernel, % iowait, % swap
Memory: 512M real, 13M free, 347M swap in use, 1041M swap free
PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND
4635 ldm 1 26 2 32M 1824K run 147:42 30.33% dcgrib2
16033 ldm 1 25 2 391M 321M run 610:04 29.27% pqact
16879 ldm 1 25 2 27M 4208K run 0:06 11.78% dcrdf
16789 ldm 1 42 2 29M 4896K sleep 0:03 3.78% dcgrib2
16925 root 1 30 0 1568K 1208K cpu 0:00 2.36% top
16910 ldm 1 29 2 27M 3632K sleep 0:00 0.91% dctaf
14021 ldm 1 52 2 24M 3200K sleep 0:42 0.80% dcmetr
16049 ldm 1 43 2 393M 213M sleep 12:25 0.72% rpc.ldmd
16037 ldm 1 52 2 390M 291M sleep 60:56 0.71% pqbinstats
16922 ldm 1 47 2 24M 3256K sleep 0:00 0.52% dcuair
16899 ldm 1 46 2 32M 7168K sleep 0:00 0.50% dcgrib2
16885 ldm 1 52 2 24M 4808K sleep 0:01 0.49% dcmsfc
16055 ldm 1 52 2 390M 227M sleep 10:14 0.48% rpc.ldmd
16051 ldm 1 53 2 390M 167M sleep 5:49 0.42% rpc.ldmd
16042 ldm 1 53 2 392M 313M sleep 29:50 0.38% rpc.ldmd
16057 ldm 1 53 2 390M 40M sleep 8:49 0.33% rpc.ldmd
214 root 4 58 0 2864K 1624K sleep 415:31 0.32% ypserv
vortex# ps -ef|grep dcgrib
root 16944 11384 0 11:38:29 pts/4 0:00 grep dcgrib
ldm 16789 16033 5 11:35:06 ? 0:04 decoders/dcgrib2 -d
data/gempak/logs/dcgrib2_GFS.log -e GEMTBL=/weather/GEMPAK5
ldm 4635 16033 32 06:56:15 ? 147:45 decoders/dcgrib2 -d
data/gempak/logs/dcgrib2_ocean.log -e GEMTBL=/weather/GEMPA
ldm 16899 16033 1 11:37:55 ? 0:00 decoders/dcgrib2 -d
data/gempak/logs/dcgrib2_GFSthin.log -e GEMTBL=/weather/GEM
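If it would help, the next time one of these goes rogue I can try to snapshot
it before killing it, along these lines (Solaris tools; the PID is the runaway
dcgrib2 from the top output above):

    pstack 4635      # dump the user-level stack of the runaway decoder
    truss -p 4635    # attach and watch its system calls for a few seconds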
I don't have a good feel for whether these two problems are related, but they
seem to point to dcgrib2 having trouble with the ocean data for some
reason. Otherwise, why wouldn't other instances of dcgrib2, e.g. those decoding
ETA grids, also be crashing or gobbling up the CPU?
Tom
-----------------------------------------------------------------------------
Tom McDermott Email: address@hidden
Systems Administrator Phone: (585) 395-5718
Earth Sciences Dept. Fax: (585) 395-2416
SUNY College at Brockport
From: Tom McDermott <address@hidden>
Organization: UCAR/Unidata
Keywords: 200408181808.i7II8SaW025880
On Wed, 18 Aug 2004, Unidata Support wrote:
I'll see if I can create a duplicate of your problem for the 5.7.3
release I'm working on.
Steve Chiswell
Unidata User Support
Steve,
One other thing that I'm seeing now and only in the 'dcgrib2_ocean.log'
are these messages:
[3639] 040818/1124 [NA -1] The table grib3.tbl cannot be opened.
...
[3640] 040818/1124 [NA -1] The table grib3.tbl cannot be opened.
BTW, even though the entries for children 3639 and 3640 in the log are not
intermingled, it looks like they may have been running at the same time,
based on the timestamps and also on these messages from 'ldmd.log':
Aug 18 15:25:00 vortex pqact[26768]: child 3640 terminated by signal 11
Aug 18 15:25:00 vortex pqact[26768]: child 3639 terminated by signal 11
So this may be the multiple file writers problem.
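To check that, I suppose one could correlate the two child PIDs against
ldmd.log (the log path is assumed; adjust for the actual install):

    egrep 'child (3639|3640)' ~ldm/logs/ldmd.log
    # both termination messages landing in the same second, as above, suggests
    # the two decoders really were writing the same output file at once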
Tom
-----------------------------------------------------------------------------
Tom McDermott Email: address@hidden
Systems Administrator Phone: (585) 395-5718
Earth Sciences Dept. Fax: (585) 395-2416
SUNY College at Brockport
--
****************************************************************************
Unidata User Support                                  UCAR Unidata Program
(303)497-8643                                                P.O. Box 3000
address@hidden                                           Boulder, CO 80307
----------------------------------------------------------------------------
Unidata WWW Service             http://my.unidata.ucar.edu/content/support
----------------------------------------------------------------------------
NOTE: All email exchanges with Unidata User Support are recorded in the
Unidata inquiry tracking system and then made publicly available
through the web. If you do not want to have your interactions made
available in this way, you must let us know in each email you send to us.