Re: 20040818: 20040818: Gempak decoder crashing problem
- Subject: Re: 20040818: 20040818: Gempak decoder crashing problem
- Date: Wed, 25 Aug 2004 12:34:02 -0400 (EDT)
On Wed, 18 Aug 2004, Unidata Support wrote:
The "message: table grib3.tbl" indicates the the modeling center
that the grid is being labeled is not in the $GEMTBL/grid/cntrgrib1.tbl
file.
I have scanned through the NOAAPORT ingest logs, and see that there are
a small number of products ^O[LMN]NC88 KWNB (Wave direction, height, period) that
are identifying themselves as center 161. For the time being, you might want to
duplicate the NCEP entry for #7 to 161 so that the ncepgrib3.tbl can be located.
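Roughly, that amounts to something like the sketch below (a sketch only; copy
the exact column layout from the existing center 7 line in your own
cntrgrib1.tbl rather than from here):

    cd $GEMTBL/grid
    egrep '^ *0*7 ' cntrgrib1.tbl   # locate the existing NCEP (center 7) entry
    # copy that line verbatim onto a new line, change the leading center number
    # to 161, and leave the remaining columns (name, table prefix) untouched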
Steve,
I followed your suggestion about duplicating the #7 table entry to 161,
and that eliminated the error messages in the logs. However, now I'm
seeing messages of this type in the dcgrib2_ocean.log:
[8539] 040823/1836 [DCGRIB 1] Grid navigation 235 incompatible with file
data/gempak/model/2004082300_ocn.gem
I'm also seeing similar log entries in the dcgrib2_NWW.log:
[4235] 040823/1651 [DCGRIB 1] Grid navigation 23 incompatible with file
data/gempak/model/2004082300_ocn.gem
and the dcgrib2_GFSthin.log:
[1486] 040824/1659 [DCGRIB 1] Grid navigation 43 incompatible with file
data/gempak/model/2004082400_ocn.gem
The last two are a little puzzling to me since I wouldn't think that those
dcgrib instances would be writing to the
data/gempak/model/YYYYMMDDHH_ocn.gem file.
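One check I can think of (a sketch only; the pqact.conf path is my guess at a
typical LDM layout) is to list the pqact actions that feed each dcgrib2
instance and see whether their product patterns overlap:

    # sketch only -- adjust the path to wherever pqact.conf actually lives
    grep -n 'dcgrib2' ~ldm/etc/pqact.conf
    # then compare the request patterns for the ocean, NWW, and GFSthin actions
    # to see whether more than one action can fire on the same products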
One other thing that I should mention is that in addition to crashing with
the segmentation violation, a decoder instance occasionally becomes a
rogue process which never exits and consumes a good deal of CPU time. I
think this results in pqact not getting enough time and starting to fall
behind a little. Previously under 5.6.k, this was happening quite
frequently and with a number of decoders, although mainly with dcgrib2.
(You may want to refer to my reply to you of 8/18 regarding compiler
optimization and how I compiled 5.6.k and 5.7.2p2. I'm assuming you
received it but it's not showing up in the email archive for some reason.)
Since going to 5.7.2p2, it has only occurred with dcgrib2 and only with an
instance decoding the ocean grids. Here is an example from today:
vortex# top
last pid: 16925; load averages: 2.98, 2.88, 2.77    11:38:22
112 processes: 107 sleeping, 3 running, 1 zombie, 1 on cpu
CPU states: % idle, % user, % kernel, % iowait, % swap
Memory: 512M real, 13M free, 347M swap in use, 1041M swap free
PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND
4635 ldm 1 26 2 32M 1824K run 147:42 30.33% dcgrib2
16033 ldm 1 25 2 391M 321M run 610:04 29.27% pqact
16879 ldm 1 25 2 27M 4208K run 0:06 11.78% dcrdf
16789 ldm 1 42 2 29M 4896K sleep 0:03 3.78% dcgrib2
16925 root 1 30 0 1568K 1208K cpu 0:00 2.36% top
16910 ldm 1 29 2 27M 3632K sleep 0:00 0.91% dctaf
14021 ldm 1 52 2 24M 3200K sleep 0:42 0.80% dcmetr
16049 ldm 1 43 2 393M 213M sleep 12:25 0.72% rpc.ldmd
16037 ldm 1 52 2 390M 291M sleep 60:56 0.71% pqbinstats
16922 ldm 1 47 2 24M 3256K sleep 0:00 0.52% dcuair
16899 ldm 1 46 2 32M 7168K sleep 0:00 0.50% dcgrib2
16885 ldm 1 52 2 24M 4808K sleep 0:01 0.49% dcmsfc
16055 ldm 1 52 2 390M 227M sleep 10:14 0.48% rpc.ldmd
16051 ldm 1 53 2 390M 167M sleep 5:49 0.42% rpc.ldmd
16042 ldm 1 53 2 392M 313M sleep 29:50 0.38% rpc.ldmd
16057 ldm 1 53 2 390M 40M sleep 8:49 0.33% rpc.ldmd
214 root 4 58 0 2864K 1624K sleep 415:31 0.32% ypserv
vortex# ps -ef|grep dcgrib
root 16944 11384 0 11:38:29 pts/4 0:00 grep dcgrib
ldm 16789 16033 5 11:35:06 ? 0:04 decoders/dcgrib2 -d
data/gempak/logs/dcgrib2_GFS.log -e GEMTBL=/weather/GEMPAK5
ldm 4635 16033 32 06:56:15 ? 147:45 decoders/dcgrib2 -d
data/gempak/logs/dcgrib2_ocean.log -e GEMTBL=/weather/GEMPA
ldm 16899 16033 1 11:37:55 ? 0:00 decoders/dcgrib2 -d
data/gempak/logs/dcgrib2_GFSthin.log -e GEMTBL=/weather/GEM
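If it would help, the next time one of these goes rogue I can try to snapshot
it before killing it, along these lines (Solaris tools; the PID is the runaway
dcgrib2 from the top output above):

    pstack 4635      # dump the user-level stack of the runaway decoder
    truss -p 4635    # attach and watch its system calls for a few seconds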
I don't have a good feel for whether these two problems are related, but they
seem to point to dcgrib2 having trouble with the ocean data for some
reason. Otherwise, why wouldn't other instances of dcgrib2, e.g. those decoding
ETA grids, also be crashing or gobbling up the CPU?
Tom
-----------------------------------------------------------------------------
Tom McDermott Email: address@hidden
Systems Administrator Phone: (585) 395-5718
Earth Sciences Dept. Fax: (585) 395-2416
SUNY College at Brockport
From: Tom McDermott <address@hidden>
Organization: UCAR/Unidata
Keywords: 200408181808.i7II8SaW025880
On Wed, 18 Aug 2004, Unidata Support wrote:
I'll see if I can create a duplicate of your problem for the 5.7.3
release I'm working on.
Steve Chiswell
Unidata User Support
Steve,
One other thing that I'm seeing now and only in the 'dcgrib2_ocean.log'
are these messages:
[3639] 040818/1124 [NA -1] The table grib3.tbl cannot be opened.
...
[3640] 040818/1124 [NA -1] The table grib3.tbl cannot be opened.
BTW, even though the entries for children 3639 and 3640 in the log are not
intermingled, it looks like they may have been running at the same time,
based on the timestamps and also on these messages from 'ldmd.log':
Aug 18 15:25:00 vortex pqact[26768]: child 3640 terminated by signal 11
Aug 18 15:25:00 vortex pqact[26768]: child 3639 terminated by signal 11
So this may be the multiple file writers problem.
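To check that, I suppose one could correlate the two child PIDs against
ldmd.log (the log path is assumed; adjust for the actual install):

    egrep 'child (3639|3640)' ~ldm/logs/ldmd.log
    # both termination messages landing in the same second, as above, suggests
    # the two decoders really were writing the same output file at once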
Tom
-----------------------------------------------------------------------------
Tom McDermott Email: address@hidden
Systems Administrator Phone: (585) 395-5718
Earth Sciences Dept. Fax: (585) 395-2416
SUNY College at Brockport
--
****************************************************************************
Unidata User Support                                  UCAR Unidata Program
(303)497-8643                                                P.O. Box 3000
address@hidden                                           Boulder, CO 80307
----------------------------------------------------------------------------
Unidata WWW Service             http://my.unidata.ucar.edu/content/support
----------------------------------------------------------------------------
NOTE: All email exchanges with Unidata User Support are recorded in the
Unidata inquiry tracking system and then made publicly available
through the web. If you do not want to have your interactions made
available in this way, you must let us know in each email you send to us.