
[Staging #KYN-678004]: ldm problems on ingest



Greg,

> We are still trying to figure out the problem we spoke with you and
> Tom and Mike about in October where ldm seems to be associated with
> intermittent high iowaits that hang processes on one of our systems.
> 
> Today we transferred all of our ldm processes to another server to
> try and isolate the problem. On this new server (which we've faked to
> look like the old one, so it's still ingest.eol.ucar.edu), we've installed
> ldm 6.6.5. This afternoon we got a segfault from dcmetr (see below).
> We're running a fresh install of Gempak 5.11.1 (not an upgrade).

Unfortunately, I know almost nothing about the GEMPAK "dcmetr" decoder.
More unfortunately, our GEMPAK expert, Steve Chiswell, has taken a position
elsewhere.  I'll pass your inquiry around the office to see if anyone
has any advice.

In the meantime, have you checked the support-GEMPAK email to see if anyone
else has encountered problems with "dcmetr"?  You could also post an
inquiry to the "gembud" mailing-list.

Sorry I can't be of more help.  From the LDM perspective, the problem with
"dcmetr" lies outside the scope of the LDM.
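
That said, the log excerpt you quote can at least be summarized
mechanically.  The following is only a rough sketch, not a GEMPAK tool: it
assumes each new decoder process logs one "[DC 3] Version ..." banner when
it starts (which is how your excerpt looks), and the function names are
made up for illustration.  It counts how many decoder processes start in
each logged minute, which should show whether "dcmetr" is being restarted
over and over.

```python
import re
from collections import Counter

# Matches lines such as "[5573] 080122/1258[DC 3] Version 5.11.1".
# Fields: PID, YYMMDD/HHMM timestamp, log tag, level, message.
LINE_RE = re.compile(r"^\[(\d+)\] (\d{6}/\d{4})\[(\w+) (-?\d+)\] (.*)$")


def decoder_starts(lines):
    """Return (pid, timestamp) for every 'Version' banner in the log.

    Assumption: one banner is logged per decoder start-up.
    """
    starts = []
    for line in lines:
        # Drop e-mail quoting ("> ") and trailing whitespace.
        m = LINE_RE.match(line.lstrip("> ").rstrip())
        if m is None:
            continue  # wrapped report text, blank lines, etc.
        pid, stamp, tag, _level, message = m.groups()
        if tag == "DC" and message.startswith("Version"):
            starts.append((int(pid), stamp))
    return starts


def starts_per_minute(lines):
    """Count decoder start-ups for each logged minute."""
    return Counter(stamp for _pid, stamp in decoder_starts(lines))


SAMPLE = [
    "> [2640] 080122/1258[DC 2] read 130/22202 bytes strt 80198 newstrt 80328",
    "> [5573] 080122/1258[DC 3] Version 5.11.1",
    "> [5573] 080122/1258[DCMETR 7] 3.3",
    "> A3015 RMK AO2 SLP220 T00561028 $",
    "> [5576] 080122/1258[DC 3] Version 5.11.1",
]

if __name__ == "__main__":
    for stamp, count in sorted(starts_per_minute(SAMPLE).items()):
        print(stamp, count)
```

Several start-ups within a single minute, each under a new PID, would be
consistent with the decoder dying and pqact(1) spawning a replacement for
the next product.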

> Here's the log output from around that time:
> 
> 
> [2640] 080122/1258[DCMETR 2] KNID 221956Z 00000KT 10SM OVC080 06/M03
> A3015 RMK AO2 SLP220 T00561028 $
> [2640] 080122/1258[DC 2] read 122/24120 bytes strt 78280 newstrt 78402
> [2640] 080122/1258[DC 2] read 174/23998 bytes strt 78402 newstrt 78576
> [2640] 080122/1258[DC 2] read 447/23824 bytes strt 78576 newstrt 79023
> [2640] 080122/1258[DCMETR 2] LIRQ 221950Z 05004KT 9999 FEW050 06/05 Q1013
> [2640] 080122/1258[DC 2] read 825/23377 bytes strt 79023 newstrt 79848
> [2640] 080122/1258[DCMETR 2] EHVK 221955Z AUTO 23004KT 9999 FEW120
> 01/M00 Q1029 BLU
> [2640] 080122/1258[DC 2] read 121/22552 bytes strt 79848 newstrt 79969
> [2640] 080122/1258[DC 2] read 229/22431 bytes strt 79969 newstrt 80198
> [2640] 080122/1258[DC 2] read 130/22202 bytes strt 80198 newstrt 80328
> [5573] 080122/1258[DC 3] Version 5.11.1
> [5573] 080122/1258[DCMETR 7] 3.3
> [5573] 080122/1258[DC 2] read 796/102399 bytes strt 0 newstrt 796
> [5576] 080122/1258[DC 3] Version 5.11.1
> [5576] 080122/1258[DCMETR 7] 3.3
> [5576] 080122/1258[DC 2] read 11940/102399 bytes strt 0 newstrt 11940
> [5577] 080122/1258[DC 3] Version 5.11.1
> [5577] 080122/1258[DCMETR 7] 3.3
> [5577] 080122/1258[DC 2] read 8026/102399 bytes strt 0 newstrt 8026
> [5578] 080122/1258[DC 3] Version 5.11.1
> [5578] 080122/1258[DCMETR 7] 3.3
> [5578] 080122/1258[DC 2] read 197/102399 bytes strt 0 newstrt 197
> [5579] 080122/1258[DC 3] Version 5.11.1
> [5579] 080122/1258[DCMETR 7] 3.3
> 
> 
> It looks like something unusual happened at 12:58 and after that it's
> only reading part of the record. It's also not writing anything to the
> decoded file. Here's the dcmetr part of the pqact.conf file that is
> being run:
> 
> WMO     ^S[AP]
> PIPE    dcmetr -v 2 -a 500 -m 72 -s sfmetar_sa.tbl
> -d logs/dcmetr.log
> -e GEMTBL=/home/gempak/NAWIPS/gempak/tables
> data/gempak/surface/saYYYYMMDD.gem
> 
> 
> Any ideas of what's going on?
> 
> Thanks,
> Greg
> 
> -------- Original Message --------
> Subject: kernel segfaults on new ingest
> Date: Tue, 22 Jan 2008 13:03:55 -0700
> From: Santiago Newbery <address@hidden>
> Reply-To: address@hidden
> Organization: NCAR
> To: ted russ <address@hidden>
> CC: gregory stossmeister <address@hidden>, address@hidden
> 
> looks like something just broke...
> lots of these in the message log
> 
> Jan 22 13:02:17 ingest kernel: dcmetr[5953]: segfault at
> 0000007fc0000000 rip 000000000042dfc4 rsp 0000007fbfffdd70 error 4
> Jan 22 13:02:17 ingest kernel: dcmetr[5954]: segfault at
> 0000007fc0000000 rip 000000000042dfc4 rsp 0000007fbfffdd70 error 4
> 
> --
> --Santiago
> 
> --
> 
> ~~N~A~T~I~O~N~A~L~~C~E~N~T~E~R~~F~O~R~~A~T~M~O~S~.~~R~E~S~E~A~R~C~H
> Greg Stossmeister                      e-mail: address@hidden
> NCAR/EOL                               phone: (303)497-8692
> P.O. Box 3000                          web: http://www.eol.ucar.edu
> Boulder, CO 80307-3000
> ~~~~~~~~E~A~R~T~H~~~O~B~S~E~R~V~I~N~G~~~L~A~B~O~R~A~T~O~R~Y~~~~~~~~
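
One general (non-GEMPAK) observation on the kernel messages above.  On
x86 Linux the "error" value is the page-fault error code: bit 0 set means
a protection violation (clear means the page was not mapped), bit 1 set
means a write (clear means a read), and bit 2 set means the fault occurred
in user mode.  The sketch below decodes one such line under that reading.
It assumes the message has been rejoined onto a single line and that the
error code is printed in hexadecimal; the function name is made up for
illustration.

```python
import re

# Pattern for 2.6-era x86_64 messages such as:
#   dcmetr[5953]: segfault at 0000007fc0000000 rip ... rsp ... error 4
SEGV_RE = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_segfault(line):
    """Decode one kernel segfault message; return None if it doesn't match."""
    m = SEGV_RE.search(line)
    if m is None:
        return None
    addr = int(m.group("addr"), 16)
    rsp = int(m.group("rsp"), 16)
    err = int(m.group("err"), 16)  # assumption: printed in hex
    return {
        "prog": m.group("prog"),
        "pid": int(m.group("pid")),
        "rip": int(m.group("rip"), 16),
        "page_present": bool(err & 1),  # False: page was not mapped
        "write": bool(err & 2),         # False: the access was a read
        "user_mode": bool(err & 4),     # True: fault came from user code
        "addr_minus_rsp": addr - rsp,   # distance above the stack pointer
    }


LINE = ("Jan 22 13:02:17 ingest kernel: dcmetr[5953]: segfault at "
        "0000007fc0000000 rip 000000000042dfc4 rsp 0000007fbfffdd70 error 4")

if __name__ == "__main__":
    info = decode_segfault(LINE)
    print(info["prog"], info["pid"], hex(info["rip"]))
    print("user-mode read of an unmapped page:",
          info["user_mode"] and not info["write"] and not info["page_present"])
    print("fault address is", info["addr_minus_rsp"], "bytes above rsp")
```

Read this way, "error 4" is a user-mode read of an unmapped page, and the
faulting address lies a few kilobytes above the stack pointer.  Both
processes fault at the same "rip", so the same instruction is probably
involved each time.  If your "dcmetr" binary was built with debugging
symbols, something like "addr2line -e dcmetr 0x42dfc4" might identify the
source line.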

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: KYN-678004
Department: Support IDD
Priority: Normal
Status: Open