[LDM #URY-771320]: LDM crash on noaaport2.cod.edu
- Subject: [LDM #URY-771320]: LDM crash on noaaport2.cod.edu
- Date: Thu, 16 Jun 2016 16:36:04 -0600
Gilbert,
That's a lot of stuff to go through.
Was the problem you encountered the deletion of the product-queue or was it
that a noaaportIngester(1) process on Noaaport2 terminated?
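One way to check the second possibility is to search the LDM log for the reap() notice that ldmd writes when a child is killed by a signal. The quoted log below contains one such notice: child 1657, signal 11, command "noaaportIngester -m 224.0.1.4". What follows is a minimal sketch, not part of the LDM distribution. It assumes each log entry sits on a single line (unlike the wrapped quote below); the function name `signaled_children` and the regular expression are illustrative and derived only from the entry format seen in this message.

```python
import re

# Pattern modelled on the reap() notices in the quoted log; other LDM
# versions may format these entries differently (assumption).
REAP_RE = re.compile(
    r"^(?P<time>\S+) ldmd\[(?P<ldmd_pid>\d+)\] NOTE \S+:reap\(\) "
    r"child (?P<pid>\d+) terminated by signal (?P<sig>\d+)(?:: (?P<cmd>.*))?$"
)

def signaled_children(lines):
    """Yield one dict per child that ldmd reports as killed by a signal."""
    for line in lines:
        m = REAP_RE.match(line.strip())
        if m:
            yield {
                "time": m.group("time"),
                "pid": int(m.group("pid")),
                "signal": int(m.group("sig")),
                "command": m.group("cmd"),
            }

# Two entries copied from the quoted log, unwrapped onto single lines.
sample = [
    "20160616T180257.778198Z ldmd[1652] NOTE ldmd.c:168:reap() "
    "child 87852 exited with status 6",
    "20160616T185403.874899Z ldmd[1652] NOTE ldmd.c:122:reap() "
    "child 1657 terminated by signal 11: noaaportIngester -m 224.0.1.4",
]

for hit in signaled_children(sample):
    print(hit)
```

Only the second sample entry matches; a child that merely "exited with status N" is skipped on purpose, because that form also appears for ordinary feeder shutdowns.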
> Hello Steve,
>
> I am filing this on behalf of the College of DuPage
> weather program.
>
> They have two NOAAport ingesters, one called
> noaaport1, and another called noaaport2. This
> is the LDM log they had up until today:
>
> 20160616T172541.915447Z climate.cod.edu(feed)[87852] NOTE
> up6.c:448:up6_run() topo: climate.cod.edu {{NOTHER|NGRAPH|NGRID|WMO,
> (.*)}}
> 20160616T180257.714530Z climate.cod.edu(feed)[87852] NOTE
> error.c:236:err_log() Failure; COMINGSOON: RPC: Unable to receive; errno =
> Connection reset by peer
> 20160616T180257.718349Z climate.cod.edu(feed)[88804] NOTE
> uldb.c:1533:sm_vetUpstreamLdm() Terminated obsolete upstream LDM
> (addr=10.11.0.65, pid=87852, vers=6, type=feeder, mode=alternate,
> sub=(20160616165105.909806 TS_ENDT {{NOTHER|NGRAPH|NGRID|WMO, ".*"}}))
> 20160616T180257.777565Z climate.cod.edu(feed)[87852] NOTE
> ldmd.c:185:cleanup() Exiting
> 20160616T180257.778198Z ldmd[1652] NOTE ldmd.c:168:reap() child 87852
> exited with status 6
> 20160616T180258.720061Z climate.cod.edu(feed)[88804] NOTE
> up6.c:445:up6_run() Starting Up(6.13.1/6): 20160616172822.714117 TS_ENDT
> {{NOTHER|NGRAPH|NGRID|WMO, ".*"}}, SIG=be6fe5d603a6ad3357c99379e61de688,
> Primary
> 20160616T180258.720095Z climate.cod.edu(feed)[88804] NOTE
> up6.c:448:up6_run() topo: climate.cod.edu {{NOTHER|NGRAPH|NGRID|WMO,
> (.*)}}
> 20160616T185401.837778Z noaaportIngester[1655] ERROR
> productMaker.c:948:pmStart() Missing GOES fragment in sequence, last
> 1155/141123 this 1157/141123
> 20160616T185401.856478Z noaaportIngester[1657] ERROR
> productMaker.c:569:pmStart() ERROR in calculation of psh len 32802 16
> 20160616T185401.856523Z noaaportIngester[1657] ERROR
> readsbn.c:24:readsbn() SBN checksum invalid 2443 26836
> 20160616T185401.856543Z noaaportIngester[1657] ERROR
> readsbn.c:24:readsbn() SBN checksum invalid 2413 55144
> 20160616T185401.856570Z noaaportIngester[1657] ERROR
> readsbn.c:24:readsbn() SBN checksum invalid 2435 10272
> 20160616T185403.874899Z ldmd[1652] NOTE ldmd.c:122:reap() child 1657
> terminated by signal 11: noaaportIngester -m 224.0.1.4
> 20160616T185403.874927Z ldmd[1652] NOTE ldmd.c:148:reap() Killing
> (SIGTERM) process group
> 20160616T185403.875297Z noaaportIngester[1663] ERROR
> fifo.c:340:fifo_transferFromFd() Interrupted system call
> 20160616T185403.875341Z noaaportIngester[1663] ERROR
> fifo.c:340:fifo_transferFromFd() Couldn't read up to 65507 bytes from file
> descriptor 5
> 20160616T185403.875509Z noaaportIngester[1660] ERROR
> fifo.c:340:fifo_transferFromFd() Interrupted system call
> 20160616T185403.875555Z noaaportIngester[1660] ERROR
> fifo.c:340:fifo_transferFromFd() Couldn't read up to 65507 bytes from file
> descriptor 5
> 20160616T185403.875534Z noaaportIngester[1661] ERROR
> fifo.c:340:fifo_transferFromFd() Interrupted system call
> 20160616T185403.875579Z noaaportIngester[1661] ERROR
> fifo.c:340:fifo_transferFromFd() Couldn't read up to 65507 bytes from file
> descriptor 5
> 20160616T185403.875582Z noaaportIngester[1662] ERROR
> fifo.c:340:fifo_transferFromFd() Interrupted system call
> 20160616T185403.875636Z noaaportIngester[1662] ERROR
> fifo.c:340:fifo_transferFromFd() Couldn't read up to 65507 bytes from file
> descriptor 5
> 20160616T185403.875641Z noaaportIngester[1658] ERROR
> fifo.c:340:fifo_transferFromFd() Interrupted system call
> 20160616T185403.875649Z noaaportIngester[1658] ERROR
> fifo.c:340:fifo_transferFromFd() Couldn't read up to 65507 bytes from file
> descriptor 5
> 20160616T185403.876115Z noaaportIngester[1656] ERROR
> fifo.c:340:fifo_transferFromFd() Interrupted system call
> 20160616T185403.876130Z noaaportIngester[1656] ERROR
> fifo.c:340:fifo_transferFromFd() Couldn't read up to 65507 bytes from file
> descriptor 5
> 20160616T185403.876126Z noaaportIngester[1659] ERROR
> fifo.c:340:fifo_transferFromFd() Interrupted system call
> 20160616T185403.876170Z noaaportIngester[1659] ERROR
> fifo.c:340:fifo_transferFromFd() Couldn't read up to 65507 bytes from file
> descriptor 5
> 20160616T185403.876268Z noaaportIngester[1663] NOTE
> noaaportIngester.c:754:reportStats()
> ----------------------------------------
> Ingestion Statistics:
> Since Previous Report (or Start):
> Duration P13DT16H16M1.667276S
> Raw Data:
> Octets 0
> Mean Rate:
> Octets 0/s
> Bits 0/s
> Received frames:
> Number 0
> Mean Rate 0/s
> Missed frames:
> Number 0
> % -nan
> Full FIFO:
> Number 0
> % -nan
> Products:
> Inserted 0
> Mean Rate 0/s
> Since Start:
> Duration P13DT16H16M1.667276S
> Raw Data:
> Octets 0
> Mean Rate:
> Octets 0/s
> Bits 0/s
> Received frames:
> Number 0
> Mean Rate 0/s
> Missed frames:
> Number 0
> % -nan
> Full FIFO:
> Number 0
> % -nan
> Products:
> Inserted 0
> Mean Rate 0/s
> ----------------------------------------
> 20160616T185403.876303Z noaaportIngester[1654] ERROR
> fifo.c:340:fifo_transferFromFd() Interrupted system call
> 20160616T185403.876318Z noaaportIngester[1654] ERROR
> fifo.c:340:fifo_transferFromFd() Couldn't read up to 65507 bytes from file
> descriptor 5
> 20160616T185403.876413Z noaaportIngester[1662] NOTE
> noaaportIngester.c:754:reportStats()
> ----------------------------------------
> Ingestion Statistics:
> Since Previous Report (or Start):
> Duration P13DT16H16M1.659078S
> Raw Data:
> Octets 0
> Mean Rate:
> Octets 0/s
> Bits 0/s
> Received frames:
> Number 0
> Mean Rate 0/s
> Missed frames:
> Number 0
> % -nan
> Full FIFO:
> Number 0
> % -nan
> Products:
> Inserted 0
> Mean Rate 0/s
> Since Start:
> Duration P13DT16H16M1.659078S
> Raw Data:
> Octets 0
> Mean Rate:
> Octets 0/s
> Bits 0/s
> Received frames:
> Number 0
> Mean Rate 0/s
> Missed frames:
> Number 0
> % -nan
> Full FIFO:
> Number 0
> % -nan
> Products:
> Inserted 0
> Mean Rate 0/s
> ----------------------------------------
> 20160616T185403.876544Z weather.cod.edu(feed)[82274] NOTE
> ldmd.c:185:cleanup() Exiting
> 20160616T185403.876601Z noaaportIngester[1660] NOTE
> noaaportIngester.c:754:reportStats()
> ----------------------------------------
> Ingestion Statistics:
> Since Previous Report (or Start):
> Duration P13DT16H16M1.665902S
> Raw Data:
> Octets 0
> Mean Rate:
> Octets 0/s
> Bits 0/s
> Received frames:
> Number 0
> Mean Rate 0/s
> Missed frames:
> Number 0
> % -nan
> Full FIFO:
> Number 0
> % -nan
> Products:
> Inserted 0
> Mean Rate 0/s
> Since Start:
> Duration P13DT16H16M1.665902S
> Raw Data:
> Octets 0
> Mean Rate:
> Octets 0/s
> Bits 0/s
> Received frames:
> Number 0
> Mean Rate 0/s
> Missed frames:
> Number 0
> % -nan
> Full FIFO:
> Number 0
> % -nan
> Products:
> Inserted 0
> Mean Rate 0/s
> ----------------------------------------
> 20160616T185403.876688Z noaaportIngester[1658] NOTE
> noaaportIngester.c:754:reportStats()
> ----------------------------------------
> Ingestion Statistics:
> Since Previous Report (or Start):
> Duration P13DT16H16M1.663040S
> Raw Data:
> Octets 213446871416
> Mean Rate:
> Octets 180618/s
> Bits 1.44494e+06/s
> Received frames:
> Number 52992790
> Mean Rate 44.8422/s
> Missed frames:
> Number 4307
> % 0.00812686
> Full FIFO:
> Number 0
> % 0
> Products:
> Inserted 22074
> Mean Rate 0.0186789/s
> Since Start:
> Duration P13DT16H16M1.663040S
> Raw Data:
> Octets 213446871416
> Mean Rate:
> Octets 180618/s
> Bits 1.44494e+06/s
> Received frames:
> Number 52992790
> Mean Rate 44.8422/s
> Missed frames:
> Number 4307
> % 0.00812686
> Full FIFO:
> Number 0
> % 0
> Products:
> Inserted 22074
> Mean Rate 0.0186789/s
> ----------------------------------------
> 20160616T185403.876777Z noaaportIngester[1656] NOTE
> noaaportIngester.c:754:reportStats()
> ----------------------------------------
> Ingestion Statistics:
> Since Previous Report (or Start):
> Duration P13DT16H16M1.662962S
> Raw Data:
> Octets 2474898832260
> Mean Rate:
> Octets 2.09425e+06/s
> Bits 1.6754e+07/s
> Received frames:
> Number 620913333
> Mean Rate 525.413/s
> Missed frames:
> Number 138559
> % 0.0223104
> Full FIFO:
> Number 0
> % 0
> Products:
> Inserted 14069438
> Mean Rate 11.9055/s
> Since Start:
> Duration P13DT16H16M1.662962S
> Raw Data:
> Octets 2474898832260
> Mean Rate:
> Octets 2.09425e+06/s
> Bits 1.6754e+07/s
> Received frames:
> Number 620913333
> Mean Rate 525.413/s
> Missed frames:
> Number 138559
> % 0.0223104
> Full FIFO:
> Number 0
> % 0
> Products:
> Inserted 14069438
> Mean Rate 11.9055/s
> ----------------------------------------
> 20160616T185403.876820Z noaaportIngester[1655] NOTE
> noaaportIngester.c:754:reportStats()
> ----------------------------------------
> Ingestion Statistics:
> Since Previous Report (or Start):
> Duration P13DT16H16M1.667874S
> Raw Data:
> Octets 56217502582
> Mean Rate:
> Octets 47570.9/s
> Bits 380567/s
> Received frames:
> Number 24507095
> Mean Rate 20.7378/s
> Missed frames:
> Number 2359
> % 0.00962486
> Full FIFO:
> Number 0
> % 0
> Products:
> Inserted 31776
> Mean Rate 0.0268887/s
> Since Start:
> Duration P13DT16H16M1.667874S
> Raw Data:
> Octets 56217502582
> Mean Rate:
> Octets 47570.9/s
> Bits 380567/s
> Received frames:
> Number 24507095
> Mean Rate 20.7378/s
> Missed frames:
> Number 2359
> % 0.00962486
> Full FIFO:
> Number 0
> % 0
> Products:
> Inserted 31776
> Mean Rate 0.0268887/s
> ----------------------------------------
> 20160616T185403.876849Z noaaportIngester[1661] NOTE
> noaaportIngester.c:754:reportStats()
> ----------------------------------------
> Ingestion Statistics:
> Since Previous Report (or Start):
> Duration P13DT16H16M1.662971S
> Raw Data:
> Octets 0
> Mean Rate:
> Octets 0/s
> Bits 0/s
> Received frames:
> Number 0
> Mean Rate 0/s
> Missed frames:
> Number 0
> % -nan
> Full FIFO:
> Number 0
> % -nan
> Products:
> Inserted 0
> Mean Rate 0/s
> Since Start:
> Duration P13DT16H16M1.662971S
> Raw Data:
> Octets 0
> Mean Rate:
> Octets 0/s
> Bits 0/s
> Received frames:
> Number 0
> Mean Rate 0/s
> Missed frames:
> Number 0
> % -nan
> Full FIFO:
> Number 0
> % -nan
> Products:
> Inserted 0
> Mean Rate 0/s
> ----------------------------------------
> 20160616T185403.876870Z noaaportIngester[1659] NOTE
> noaaportIngester.c:754:reportStats()
> ----------------------------------------
> Ingestion Statistics:
> Since Previous Report (or Start):
> Duration P13DT16H16M1.661478S
> Raw Data:
> Octets 0
> Mean Rate:
> Octets 0/s
> Bits 0/s
> Received frames:
> Number 0
> Mean Rate 0/s
> Missed frames:
> Number 0
> % -nan
> Full FIFO:
> Number 0
> % -nan
> Products:
> Inserted 0
> Mean Rate 0/s
> Since Start:
> Duration P13DT16H16M1.661478S
> Raw Data:
> Octets 0
> Mean Rate:
> Octets 0/s
> Bits 0/s
> Received frames:
> Number 0
> Mean Rate 0/s
> Missed frames:
> Number 0
> % -nan
> Full FIFO:
> Number 0
> % -nan
> Products:
> Inserted 0
> Mean Rate 0/s
> ----------------------------------------
> 20160616T185403.876867Z noaaportIngester[1654] NOTE
> noaaportIngester.c:754:reportStats()
> ----------------------------------------
> Ingestion Statistics:
> Since Previous Report (or Start):
> Duration P13DT16H16M1.658771S
> Raw Data:
> Octets 893238258738
> Mean Rate:
> Octets 755853/s
> Bits 6.04683e+06/s
> Received frames:
> Number 268717744
> Mean Rate 227.387/s
> Missed frames:
> Number 79186
> % 0.0294594
> Full FIFO:
> Number 0
> % 0
> Products:
> Inserted 50676084
> Mean Rate 42.8818/s
> Since Start:
> Duration P13DT16H16M1.658771S
> Raw Data:
> Octets 893238258738
> Mean Rate:
> Octets 755853/s
> Bits 6.04683e+06/s
> Received frames:
> Number 268717744
> Mean Rate 227.387/s
> Missed frames:
> Number 79186
> % 0.0294594
> Full FIFO:
> Number 0
> % 0
> Products:
> Inserted 50676084
> Mean Rate 42.8818/s
> ----------------------------------------
> 20160616T185403.887878Z atlas.cod.edu(feed)[80064] NOTE
> error.c:236:err_log() Couldn't flush connection; flushConnection() failure
> to atlas.cod.edu: RPC: Unable to receive; errno = Bad file descriptor
> 20160616T185403.887884Z climate.cod.edu(feed)[88804] NOTE
> error.c:236:err_log() Couldn't flush connection; flushConnection() failure
> to climate.cod.edu: RPC: Unable to receive; errno = Bad file descriptor
> 20160616T185403.903885Z ldmd[1652] NOTE ldmd.c:185:cleanup() Exiting
> 20160616T185403.903961Z ldmd[1652] NOTE ldmd.c:256:cleanup() Terminating
> process group
> 20160616T185403.904040Z weather.cod.edu(feed)[78167] NOTE
> error.c:236:err_log() Couldn't flush connection; flushConnection() failure
> to weather.cod.edu: RPC: Unable to receive; errno = Bad file descriptor
> 20160616T185403.907932Z cdstats.cod.edu(feed)[49285] NOTE
> ldmd.c:185:cleanup() Exiting
> 20160616T185403.915906Z atlas.cod.edu(feed)[1895] NOTE
> ldmd.c:185:cleanup() Exiting
> 20160616T185403.975903Z wxsandbox2.cod.edu(feed)[5377] NOTE
> ldmd.c:185:cleanup() Exiting
> 20160616T185403.979950Z climate.cod.edu(feed)[119425] NOTE
> ldmd.c:185:cleanup() Exiting
> 20160616T185403.983914Z climate.cod.edu(feed)[38263] NOTE
> ldmd.c:185:cleanup() Exiting
> 20160616T185403.987934Z wxsandbox1.cod.edu(feed)[67005] NOTE
> ldmd.c:185:cleanup() Exiting
> 20160616T185403.995878Z rtstats[1664] NOTE rtstats.c:134:cleanup() Exiting
> 20160616T185404.100965Z climate.cod.edu(feed)[88804] NOTE
> ldmd.c:185:cleanup() Exiting
> 20160616T185404.101386Z ldmd[1652] NOTE ldmd.c:168:reap() child 88804
> exited with status 6
> 20160616T185404.667889Z atlas.cod.edu(feed)[80064] NOTE
> ldmd.c:185:cleanup() Exiting
> 20160616T185404.668387Z ldmd[1652] NOTE ldmd.c:168:reap() child 80064
> exited with status 6
> 20160616T185404.922282Z weather.cod.edu(feed)[78167] NOTE
> ldmd.c:185:cleanup() Exiting
> 20160616T185404.922685Z ldmd[1652] NOTE ldmd.c:168:reap() child 78167
> exited with status 6
> 20160616T191002.843715Z pqcheck[91525] NOTE pqcheck.c:150:main() Starting
> Up (91402)
> 20160616T191002.843776Z pqcheck[91525] ERROR pqcheck.c:202:main()
> pq_get_write_count() failure: /dev/shm/ldm.pq: No such file or directory
> 20160616T191002.843785Z pqcheck[91525] NOTE pqcheck.c:71:cleanup() Exiting
>
>
>
> This is what showed in /var/log/syslog:
>
> Jun 16 18:54:01 noaaport2 kernel: [1183315.725767] traps:
> noaaportIngeste[1689] general protection ip:7f1c610a0c84 sp:7f17b1b75dc0
> error:0 in libpthread-2.21.so[7f1c61097000+18000]
> Jun 16 18:54:07 noaaport2 systemd[1]: Stopping User Manager for UID
> 1000...
> Jun 16 18:54:07 noaaport2 systemd[966]: Reached target Shutdown.
> Jun 16 18:54:07 noaaport2 systemd[966]: Starting Exit the Session...
> Jun 16 18:54:07 noaaport2 systemd[966]: Stopped target Default.
> Jun 16 18:54:07 noaaport2 systemd[966]: Stopped target Basic System.
> Jun 16 18:54:07 noaaport2 systemd[966]: Stopped target Sockets.
> Jun 16 18:54:07 noaaport2 systemd[966]: Stopped target Paths.
> Jun 16 18:54:07 noaaport2 systemd[966]: Stopped target Timers.
> Jun 16 18:54:07 noaaport2 systemd[966]: Received SIGRTMIN+24 from PID
> 90148 (kill).
> Jun 16 18:54:07 noaaport2 systemd[1]: Stopped User Manager for UID 1000.
> Jun 16 18:54:07 noaaport2 systemd[1]: Removed slice user-1000.slice.
>
> Jun 16 18:55:01 noaaport2 CRON[90168]: (ldm) CMD (/bin/bash -l -c
> '/home/ldm/bin/ldmadmin addmetrics')
> Jun 16 18:55:02 noaaport2 postfix/pickup[87999]: 81BB51D7A: uid=1000
> from=<ldm>
> Jun 16 18:55:02 noaaport2 postfix/cleanup[90249]: 81BB51D7A: message-id=<
> address@hidden>
> Jun 16 18:55:02 noaaport2 postfix/qmgr[1111]: 81BB51D7A: from=<
> address@hidden>, size=708, nrcpt=1 (queue active)
> Jun 16 18:55:02 noaaport2 postfix/local[90251]: 81BB51D7A: to=<
> address@hidden>, orig_to=<ldm>, relay=local, delay=0.12,
> delays=0.08/0/0/0.03, dsn=2.0.0, status=sent (delivered to mailbox)
> Jun 16 18:55:02 noaaport2 postfix/qmgr[1111]: 81BB51D7A: removed
>
> So, while all of this was happening, the ldm.pq file was deleted.
> I have no idea what any of this means, but hopefully you can piece it
> together. No core file was dumped.
> And, sorry...I hit "send" too quickly: This all happened on
> noaaport2.cod.edu. Noaaport1.cod.edu was just fine and
> kept on ticking. Both receive the Novra broadcast identically
> via a network switch; I can access their Novra box via
> noaaport1 or 2. Again, Noaaport1.cod.edu had no issues and
> just kept humming right along.
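A note on reading the ingestion statistics above: the "% -nan" entries all belong to ingesters that received zero frames, so they look like a 0/0 floating-point division rather than a fault of their own. The logged percentages for the active ingesters are reproduced by missed / (received + missed) * 100. That formula is inferred from the numbers in this log, not taken from the noaaportIngester source, and the function name below is hypothetical.

```python
import math

def missed_percent(received: int, missed: int) -> float:
    """Percentage of frames missed out of all frames expected.

    Assumption: expected frames = received + missed, which matches the
    figures printed in the quoted log.
    """
    expected = received + missed
    if expected == 0:
        # An idle ingester has nothing to divide by. Python would raise
        # ZeroDivisionError here, so return NaN explicitly to mirror the
        # "-nan" seen in the log.
        return math.nan
    return 100.0 * missed / expected

# Figures copied from the reportStats() output for PIDs 1658 and 1663.
print(missed_percent(52992790, 4307))  # compare with the logged 0.00812686
print(missed_percent(0, 0))            # idle ingester
```

Read this way, the six ingesters reporting "-nan" were simply idle for the whole 13-day run, while the three that report real percentages, along with the one that was killed, were the only ones handling traffic.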
Regards,
Steve Emmerson
Ticket Details
===================
Ticket ID: URY-771320
Department: Support LDM
Priority: Normal
Status: Closed
===================