[LDM #WOM-274153]: Reproducible bug in LDM 6.13.10
- Subject: [LDM #WOM-274153]: Reproducible bug in LDM 6.13.10
- Date: Mon, 08 Apr 2019 08:16:54 -0600
Gilbert,
Lots of notices, but not a lot of useful errors.
I'm almost done with what I hope is a fix. It'll be out today.
> Last night, around 4Z, my phone went ballistic that NFS01 LDM had crashed.
> Since then, I have been unable to look at the logs. Finally, I got a chance
> today to look at them, and found this in NFS01's log:
>
> 20190407T024851.029540Z ldm-central1-b.c.tough-volt.internal[9940]
> svc_tcp.c:writetcp() ERROR Bad file descriptor
> 20190407T024851.029631Z ldm-central1-b.c.tough-volt.internal[9940]
> svc_tcp.c:writetcp() ERROR writetcp(): write() error on socket 6
> 20190407T024851.034471Z pqact[9909] pqact.c:cleanup() NOTE
> Exiting
> 20190407T024851.035295Z ldm-central1-b.c.tough-volt.internal[9980]
> ldmd.c:cleanup() NOTE Exiting
> 20190407T024851.035751Z ldm-central1-b.c.tough-volt.internal[9970]
> ldmd.c:cleanup() NOTE Exiting
> 20190407T024851.035955Z mrms-ldmout.ncep.noaa.gov[9961] ldmd.c:cleanup()
> NOTE Exiting
> 20190407T024851.036211Z mrms-ldmout.ncep.noaa.gov[9959] ldmd.c:cleanup()
> NOTE Exiting
> 20190407T024851.013354Z crunch-central1-b.c.tough-volt.internal(feed)[29485]
> ldmd.c:cleanup() NOTE Exiting
> 20190407T024851.037351Z 217.40.148.146.bc.googleusercontent.com(feed)[29483]
> ldmd.c:cleanup() NOTE Exiting
> 20190407T024851.048380Z mrms-ldmout.ncep.noaa.gov[9957] ldmd.c:cleanup()
> NOTE Exiting
> 20190407T024851.048605Z ldm-central1-b.c.tough-volt.internal[9956]
> ldmd.c:cleanup() NOTE Exiting
> 20190407T024851.852953Z pqact[9908] pqact.c:cleanup() NOTE
> Behind by 0.850289 s
> 20190407T024851.900469Z c7.e3.37a9.ip4.static.sl-reverse.com(feed)[15634]
> ldmd.c:cleanup() NOTE Exiting
> 20190407T024851.902295Z ldmd[9906] ldmd.c:reap() NOTE child
> 15634 exited with status 6
> 20190407T024852.800791Z ldmd[9906] ldmd.c:reap() NOTE child
> 30339 exited with status 6
> 20190407T024853.452416Z pqact[9909] pqact.c:cleanup() NOTE
> Behind by 11.783 s
> 20190407T024854.369487Z pqcheck[727] pqcheck.c:main() NOTE
> Starting Up (688)
> 20190407T024854.369728Z pqcheck[727] pqcheck.c:cleanup() NOTE
> Exiting
> 20190407T024854.794561Z pqcheck[815] pqcheck.c:main() NOTE
> Starting Up (776)
> 20190407T024854.794756Z pqcheck[815] pqcheck.c:cleanup() NOTE
> Exiting
> 20190407T024855.309912Z pqcheck[900] pqcheck.c:main() NOTE
> Starting Up (861)
> 20190407T024855.310217Z pqcheck[900] pqcheck.c:cleanup() NOTE
> Exiting
> 20190407T024855.812917Z pqcheck[986] pqcheck.c:main() NOTE
> Starting Up (947)
> 20190407T024855.813150Z pqcheck[986] pqcheck.c:cleanup() NOTE
> Exiting
> 20190407T024856.311303Z pqcheck[1071] pqcheck.c:main() NOTE
> Starting Up (1032)
> 20190407T024856.311507Z pqcheck[1071] pqcheck.c:cleanup() NOTE
> Exiting
> 20190407T030630.296476Z pqcheck[8399] pqcheck.c:main() NOTE
> Starting Up (8360)
>
>
> These are the entries leading up to the crash and at the moment of
> termination on LDM01:
>
> 20190407T040632.861946Z freshair.atmos.washington.edu[29638]
> error.c:err_log() NOTE Upstream LDM didn't reply to FEEDME request; RPC:
> Authentication error; why = (authentication error 5)
> 20190407T040634.591378Z ldmd[3163] ldmd.c:runChildLdm()
> ERROR Denying connection from " 163.41.148.146.bc.googleusercontent.com"
> because not allowed
> 20190407T040634.591501Z ldmd[3163] ldmd.c:cleanup() NOTE
> Exiting
> 20190407T040634.594553Z ldmd[29628] ldmd.c:reap() NOTE child
> 3163 exited with status 3
> 20190407T040656.511169Z ldmd[6514] ldmd.c:runChildLdm()
> ERROR Denying connection from "101.230.188.35.bc.googleusercontent.com"
> because not allowed
> 20190407T040656.511284Z ldmd[6514] ldmd.c:cleanup() NOTE
> Exiting
> 20190407T040656.512448Z ldmd[29628] ldmd.c:reap() NOTE child
> 6514 exited with status 3
> 20190407T040658.927093Z nfs-central1-b.c.tough-volt.internal(feed)[12753]
> error.c:err_log() NOTE Couldn't flush connection;
> flushConnection() failure to nfs-central1-b.c.tough-volt.internal: RPC:
> Unable to receive; errno = Connection reset by peer
> 20190407T040659.012186Z nfs-central1-b.c.tough-volt.internal(feed)[12753]
> ldmd.c:cleanup() NOTE Exiting
> 20190407T040659.013603Z ldmd[29628] ldmd.c:reap() NOTE child
> 12753 exited with status 6
> 20190407T040659.946475Z nfs-central1-b.c.tough-volt.internal(feed)[6833]
> up6.c:up6_run() NOTE Starting Up(6.13.10/6): 20190407040433.927283
> TS_ENDT {{HDS, "...... KWNS"}}, SIG=69f1a24bb19a0e365e85ffc19eb8700e, Primary
> 20190407T040659.946577Z nfs-central1-b.c.tough-volt.internal(feed)[6833]
> up6.c:up6_run() NOTE topo: nfs-central1-b.c.tough-volt.internal
> {{HDS, (.*)}}
> 20190407T040704.151926Z s444.pingdom.com[7473] svc_tcp.c:readtcp() NOTE
> EOF on socket 3
> 20190407T040704.152042Z s444.pingdom.com[7473]
> one_svc_run.c:one_svc_run() NOTE RPC layer closed connection
> 20190407T040704.152064Z s444.pingdom.com[7473] ldmd.c:runSvc() NOTE
> Connection with client LDM, s444.pingdom.com, has been lost
> 20190407T040704.152097Z s444.pingdom.com[7473] ldmd.c:cleanup() NOTE
> Exiting
> 20190407T040711.139610Z nfs-central1-b.c.tough-volt.internal(feed)[8099]
> up6.c:up6_run() NOTE Starting Up(6.13.10/6): 20190407040445.125077
> TS_ENDT {{FSL2, "^FSL.CompressedNetCDF.MADIS..*"}},
> SIG=443697845cf4381835cfbf28dd3f051b, Alternate
> 20190407T040711.139690Z nfs-central1-b.c.tough-volt.internal(feed)[8099]
> up6.c:up6_run() NOTE topo: nfs-central1-b.c.tough-volt.internal
> {{FSL2, (.*)}}
> 20190407T040721.281299Z ldmd[9739] ldmd.c:runChildLdm()
> ERROR Denying connection from " 42.157.203.35.bc.googleusercontent.com"
> because not allowed
> 20190407T040721.281421Z ldmd[9739] ldmd.c:cleanup() NOTE
> Exiting
> 20190407T040721.283071Z ldmd[29628] ldmd.c:reap() NOTE child
> 9739 exited with status 3
> 20190407T040725.234284Z nfs-central1-b.c.tough-volt.internal(feed)[17387]
> error.c:err_log() NOTE Couldn't flush connection;
> flushConnection() failure to nfs-central1-b.c.tough-volt.internal: RPC:
> Unable to receive; errno = Connection reset by peer
> 20190407T040725.249234Z nfs-central1-b.c.tough-volt.internal(feed)[17387]
> ldmd.c:cleanup() NOTE Exiting
> 20190407T040725.250297Z ldmd[29628] ldmd.c:reap() NOTE child
> 17387 exited with status 6
> 20190407T040726.683332Z mrms-ldmout.ncep.noaa.gov[29692] error.c:err_log()
> NOTE Upstream LDM died: pid=15539
> 20190407T040726.683460Z mrms-ldmout.ncep.noaa.gov[29692]
> requester6.c:req6_new() NOTE LDM-6 desired product-class:
> 20190407035441.683416 TS_ENDT {{EXP,
> "/nfsdata/realtime/outgoing/grib2/GUAM/MRMS_MergedReflectivityQComposite_00.50"},{NONE,
> "SIG=82430f1f715ade$
> 20190407T040726.713271Z mrms-ldmout.ncep.noaa.gov[29670] error.c:err_log()
> NOTE Upstream LDM died: pid=23830
> 20190407T040726.713513Z mrms-ldmout.ncep.noaa.gov[29670]
> requester6.c:req6_new() NOTE LDM-6 desired product-class:
> 20190407035441.713362 TS_ENDT {{EXP,
> "/nfsdata/realtime/outgoing/grib2/(CONUS|ALASKA|HAWAII|GUAM|CARIB)/MRMS_MESH_Max_1440min_00.50_"},{NONE,"S$
> 20190407T040726.714009Z mrms-ldmout.ncep.noaa.gov[29683] error.c:err_log()
> NOTE Upstream LDM died: pid=29367
> 20190407T040726.714142Z mrms-ldmout.ncep.noaa.gov[29683]
> requester6.c:req6_new() NOTE LDM-6 desired product-class:
> 20190407035441.714084 TS_ENDT {{EXP,
> "/nfsdata/realtime/outgoing/grib2/(CONUS|ALASKA|HAWAII|GUAM|CARIB)/MRMS_Reflectivity_-20C_00.50_"},{NONE,
> "$
> 20190407T040726.834381Z mrms-ldmout.ncep.noaa.gov[29689] error.c:err_log()
> NOTE Upstream LDM died: pid=29325
> 20190407T040726.834564Z mrms-ldmout.ncep.noaa.gov[29689]
> requester6.c:req6_new() NOTE LDM-6 desired product-class:
> 20190407035441.834469 TS_ENDT {{EXP,
> "/nfsdata/realtime/outgoing/grib2/(CONUS|ALASKA|HAWAII|GUAM|CARIB)/MRMS_MergedReflectivityQC_03.00_"},{NONE$
> 20190407T040726.852918Z mrms-ldmout.ncep.noaa.gov[29692]
> requester6.c:make_request() NOTE Upstream LDM-6 on mrms-ldmout.ncep.noaa.gov
> is willing to be a primary feeder
> 20190407T040726.936573Z mrms-ldmout.ncep.noaa.gov[29694] error.c:err_log()
> NOTE Upstream LDM died: pid=15467
> 20190407T040726.936719Z mrms-ldmout.ncep.noaa.gov[29694]
> requester6.c:req6_new() NOTE LDM-6 desired product-class:
> 20190407035441.936654 TS_ENDT {{EXP,
> "/nfsdata/realtime/outgoing/grib2/CARIB/MRMS_MergedReflectivityQComposite_00.50"},
> NONE, "SIG=b91bcb0f04d07$
> 20190407T040726.951350Z mrms-ldmout.ncep.noaa.gov[29670]
> requester6.c:make_request() NOTE Upstream LDM-6 on mrms-ldmout.ncep.noaa.gov
> is willing to be a primary feeder
> 20190407T040726.954393Z mrms-ldmout.ncep.noaa.gov[29683]
> requester6.c:make_request() NOTE Upstream LDM-6 on mrms-ldmout.ncep.noaa.gov
> is willing to be a primary feeder
> 20190407T040727.674193Z mrms-ldmout.ncep.noaa.gov[29689]
> requester6.c:make_request() NOTE Upstream LDM-6 on mrms-ldmout.ncep.noaa.gov
> is willing to be a primary feeder
> 20190407T040727.755649Z mrms-ldmout.ncep.noaa.gov[29694]
> requester6.c:make_request() NOTE Upstream LDM-6 on mrms-ldmout.ncep.noaa.gov
> is willing to be a primary feeder
>
>
> So, does this make sense?
Regards,
Steve Emmerson
Ticket Details
===================
Ticket ID: WOM-274153
Department: Support LDM
Priority: High
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata
inquiry tracking system and then made publicly available through the web. If
you do not want to have your interactions made available in this way, you must
let us know in each email you send to us.