This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Carissa,

It would help if I could get a dump of the stack from the core file, /var/spool/abrt/ccpp-2013-06-30-21:45:03-23471. Can you get that to me?

Regardless, I recommend upgrading to the latest version of the LDM (6.11.6). It does not appear to have the bug that caused the LDM 6.11.4 version to receive a SIGSEGV on an RHEL system.

The consistency of the product-queue can be checked with the pqcheck(1) utility (provided no other process has the product-queue open for writing) and with the pqcat(1) utility (be sure to redirect its standard output stream to /dev/null).

> Unidata,
>
> We experienced a corrupt queue on 1 of our 4 supercomputer LDM feeds. We
> have gone to the admins, who say there was no system issue around that
> time period and have pointed us back to the data as the issue.
> According to our admins, the core dump indicates that the LDM segfaulted at
> 21:45:03 (see below):
> Jun 30 21:45:03 t14d1 205.156.51.46[23471] INFO: 98963 20130630214502.648 NEXRAD2 293005 L2-BZIP2/KBUF/20130630214449/293/5/I/V06/0
> Jun 30 21:45:03 t14d1 205.156.51.46[23471] INFO: 53099 20130630214501.925 NEXRAD2 87045 L2-BZIP2/KILX/20130630214256/87/45/I/V06/0
>
> We have this same LDM feed on all 4 systems; only 1 had an issue. Our
> main question is whether there is any way to tell if the corrupt queue was
> data related or system related. I do notice that the core dump was put into
> a root directory rather than the LDM home directory, which is where core
> dumps go when we have decoder issues. Do you folks see any evidence of what
> might have caused this issue?
>
> The log output is below.
>
> Jun 30 21:45:03 t14d1 205.156.51.46[23471] INFO: 98963 20130630214502.648 NEXRAD2 293005 L2-BZIP2/KBUF/20130630214449/293/5/I/V06/0
> Jun 30 21:45:03 t14d1 205.156.51.46[23471] INFO: 53099 20130630214501.925 NEXRAD2 87045 L2-BZIP2/KILX/20130630214256/87/45/I/V06/0
> Jun 30 21:45:03 t14d1 205.156.51.46[23471] INFO: 37749 20130630214503.004 NEXRAD2 550057 L2-BZIP2/KHTX/20130630214031/550/57/I/V06/0
> Jun 30 21:45:03 t14d1 pqact[23461] INFO: [filel.c:297] Deleting closed FILE entry "/dcom/us007003/ldmdata/obs/upperair/nexrad_level2/KCBW/KCBW_20130630_214401.bz2"
> Jun 30 21:45:03 t14d1 kernel: ldmd[23471]: segfault at 2adea9675818 ip 00002ad5ac00be2a sp 00007fff7dc37400 error 4 in libldm.so.0.0.0[2ad5abff2000+52000]
> Jun 30 21:45:03 t14d1 sshd[18728]: Accepted publickey for dbnet from 140.90.100.184 port 52885 ssh2
> Jun 30 21:45:03 t14d1 sshd[18728]: pam_unix(sshd:session): session opened for user dbnet by (uid=0)
> Jun 30 21:45:03 t14d1 sshd[18795]: Accepted publickey for dbnet from 140.90.100.184 port 52886 ssh2
> Jun 30 21:45:03 t14d1 sshd[18795]: pam_unix(sshd:session): session opened for user dbnet by (uid=0)
> Jun 30 21:45:03 t14d1 abrt[18793]: Saved core dump of pid 23471 (/gpfs/tmv/iodprod/dbnet/ldm/ldm-6.11.1/bin/ldmd) to /var/spool/abrt/ccpp-2013-06-30-21:45:03-23471 (1835008 bytes)
> Jun 30 21:45:03 t14d1 abrtd: Directory 'ccpp-2013-06-30-21:45:03-23471' creation detected
> Jun 30 21:45:03 t14d1 ldmd[23455] NOTE: child 23471 terminated by signal 11
> Jun 30 21:45:03 t14d1 ldmd[23455] NOTE: Killing (SIGTERM) process group
> Jun 30 21:45:03 t14d1 t10d2p.ncep.noaa.gov(feed)[20338] NOTE: Exiting
> Jun 30 21:45:03 t14d1 t14d2p.ncep.noaa.gov(feed)[26905] NOTE: Exiting
> Jun 30 21:45:03 t14d1 ldmd[23455] NOTE: Exiting
> Jun 30 21:45:03 t14d1 outreach.aviationweather.noaa.go[23477] NOTE: Exiting
> Jun 30 21:45:03 t14d1 ldm.madis-data.noaa.gov[23469] NOTE: Exiting
> Jun 30 21:45:03 t14d1 pqact[23461] ERROR: fcntl F_RDLCK failed for rgn (0 SEEK_SET, 4096) 4: Interrupted system call
> Jun 30 21:45:03 t14d1 pqact[23461] NOTE: Exiting
> Jun 30 21:45:03 t14d1 205.156.51.46[23473] NOTE: Exiting
> Jun 30 21:45:03 t14d1 pqact[23460] ERROR: fcntl F_RDLCK failed for rgn (0 SEEK_SET, 4096) 4: Interrupted system call
> Jun 30 21:45:03 t14d1 pqact[23460] ERROR: pq_sequence failed: Interrupted system call (errno = 4)
> Jun 30 21:45:03 t14d1 pqact[23460] NOTE: Exiting
> Jun 30 21:45:03 t14d1 pqact[23460] INFO: [filel.c:295] Deleting least-recently-used PIPE entry: pid=11023, cmd="/nwprod/exec/decod_dcbthy -v 2 -t 480 -d /dcom/us007003/decoder_logsdecod_dcbthy.log /nwprod/fix/bufrtab.031"
> Jun 30 21:45:03 t14d1 pqact[23460] INFO: [filel.c:295] Deleting least-recently-used PIPE entry: pid=26175, cmd="/nwprod/exec/decod_dcacar -v 2 -t 600 -d /dcom/us007003/decoder_logsdecod_dcacar.log /nwprod/fix/bufrtab.ARINC_ACARS /nwprod/fix/bufrtab.EUROPE_ACARS /nwprod/fix/bufrtab.CANADA_ACARS /nwprod/fix/bufrtab.FRANCE_ACARS /nwprod/fix/bufrtab.004"
> Jun 30 21:45:03 t14d1 pqact[23460] INFO: [filel.c:295] Deleting least-recently-used PIPE entry: pid=26718, cmd="/nwprod/exec/decod_dcltng -v 2 -t 300 -d /dcom/us007003/decoder_logsdecod_dcltng.log /nwprod/fix/bufrtab.007"
> Jun 30 21:45:03 t14d1 ldm.madis-data.noaa.gov[23463] NOTE: Exiting
> Jun 30 21:45:03 t14d1 pqact[23460] INFO: [filel.c:295] Deleting least-recently-used PIPE entry: pid=12452, cmd="/nwprod/exec/decod_dcdrbu -v 2 -t 365 -d /dcom/us007003/decoder_logsdecod_dcdrbu.log /nwprod/fix/bufrtab.001"
> Jun 30 21:45:03 t14d1 pqact[23460] INFO: [filel.c:295] Deleting least-recently-used PIPE entry: pid=26112, cmd="/nwprod/exec/decod_dcmsfc -v 2 -t 480 -d /dcom/us007003/decoder_logsdecod_dcmsfc.log /nwprod/fix/bufrtab.001 /nwprod/dictionaries/msfc.tbl /nwprod/dictionaries/tidg.tbl /nwprod/parm/decod_restricted.ship.headers"
> Jun 30 21:45:03 t14d1 pqact[23460] INFO: [filel.c:295] Deleting least-recently-used PIPE entry: pid=12552, cmd="/nwprod/exec/decod_dclsfc -v 2 -t 300 -d /dcom/us007003/decoder_logsdecod_dclsfc.log /nwprod/fix/bufrtab.000 /nwprod/dictionaries/lsfc.tbl /nwprod/parm/decod_WMO.Res40.headers"
> Jun 30 21:45:03 t14d1 pqact[23460] INFO: [filel.c:295] Deleting least-recently-used PIPE entry: pid=26226, cmd="/nwprod/exec/decod_dcacft -v 2 -t 300 -d /dcom/us007003/decoder_logsdecod_dcacft.log /nwprod/dictionaries/pirep.tbl /nwprod/dictionaries/airep.tbl /nwprod/fix/bufrtab.004"
> Jun 30 21:45:03 t14d1 pqact[23460] INFO: [filel.c:295] Deleting least-recently-used PIPE entry: pid=26215, cmd="/nwprod/exec/decod_dcshef -v 2 -t 450 -d /dcom/us007003/decoder_logsdecod_dcshef.log /nwprod/parm/SHEFPARM /nwprod/dictionaries/shef.tbl /nwprod/fix/bufrtab.000 /nwprod/fix/bufrtab.001 /nwprod/fix/bufrtab.255 A.E.G.P.R.S.T.U.X."
> Jun 30 21:45:03 t14d1 pqact[23460] INFO: [filel.c:295] Deleting least-recently-used PIPE entry: pid=26107, cmd="/nwprod/exec/decod_dcmetr -v 2 -t 300 -d /dcom/us007003/decoder_logsdecod_dcmetr.log /nwprod/fix/bufrtab.000 /nwprod/dictionaries/metar.tbl"
> Jun 30 21:45:03 t14d1 pqact[23460] INFO: [filel.c:295] Deleting least-recently-used PIPE entry: pid=7641, cmd="/nwprod/exec/decod_dcears -v 2 -t 450 -d /dcom/us007003/decoder_logs/ecod_dcears.log /nwprod/fix/bufrtab.EARS /nwprod/fix/bufrtab.021"
> Jun 30 21:45:03 t14d1 pqact[23460] INFO: [filel.c:295] Deleting least-recently-used PIPE entry: pid=1068, cmd="/nwprod/exec/decod_dcrocc -v 2 -t 600 -d /dcom/us007003/decoder_logs/ecod_dcrocc.log /nwprod/fix/bufrtab.003"
> Jun 30 21:45:03 t14d1 pqact[23460] INFO: [filel.c:295] Deleting least-recently-used PIPE entry: pid=25990, cmd="/nwprod/exec/decod_dcepfl -v 2 -t 450 -d /dcom/us007003/decoder_logsdecod_dcepfl.log /nwprod/fix/bufrtab.EUROPE_PROFILER /nwprod/fix/bufrtab.002"
> Jun 30 21:45:03 t14d1 205.156.51.46[23470] NOTE: Exiting
> Jun 30 21:45:03 t14d1 ldm.madis-data.noaa.gov[23465] NOTE: Exiting
> Jun 30 21:45:03 t14d1 eldm.fsl.noaa.gov[23466] NOTE: Exiting
> Jun 30 21:45:03 t14d1 ldm.madis-data.noaa.gov[23462] NOTE: Exiting
> Jun 30 21:45:03 t14d1 140.90.85.102[23475] NOTE: Exiting
> Jun 30 21:45:03 t14d1 ldmd[23455] NOTE: Terminating process group
> Jun 30 21:45:03 t14d1 205.156.51.46[23472] INFO: 52428 20130630214503.125 NEXRAD2 514008 L2-BZIP2/KRAX/20130630214439/514/8/I/V06/0
> Jun 30 21:45:03 t14d1 205.156.51.46[23472] NOTE: Exiting
> Jun 30 21:45:03 t14d1 pqact[23460] NOTE: Behind by 0.219798 s
> Jun 30 21:45:03 t14d1 ldmd[23455] INFO: [uldb.c:1298] Entry for PID 23455 not found
> Jun 30 21:45:03 t14d1 ldmd[23455] INFO: [uldb.c:1909] Couldn't remove process from database
> Jun 30 21:45:03 t14d1 ldmd[23455] INFO: child 26905 exited with status 0
> Jun 30 21:45:03 t14d1 ldmd[23455] INFO: child 20338 exited with status 0
> Jun 30 21:45:03 t14d1 pqact[23461] NOTE: Behind by 0.442305 s
> Jun 30 21:45:03 t14d1 ldmd[23455] INFO: child 23477 exited with status 0
> Jun 30 21:45:03 t14d1 ldmd[23455] INFO: child 23469 exited with status 0
> Jun 30 21:45:03 t14d1 ldmd[23455] INFO: child 23473 exited with status 0
> Jun 30 21:45:03 t14d1 ldmd[23455] NOTE: child 23460 exited with status 1: pqact -f ANY-CRAFT -v -o 900 /iodprod/dbnet/ldm/etc/pqact.conf
> Jun 30 21:45:03 t14d1 ldmd[23455] INFO: child 23461 exited with status 0: pqact -f CRAFT -v -o 900 /iodprod/dbnet/ldm/etc/pqact.craft
> Jun 30 21:45:03 t14d1 ldmd[23455] INFO: child 23463 exited with status 0
> Jun 30 21:45:03 t14d1 ldmd[23455] INFO: child 23470 exited with status 0
> Jun 30 21:45:03 t14d1 ldmd[23455] INFO: child 23466 exited with status 0
> Jun 30 21:45:03 t14d1 ldmd[23455] INFO: child 23465 exited with status 0
> Jun 30 21:45:03 t14d1 ldmd[23455] INFO: child 23475 exited with status 0
> Jun 30 21:45:03 t14d1 ldmd[23455] INFO: child 23462 exited with status 0
> Jun 30 21:45:03 t14d1 ldmd[23455] INFO: child 23472 exited with status 0
> Jun 30 21:45:03 t14d1 abrtd: Executable '/gpfs/tmv/iodprod/dbnet/ldm/ldm-6.11.1/bin/ldmd' doesn't belong to any package
> Jun 30 21:45:03 t14d1 abrtd: 'post-create' on '/var/spool/abrt/ccpp-2013-06-30-21:45:03-23471' exited with 1
> Jun 30 21:45:03 t14d1 abrtd: Corrupted or bad directory /var/spool/abrt/ccpp-2013-06-30-21:45:03-23471, deleting
>
> --
> Carissa Klemmer
> NCEP Central Operations
> Production Management Branch Dataflow Team
> 301-683-3835
>
> More info left off the ticket.
>
> We are running on RHEL 6.3
> LDM version - 6.11.1

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: BPI-711373
Department: Support LDM
Priority: Normal
Status: Closed
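The last abrtd entries in the log report that the core-dump directory was deleted, so the kernel's `segfault` line may be the only crash evidence left on the system. On x86 Linux, subtracting the object's load address (the value before the `+` in `libldm.so.0.0.0[2ad5abff2000+52000]`) from the instruction pointer (`ip`) gives the offset of the faulting instruction within `libldm.so`, and the `error` value is the x86 page-fault error code. The Python sketch below does that arithmetic. `decode_segfault` is a hypothetical helper, not part of the LDM, and it assumes the kernel message format shown in this log.

```python
import re

# Assumed format of the x86 Linux kernel's segfault report, as seen in this log:
# "ldmd[23471]: segfault at <addr> ip <ip> sp <sp> error <code> in <object>[<base>+<size>]"
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Hypothetical helper: decode a kernel 'segfault at' syslog line.

    Returns None if the line does not match; otherwise a dict with the
    faulting process, the object it faulted in, the instruction's offset
    within that object, and the decoded page-fault error-code bits.
    """
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    size = int(m.group("size"), 16)
    err = int(m.group("err"), 16)  # the kernel prints the error code in hex
    offset = ip - base  # instruction offset relative to the object's load address
    return {
        "process": m.group("comm"),
        "pid": int(m.group("pid")),
        "object": m.group("obj"),
        "fault_addr": int(m.group("addr"), 16),
        "offset": offset,
        "ip_inside_object": 0 <= offset < size,
        # x86 page-fault error code bits:
        # bit 0: 1 = protection violation, 0 = page not present
        # bit 1: 1 = write access, 0 = read access
        # bit 2: 1 = fault occurred in user mode
        "protection_violation": bool(err & 0x1),
        "write": bool(err & 0x2),
        "user_mode": bool(err & 0x4),
    }


if __name__ == "__main__":
    line = ("Jun 30 21:45:03 t14d1 kernel: ldmd[23471]: segfault at 2adea9675818 "
            "ip 00002ad5ac00be2a sp 00007fff7dc37400 error 4 in "
            "libldm.so.0.0.0[2ad5abff2000+52000]")
    info = decode_segfault(line)
    print(f"{info['process']}[{info['pid']}] faulted in {info['object']} "
          f"at offset {info['offset']:#x}")
    print("user-mode" if info["user_mode"] else "kernel-mode",
          "write" if info["write"] else "read",
          "(protection violation)" if info["protection_violation"] else "(page not present)")
```

For the line in this log, the sketch reports offset 0x19e2a and classifies error 4 as a user-mode read of a page that was not present, which is consistent with dereferencing an invalid pointer. Given the exact same 6.11.1 build of `libldm.so.0.0.0` with debugging information, `addr2line -f -e libldm.so.0.0.0 0x19e2a` would usually name the function and source line; this assumes the library's first loadable segment starts at address 0, which is typical for shared libraries.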