[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[LDM #TIB-337303]: New IDD relay - ldm is hanging after some time
- Subject: [LDM #TIB-337303]: New IDD relay - ldm is hanging after some time
- Date: Fri, 03 Jun 2011 10:15:14 -0600
Pete,
> Finally got a backtrace from it. Process 8472 was stuck at 100% CPU when
> I started gdb.
>
> [root@idd ldm-6.9.7]# gdb ldmd 8472
> GNU gdb (GDB) Red Hat Enterprise Linux (7.1-29.el6)
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /usr/local/ldm/bin/ldmd...done.
> Attaching to program: /usr/local/ldm/bin/ldmd, process 8472
> Reading symbols from /usr/local/ldm/lib/libldm.so.0...done.
> Loaded symbols for /usr/local/ldm/lib/libldm.so.0
> Reading symbols from /usr/local/ldm/lib/libxml2.so.2...done.
> Loaded symbols for /usr/local/ldm/lib/libxml2.so.2
> Reading symbols from /lib64/libdl.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/libnss_files.so.2
> Reading symbols from /lib64/libnss_dns.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/libnss_dns.so.2
> Reading symbols from /lib64/libresolv.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/libresolv.so.2
> 0x000000337088431b in _wordcopy_fwd_dest_aligned () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install
> glibc-2.12-1.7.el6_0.5.x86_64
> (gdb) bt
> #0 0x000000337088431b in _wordcopy_fwd_dest_aligned () from
> /lib64/libc.so.6
> #1 0x0000003370883856 in bcopy () from /lib64/libc.so.6
How could a process be stuck using 100% of the CPU in bcopy(3)? That's a
rhetorical question. The answer is "It can't." I suspect, therefore, that
something's wrong with either your runtime library or with the operating-system.
> #2 0x00007f3aa8030f20 in xdrmem_putbytes (xdrs=0x7fff1bd97df0,
> addr=<value optimized out>, len=404167) at xdr_mem.c:150
> #3 0x00007f3aa803078a in my_xdr_opaque (xdrs=0x7fff1bd97df0, cp=<value
> optimized out>, cnt=<value optimized out>) at xdr.c:463
> #4 0x00007f3aa800f73c in xproduct (pq=0x1be6160, prod=0x7fff1bd97e90)
> at pq.c:4541
> #5 pq_insertNoSig (pq=0x1be6160, prod=0x7fff1bd97e90) at pq.c:5538
> #6 0x00007f3aa800fb26 in pq_insert (pq=<value optimized out>,
> prod=<value optimized out>) at pq.c:5578
The process is trying to insert something into the product-queue.
> #7 0x00007f3aa8017543 in dh_saveDataProduct (pq=<value optimized out>,
> info=0x1be8910, data=<value optimized out>, wasHereis=1, notifyAutoShift=1)
> at DownHelp.c:159
> #8 0x00007f3aa801c65d in hereis_6_svc (prod=<value optimized out>,
> rqstp=0x7fff1bd984d0) at ldm6_server.c:661
> #9 0x00007f3aa801acf1 in ldmprog_6 (rqstp=0x7fff1bd984d0,
> transp=0x1cb12d0) at ldm6_svc.c:100
> #10 0x00007f3aa802e629 in svc_getreqsock (sock=<value optimized out>) at
> svc.c:543
> #11 0x00007f3aa801ffa2 in one_svc_run (xp_sock=4, inactive_timeo=<value
> optimized out>) at one_svc_run.c:91
> #12 0x00007f3aa8022aca in run_service (upName=0x1be6140
> "idd.unidata.ucar.edu", port=30085840, request=<value optimized out>,
> inactiveTimeout=60,
> pqPathname=0x7fff1bd987a4 "\377\177", pq=<value optimized out>,
> isPrimary=1) at requester6.c:229
> #13 req6_new (upName=0x1be6140 "idd.unidata.ucar.edu", port=30085840,
> request=<value optimized out>, inactiveTimeout=60,
> pqPathname=0x7fff1bd987a4 "\377\177", pq=<value optimized out>,
> isPrimary=1) at requester6.c:672
> #14 0x00007f3aa8014a62 in prog_requester (ldmPort=<value optimized out>)
> at acl.c:1631
> #15 run_requester (ldmPort=<value optimized out>) at acl.c:1809
> #16 new_requester (ldmPort=<value optimized out>) at acl.c:1865
> #17 requester_add (ldmPort=<value optimized out>) at acl.c:1908
> #18 invert_request_acl (ldmPort=<value optimized out>) at acl.c:1974
> #19 0x0000000000405980 in read_conf (conf_path=0x7f3aa82517a0
> "/usr/local/ldm/etc/ldmd.conf", doSomething=1, defaultPort=388) at
> parser.y:593
> #20 0x00000000004065b7 in main (ac=<value optimized out>, av=<value
> optimized out>) at ldmd.c:1049
> (gdb)
>
>
> ps -eflu ldm shows this while connected to gdb:
>
> [ldm@idd ~]$ ps -flu ldm
> F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY
> TIME CMD
> 4 S ldm 2697 2696 0 80 0 - 28344 n_tty_ May24 pts/2
> 00:00:03 -tcsh
> 5 S ldm 3098 3093 0 80 0 - 24362 poll_s May24 ?
> 00:02:07 sshd: ldm@pts/3
> 0 S ldm 3099 3098 0 80 0 - 28270 rt_sig May24 pts/3
> 00:00:00 -tcsh
> 5 S ldm 8462 1 0 80 0 - 5277 poll_s 16:14 ?
> 00:00:00 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 0 S ldm 8464 8462 0 80 0 - 6024619 fcntl_ 16:14 ?
> 00:00:26 rtstats -h rtstats.unidata.ucar.edu
> 1 S ldm 8465 8462 0 80 0 - 6025590 fcntl_ 16:14 ?
> 00:00:19 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 1 S ldm 8466 8462 0 80 0 - 6025585 fcntl_ 16:14 ?
> 00:00:17 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 1 S ldm 8467 8462 0 80 0 - 6025672 fcntl_ 16:14 ?
> 00:00:20 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 1 S ldm 8468 8462 0 80 0 - 6025653 fcntl_ 16:14 ?
> 00:00:16 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 1 S ldm 8469 8462 0 80 0 - 6029770 fcntl_ 16:14 ?
> 00:00:30 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 1 S ldm 8470 8462 0 80 0 - 6029677 fcntl_ 16:14 ?
> 00:00:01 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 1 S ldm 8471 8462 0 80 0 - 6026399 fcntl_ 16:14 ?
> 00:00:26 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 1 T ldm 8472 8462 10 80 0 - 6025021 utrace 16:14 ?
> 00:12:43 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 1 S ldm 8473 8462 3 80 0 - 6025008 fcntl_ 16:14 ?
> 00:04:12 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 1 S ldm 8474 8462 8 80 0 - 6025041 fcntl_ 16:14 ?
> 00:09:54 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 1 S ldm 8475 8462 2 80 0 - 6025007 fcntl_ 16:14 ?
> 00:03:20 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 1 S ldm 8476 8462 0 80 0 - 6025039 fcntl_ 16:14 ?
> 00:00:09 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 1 S ldm 8477 8462 8 80 0 - 6025050 fcntl_ 16:14 ?
> 00:10:01 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 1 S ldm 8478 8462 0 80 0 - 6025061 poll_s 16:14 ?
> 00:00:02 ldmd -I 144.92.130.90 -P 388 -M 512 -m 3600 -o 3600 -q
> /usr/local/ldm/var/queues/ldm.pq /us
> 5 S ldm 8524 8520 0 80 0 - 24362 poll_s 16:15 ?
> 00:00:01 sshd: ldm@pts/5
> 0 S ldm 8525 8524 0 80 0 - 28354 rt_sig 16:15 pts/5
> 00:00:00 -tcsh
> 0 R ldm 8880 8525 95 80 0 - 26993 - 18:16 pts/5
> 00:00:13 ps -flu ldm
> 5 S ldm 12678 12673 0 80 0 - 24362 poll_s May26 ?
> 00:00:00 sshd: ldm@pts/4
> 0 S ldm 12679 12678 0 80 0 - 28251 n_tty_ May26 pts/4
> 00:00:00 -tcsh
> 4 S ldm 38352 38350 0 80 0 - 28261 rt_sig 16:04 pts/0
> 00:00:00 -tcsh
>
> As soon as I detached gdb from the process it went back to 100% CPU.
I don't know what to tell you, Pete. I've never seen this happen and, as far as
I can tell, it shouldn't happen. 100% of CPU in bcopy(3)!?
> --
> Pete Pokrandt - Systems Programmer
> UW-Madison Dept of Atmospheric and Oceanic Sciences
> 608-262-3086 - address@hidden
Regards,
Steve Emmerson
Ticket Details
===================
Ticket ID: TIB-337303
Department: Support LDM
Priority: Normal
Status: Closed