20040831 rpc.ldmd signal 11s
- Subject: 20040831 rpc.ldmd signal 11s
- Date: Tue, 31 Aug 2004 09:16:21 -0600
Hi Art,
> To: address@hidden
> From: "Arthur A. Person" <address@hidden>
> Subject: rpc.ldmd signal 11's
> Organization: Penn State University
> Keywords: 200408311254.i7VCsh8E018445
The above message contained the following:
> I've seen two cases (on two separate systems) where an rpc.ldmd process
> has died with signal 11 killing LDM data collection. Both are running LDM
> V6.0.15 on RedHat EL 3 update 2 kernel 2.4.21-15.0.4.ELsmp and fully
> patched.
This is new. I'll see if I can reproduce that behavior here.
> The process did not core dump.
The LDM user must tell the operating system to allow core-dumps.
This is usually done via the command
    ulimit -c unlimited
Before executing the "ldmadmin start" command, verify that core-dumps are
allowed via the command
    ulimit -c
(a result of 0 means that they are disabled) and use the previous command
if they're not.
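The check described above could be scripted roughly as follows. This is only a sketch: the helper name core_limit_allows_dumps is made up for illustration and is not part of the LDM, and it assumes a shell (such as bash) whose "ulimit" builtin supports the -c option.

```shell
#!/bin/sh
# Hypothetical helper (not part of the LDM): succeeds when the given
# "ulimit -c" value permits core files, i.e. is anything other than 0.
core_limit_allows_dumps() {
    [ "$1" != "0" ]
}

# Raise the soft limit only if core-dumps are currently disabled.
if ! core_limit_allows_dumps "$(ulimit -c)"; then
    ulimit -c unlimited 2>/dev/null ||
        echo "could not raise the core-file limit (hard limit too low?)" >&2
fi

# Then start the LDM from this same shell so that it inherits the limit:
#   ldmadmin start
```

The limit is per-process and inherited by children, which is why it has to be set in the shell from which "ldmadmin start" is run.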
> Here's an excerpt of the ldmd.log files for the most recent:
>
> Aug 26 13:58:49 ls2 rpc.ldmd[3634]: Starting Up (version: 6.0.15; built:
> Jul 14 2004 15:25:10)
>
> Aug 26 13:58:49 ls2 pqact[3637]: Starting Up
> Aug 26 13:58:49 ls2 pqact[3638]: Starting Up
> Aug 26 13:58:49 ls2 pqact[3639]: Starting Up
> Aug 26 13:58:49 ls2 pqbinstats[3635]: Starting Up (3634)
> Aug 26 13:58:49 ls2 pqact[3641]: Starting Up
> Aug 26 13:58:49 ls2 pqact[3640]: Starting Up
> Aug 26 13:58:49 ls2 ldm[3645]: Starting Up(6.0.15): ldm.meteo.psu.edu:
> TS_ZERO TS_ENDT {{ANY,
> ".*"}}
> Aug 26 13:58:49 ls2 ldm[3645]: Desired product class: 20040826135844.784
> TS_ENDT {{ANY, ".*"}
> }
> Aug 26 13:58:49 ls2 pqsurf[3643]: Starting Up (3634)
> Aug 26 13:58:49 ls2 rtstats[3644]: Starting Up (3634)
> Aug 26 13:58:49 ls2 pqact[3646]: Starting Up
> Aug 26 13:58:50 ls2 ldm[3645]: Connected to upstream LDM-6
> Aug 26 13:58:51 ls2 ldm[3645]: Upstream LDM is willing to feed
> Aug 26 14:00:06 ls2 pnga2area[4203]: Starting Up
> Aug 26 14:00:06 ls2 pnga2area[4203]: unPNG:: 115626 309200 2.6741
> Aug 26 14:00:06 ls2 pnga2area[4203]: Exiting
> Aug 26 14:00:50 ls2 pnga2area[4780]: Starting Up
> Aug 26 14:00:50 ls2 pnga2area[4780]: unPNG:: 856353 4506096 5.2620
> Aug 26 14:00:50 ls2 pnga2area[4780]: Exiting
> Aug 26 14:00:52 ls2 pnga2area[4819]: Starting Up
> Aug 26 14:00:52 ls2 pnga2area[4819]: unPNG:: 1067122 4506096 4.2227
> Aug 26 14:00:52 ls2 pnga2area[4819]: Exiting
> .
> .
> .
> Aug 28 05:33:04 ls2 pnga2area[30968]: Starting Up
> Aug 28 05:33:04 ls2 pnga2area[30968]: unPNG:: 90094 242720 2.6941
> Aug 28 05:33:04 ls2 pnga2area[30968]: Exiting
> Aug 28 05:34:03 ls2 pnga2area[31478]: Starting Up
> Aug 28 05:34:03 ls2 pnga2area[31478]: unPNG:: 74544 242720 3.2561
> Aug 28 05:34:03 ls2 pnga2area[31478]: Exiting
> Aug 28 05:35:17 ls2 rpc.ldmd[3634]: child 3645 terminated by signal 11
> Aug 28 05:35:17 ls2 rpc.ldmd[3634]: Killing (SIGINT) process group
> Aug 28 05:35:17 ls2 pqact[3637]: Interrupt
> Aug 28 05:35:17 ls2 pqbinstats[3635]: Interrupt
> Aug 28 05:35:17 ls2 pqact[3637]: Exiting
> Aug 28 05:35:17 ls2 pqact[3638]: Interrupt
> Aug 28 05:35:17 ls2 pqact[3638]: Exiting
> Aug 28 05:35:17 ls2 pqact[3639]: Interrupt
> Aug 28 05:35:17 ls2 rtstats[3644]: Interrupt
> Aug 28 05:35:17 ls2 rpc.ldmd[3634]: SIGINT
> Aug 28 05:35:17 ls2 pqact[3646]: Interrupt
> Aug 28 05:35:17 ls2 pqsurf[3643]: Interrupt
> Aug 28 05:35:17 ls2 pqact[3641]: Interrupt
> Aug 28 05:35:17 ls2 pqact[3639]: Exiting
> Aug 28 05:35:17 ls2 pqact[3646]: Exiting
> Aug 28 05:35:17 ls2 pqact[3640]: Interrupt
> Aug 28 05:35:17 ls2 pqact[3640]: Exiting
> Aug 28 05:35:17 ls2 pqbinstats[3635]: Exiting
> Aug 28 05:35:17 ls2 rtstats[3644]: Exiting
> Aug 28 05:35:17 ls2 pqsurf[3643]: Exiting
> Aug 28 05:35:17 ls2 pqact[3641]: Exiting
> Aug 28 05:35:17 ls2 pqsurf[3643]: Queue usage (bytes):10682240
> Aug 28 05:35:17 ls2 pqsurf[3643]: (nregions): 54034
> Aug 28 05:35:17 ls2 pqsurf[3643]: Number of products 86836
> Aug 28 05:35:17 ls2 pqsurf[3643]: Number of observations 381591
> Aug 28 05:35:17 ls2 pqsurf[3643]: Number of dups 51657
> Aug 28 05:35:17 ls2 rpc.ldmd[3634]: Terminating process group
>
> Any ideas what might be causing this, and/or what I might do to capture
> more/better information to track it down?
Unless the LDM server is built with the "-g" (debugging) option, the
core-dump will be of limited utility. If you don't mind, doing the
following would help greatly:
1. Go to the top-level source-directory.
2. Execute the command "make distclean".
3. Set the environment variables CFLAGS and CPPFLAGS to "-g"
   and "-DNDEBUG", respectively (without the quotes).
4. Execute the following commands in order:
       make
       ldmadmin stop
5. Become the superuser.
6. Execute the following commands in order:
       make server/install_setuids
       ulimit -c unlimited
       ldmadmin start
7. Cross your fingers. :-)
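For anyone who wants to rehearse steps 1 through 4 before touching a running system, they might be wrapped in a small script like the one below. It is a sketch under stated assumptions: LDM_SRC, DRY_RUN, run and rebuild_ldm_with_debug are names invented for this example (they are not LDM settings or commands), and by default the script only prints the commands instead of executing them. Steps 5 and 6 need the superuser, so the script merely prints a reminder for them.

```shell
#!/bin/sh
# LDM_SRC and DRY_RUN are hypothetical names used only in this sketch.
LDM_SRC="${LDM_SRC:-.}"   # top-level source directory (assumed: current dir)
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print a command; execute it only when DRY_RUN is 0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

rebuild_ldm_with_debug() {
    run cd "$LDM_SRC"    || return 1   # step 1
    run make distclean   || return 1   # step 2
    CFLAGS="-g"                        # step 3
    CPPFLAGS="-DNDEBUG"
    export CFLAGS CPPFLAGS
    run make             || return 1   # step 4
    run ldmadmin stop    || return 1
    # Steps 5 and 6 require the superuser, so they are only listed here.
    echo "Now, as the superuser, run in order:"
    echo "    make server/install_setuids"
    echo "    ulimit -c unlimited"
    echo "    ldmadmin start"
}

# Example: rebuild_ldm_with_debug              (prints the plan)
#          DRY_RUN=0 before sourcing the file  (actually runs steps 1-4)
```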
Your help in this would be greatly appreciated.
> Thanks.
>
> Art.
>
> Arthur A. Person
> Research Assistant, System Administrator
> Penn State Department of Meteorology
> email: address@hidden, phone: 814-863-1563
Regards,
Steve Emmerson
> NOTE: All email exchanges with Unidata User Support are recorded in the
> Unidata inquiry tracking system and then made publicly available
> through the web. If you do not want to have your interactions made
> available in this way, you must let us know in each email you send to us.