This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Justin,

>Date: Wed, 19 Oct 2005 07:44:56 -0400
>From: Justin Cooke <address@hidden>
>Organization: NOAA/NWS/FSL
>To: Steve Emmerson <address@hidden>
>Subject: Re: "pbuf_flush: time elapsed" problem

The above message contained the following:

[snip]
> Yes, I'm talking about the upstream LDM process.

[snip]
> > Would you be willing to modify the LDM source-code and then rebuild and
> > reinstall it with debugging and assertions enabled?
>
> Yes we would

Good.  I'll let you know what to do.

> >> Something else that may be of interest, we noticed that after the feed
> >> stopped there was a defunct process with the PPID listed as the PID of
> >> our NEXRAD2 feed (output from ps -ef for the PID 1228948):
> >>
> >> dbndev 532636 1228948 0 0:00 <defunct>
> >> dbndev 1228948 1028176 0 Oct 13 - 24:21 rpc.ldmd -v -q
> >> /usr/ldm/data/ldm.pq /usr/ldm/etc/ldmd.conf
> >>
> >> Any ideas?
> >
> > This is extremely puzzling because upstream LDM processes don't call
> > fork(2) -- so they can't have child processes.
> >
> > grep(1) the LDM logfiles to verify that the PID is that of an upstream
> > LDM, e.g.
> >
> > fgrep '[1228948]' `ls -rt logs/ldmd.log*`
> >
>
> Here is some output from the grep:
>
> Oct 18 14:15:01 b2n1 140.90.85.102[1228948] ERROR: Terminating due to LDM
> failure; Connection to upstream LDM closed
> Oct 18 14:15:01 b2n1 140.90.85.102[1228948] NOTE: LDM-6 desired
> product-class: 20051018141401.214 TS_ENDT {{NEXRAD2, ".*"},{NONE,
> "SIG=a239ff9ff6fa47cb8ab19f7c5e476ae1"}}
> Oct 18 14:16:17 b2n1 140.90.85.102[1228948] ERROR: Terminating due to LDM
> failure; Couldn't connect to LDM on 140.90.85.102 using either port 388 or
> portmapper; : RPC: Remote system error - A remote host did not respond within
> the timeout period.
> Oct 18 14:16:18 b2n1 140.90.85.102[1228948] NOTE: LDM-6 desired
> product-class: 20051018141401.214 TS_ENDT {{NEXRAD2, ".*"},{NONE,
> "SIG=a239ff9ff6fa47cb8ab19f7c5e476ae1"}}
> Oct 18 14:16:18 b2n1 140.90.85.102[1228948] NOTE: Product reclassification by
> upstream LDM: 20051018141401.214 TS_ENDT {{NEXRAD2, ".*"},{NONE,
> "SIG=a239ff9ff6fa47cb8ab19f7c5e476ae1"}} -> 20051018141401.214 TS_ENDT
> {{NEXRAD2, ".*"}}
> Oct 18 14:16:18 b2n1 140.90.85.102[1228948] NOTE: Upstream LDM-6 on
> 140.90.85.102 is willing to be a primary feeder
> Oct 18 14:54:28 b2n1 140.90.85.102[1228948] NOTE: Going verbose
> Oct 18 14:54:29 b2n1 140.90.85.102[1228948] INFO: 9699 20051018145340.836
> NEXRAD2 382027 L2-BZIP2/KBMX/20051018145001/382/27
[snip]

The above messages indicate, conclusively, that process 1228948 was a
downstream LDM and not an upstream LDM.  This is equally puzzling
because downstream LDMs don't call fork() either -- and so can't have
child processes.

More relevant, however, is your suggestion that process 1228948 was an
upstream LDM when it clearly wasn't.  Would you please explain this
discrepancy.

> The LDM system that feeds us is restarted twice a day, that's why there
> is a connection failure ~14:15. At 14:54 I sent the 1228948 process a
> USR2 to go into verbose mode, once data stopped being received by the
> upstream LDM we attached truss.
>
> Again, this only seems to happen when the upstream ldm is in verbose
> mode. This process ran for 5 days in silent mode with no problems but
> stopped after 3 hours once it was put into verbose.

Hmm...  That information might help.  I'll need to know, however,
whether to look at the upstream or downstream LDM code.

> Thanks for continuing to look at this,

Thank you for bringing this up and continuing to work with me.

> Justin

Regards,
Steve Emmerson
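
Archive note: the PID check described in this message (fgrep for the bracketed PID, then reading the matching log lines to see which kind of LDM wrote them) can also be scripted. The following is only a minimal sketch. It assumes log lines carry the process ID in square brackets, as in the excerpt quoted above. The function names and the list of "downstream" phrases are hypothetical, chosen from that excerpt for illustration; they are not part of the LDM and are a heuristic rather than a definitive test.

```python
def lines_for_pid(lines, pid):
    """Keep only the log lines tagged with '[pid]', like fgrep '[pid]'."""
    tag = f"[{pid}]"
    return [line for line in lines if tag in line]


# Assumption: phrases copied from the log excerpt in this message that
# refer to "the upstream LDM" from the other side of the connection.
# This list is illustrative, not an exhaustive or official set.
DOWNSTREAM_HINTS = (
    "Connection to upstream LDM closed",
    "Product reclassification by upstream LDM",
    "Upstream LDM-6 on",
)


def looks_downstream(lines, pid):
    """Heuristic: True if any line for this PID contains a downstream hint."""
    return any(
        hint in line
        for line in lines_for_pid(lines, pid)
        for hint in DOWNSTREAM_HINTS
    )


# Sample lines taken from the excerpt above (wrapped lines rejoined).
sample = [
    "Oct 18 14:15:01 b2n1 140.90.85.102[1228948] ERROR: Terminating due to "
    "LDM failure; Connection to upstream LDM closed",
    "Oct 18 14:16:18 b2n1 140.90.85.102[1228948] NOTE: Upstream LDM-6 on "
    "140.90.85.102 is willing to be a primary feeder",
    "Oct 18 14:54:28 b2n1 140.90.85.102[1228948] NOTE: Going verbose",
]

if __name__ == "__main__":
    print(len(lines_for_pid(sample, 1228948)))
    print(looks_downstream(sample, 1228948))
```

Matching on the bracketed form rather than the bare number avoids picking up a longer PID that merely contains the same digits, which is the same reason the fgrep pattern above includes the brackets.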