This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Manuel,

> I have tried that from tigge-ldm and I get:
> ldm@tigge-ldm:~> /usr/sbin/rpcinfo -n 388 -t tigge-ldm.ecmwf.int 300029
> rpcinfo: RPC: Timed out
> program 300029 version 0 is not available

Well, at least this is consistent with Dataportal not being able to connect to Tigge-ldm. You might run snoop(1) or tcpdump(1) in another window while you do this to diagnose the problem.

> > Manuel, verify that any firewall rules on Tigge-ldm will allow incoming
> > connections to port 388 from an arbitrary, remote port.
> Last Monday, when a similar problem occurred, the only thing I did was
> to restart LDM (remember I had to kill some LDM processes that were not
> stopped gracefully by ldmadmin). This cleared the problem.
> So I'm reluctant to think it is network related, but more likely a
> process that is preventing those connections. It may have been a network
> glitch that got it into this state, though.

I think it's best if we discover the cause of the problem now to prevent it from reoccurring in the future.

> If I 'ps -fu ldm' on both tigge-ldm and tigge-portal, I get different
> results. On tigge-ldm:
> UID PID PPID C STIME TTY TIME CMD
> ldm 31408 1 0 Mar27 ? 00:00:00 vi stats.pl
> ldm 18258 18252 0 Apr05 ? 00:00:00 sshd: ldm@pts/0
> ldm 18259 18258 0 Apr05 pts/0 00:00:00 -bash
> ldm 23339 23337 0 Apr10 ? 00:00:00 sshd: ldm@pts/1
> ldm 23340 23339 0 Apr10 pts/1 00:00:00 -bash
> ldm 30862 30860 0 Apr11 ? 00:00:00 sshd: ldm@pts/6
> ldm 30863 30862 0 Apr11 pts/6 00:00:00 -bash
> ldm 31903 1 0 Apr11 ? 00:04:13 pqact -f ANY -v -l
> log/ldmd.log -p missing etc/tigge_pqact.conf

I'm surprised that you're using pqact(1)'s "-l" option because that utility should log to the LDM log file by default.

> ldm 31905 1 0 Apr11 ? 00:00:06 /usr/bin/perl
> /usr/local/ldm/tigge/send
> ldm 31906 1 0 Apr11 ? 00:00:12 rpc.ldmd -P 388 -v -q
> /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
> ldm 31907 1 0 Apr11 ? 00:21:33 rpc.ldmd -P 388 -v -q
> /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
> ldm 32091 1 0 Apr11 ? 00:08:01 rpc.ldmd -P 388 -v -q
> /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
> ldm 32145 1 0 Apr11 ? 00:03:47 rpc.ldmd -P 388 -v -q
> /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
> ldm 32147 1 0 Apr11 ? 00:07:41 rpc.ldmd -P 388 -v -q
> /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf

That's odd. The above indicates that 5 top-level LDM servers are running (the parent process ID of a top-level LDM server is "1"; for all upstream and downstream LDM child processes, it is the PID of the LDM server). This should not occur and indicates a serious problem.

> ldm 21139 21137 0 Apr11 ? 00:00:00 sshd: ldm@pts/4
> ldm 21140 21139 0 Apr11 pts/4 00:00:00 -bash
> ldm 18695 1 0 Apr12 ? 00:00:59 rpc.ldmd -P 388 -v -q
> /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
> ldm 22068 22066 0 09:35 ? 00:00:00 sshd: ldm@pts/2
> ldm 22069 22068 0 09:35 pts/2 00:00:00 -bash
> ldm 31904 1 0 Apr11 ? 00:03:13 rtstats -h
> rtstats.unidata.ucar.edu
> ldm 2507 22069 0 19:54 pts/2 00:00:00 ps -fu ldm
>
> while on tigge-portal:
> UID PID PPID C STIME TTY TIME CMD
> ldm 29317 1 0 Apr05 ? 00:00:00 rpc.ldmd -P 388 -v -m
> 18000 -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/e
> ldm 29318 29317 3 Apr05 ? 06:43:31 pqact -f EXP -p tigge
> etc/pqact.conf_tigge
> ldm 29321 29317 0 Apr05 ? 00:33:23 rpc.ldmd -P 388 -v -m
> 18000 -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/e
> ldm 29322 29317 0 Apr05 ? 00:33:11 rpc.ldmd -P 388 -v -m
> 18000 -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/e
> ldm 29323 29317 0 Apr05 ? 00:20:27 rpc.ldmd -P 388 -v -m
> 18000 -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/e
> ldm 29325 29317 0 Apr05 ? 00:20:10 rpc.ldmd -P 388 -v -m
> 18000 -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/e
> ldm 29326 29317 0 Apr05 ? 00:00:47 rpc.ldmd -P 388 -v -m
> 18000 -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/e
> ldm 29362 29317 0 Apr05 ? 00:01:38 [rpc.ldmd] <defunct>
> ldm 31349 29317 0 Apr06 ? 00:00:00 [rpc.ldmd] <defunct>
> ldm 31801 31799 0 Apr06 ? 00:00:00 sshd: ldm@pts/0
> ldm 31802 31801 0 Apr06 pts/0 00:00:00 -bash
> ldm 17254 17252 0 Apr10 ? 00:00:00 sshd: ldm@pts/2
> ldm 17255 17254 0 Apr10 pts/2 00:00:00 -bash
> ldm 30953 30951 0 10:14 ? 00:00:00 sshd: ldm@pts/3
> ldm 30954 30953 0 10:14 pts/3 00:00:00 -bash
> ldm 32552 30954 0 19:54 pts/3 00:00:00 ps -fu ldm
>
> So on tigge-portal we have a master process rpc.ldmd (pid 29317) which
> is the parent of all other rpc.ldmd processes. On tigge-ldm, all
> rpc.ldmd don't show a parent, but init. Is this normal ?

Definitely not! It might be the cause of your problem -- although I don't see exactly how.

Can your netstat(1) show you PIDs? If so, then use it to discover which of the top-level LDM processes on Tigge-ldm are not listening on port 388 and kill those processes. These processes will have PID 1 as their parent PID and will be listening on ports other than 388.

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: CUA-629523
Department: Support IDD TIGGE
Priority: Normal
Status: On Hold
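
Editorial sketch (not part of the original reply): the first half of the check described above, finding every rpc.ldmd process whose parent is PID 1, could be automated roughly as follows. This assumes the eight-column `ps -fu ldm` layout shown in the ticket, with wrapped command lines already rejoined. The function name `top_level_ldm_servers` and the `SAMPLE` text are hypothetical and used only for illustration. Mapping the resulting PIDs to listening ports still requires netstat(1) or a similar tool on the host.

```python
def top_level_ldm_servers(ps_output, command="rpc.ldmd"):
    """Return the PIDs of `command` processes whose parent PID is 1.

    Expects `ps -f` style rows: UID PID PPID C STIME TTY TIME CMD.
    """
    pids = []
    for line in ps_output.splitlines():
        # Split into at most 8 fields so the full command line stays intact.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        # Skip the header row and anything else without numeric PID/PPID.
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        pid, ppid, cmd = int(fields[1]), int(fields[2]), fields[7]
        if ppid == 1 and cmd.split()[0] == command:
            pids.append(pid)
    return pids


# Hypothetical sample: a few rows adapted from the listings in the ticket.
SAMPLE = """\
UID PID PPID C STIME TTY TIME CMD
ldm 31903 1 0 Apr11 ? 00:04:13 pqact -f ANY -v -l log/ldmd.log
ldm 31906 1 0 Apr11 ? 00:00:12 rpc.ldmd -P 388 -v -q /usr/local/ldm/data/ldm.pq
ldm 31907 1 0 Apr11 ? 00:21:33 rpc.ldmd -P 388 -v -q /usr/local/ldm/data/ldm.pq
ldm 29321 29317 0 Apr05 ? 00:33:23 rpc.ldmd -P 388 -v -m 18000
ldm 29362 29317 0 Apr05 ? 00:01:38 [rpc.ldmd] <defunct>
"""

if __name__ == "__main__":
    servers = top_level_ldm_servers(SAMPLE)
    print("top-level rpc.ldmd PIDs:", servers)
    if len(servers) > 1:
        print("more than one top-level LDM server found")
```

Going by the tigge-portal listing in the ticket, a healthy host shows a single rpc.ldmd with parent PID 1, so a result with more than one PID would match the abnormal state seen on tigge-ldm.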