[TIGGE #CUA-629523]: Re: dataportal not receiving data from tigge-ldm.ecmwf.int
- Date: Fri, 14 Apr 2006 13:58:07 -0600
Manuel,
> That can be achieved with lsof(1), which lists the open file
> descriptors, including network sockets. This is the list of processes
> with port 'unidata-ldm' open:
>
> COMMAND    PID USER   FD   TYPE   DEVICE SIZE NODE NAME
> rpc.ldmd 31906 ldm    0u   IPv4 21859518       TCP *:unidata-ldm (LISTEN)
...
> rpc.ldmd 31907 ldm    0u   IPv4 21859518       TCP *:unidata-ldm (LISTEN)
...
> This shows two processes listening on port unidata-ldm: 31906 and 31907
Indeed it does show two processes listening on port 388 for TCP connections.
This should be impossible and probably indicates a problem with your
operating-system.
Having both processes listen on port 388 might be the cause of your problem,
although I don't see how: the O/S should prevent multiple processes from
listening on the same port for the same type of connection.
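For repeat checks, the listener test can be scripted. The following is a minimal Python sketch, not part of the LDM; the function name listening_pids is hypothetical, and it assumes each lsof row arrives on a single unwrapped line (for example, text captured from "lsof -i TCP:388").

```python
def listening_pids(lsof_text, port_name="unidata-ldm"):
    """Return the sorted PIDs whose lsof row shows a LISTEN socket on port_name."""
    pids = set()
    for line in lsof_text.splitlines():
        fields = line.split()
        # Skip the header row and anything without a numeric PID in column 2.
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        is_listening = "(LISTEN)" in fields
        on_port = any(f.endswith(":" + port_name) for f in fields)
        if is_listening and on_port:
            pids.add(int(fields[1]))
    return sorted(pids)


# Rows copied from the lsof listing quoted above, joined onto single lines.
sample = """\
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
rpc.ldmd 31906 ldm 0u IPv4 21859518 TCP *:unidata-ldm (LISTEN)
rpc.ldmd 31907 ldm 0u IPv4 21859518 TCP *:unidata-ldm (LISTEN)
"""
print(listening_pids(sample))  # -> [31906, 31907]
```

A result with more than one PID is the condition discussed above.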
> This is the list of rpc.ldmd's (ps -fu ldm| grep rpc.ldmd):
> ldm 31906 1 0 Apr11 ? 00:00:12 rpc.ldmd -P 388 -v -q
> /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
> ldm 31907 1 0 Apr11 ? 00:21:33 rpc.ldmd -P 388 -v -q
> /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
> ldm 32091 1 0 Apr11 ? 00:08:01 rpc.ldmd -P 388 -v -q
> /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
> ldm 32145 1 0 Apr11 ? 00:03:47 rpc.ldmd -P 388 -v -q
> /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
> ldm 32147 1 0 Apr11 ? 00:07:41 rpc.ldmd -P 388 -v -q
> /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
> ldm 18695 1 0 Apr12 ? 00:00:59 rpc.ldmd -P 388 -v -q
> /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
>
>
> So just to confirm, you want me to kill the following processes:
> 32091 32091 32145 32147 18695
PID 32091 is listed twice.
At this point I'm not sure which LDM server process should continue to run. I
suggest executing the "ldmadmin stop" command and then manually sending a
SIGTERM signal to any and all remaining LDM processes. If they don't terminate,
then try a SIGINT and, finally, a SIGKILL.
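The SIGTERM, SIGINT, SIGKILL escalation could be scripted roughly as follows. This is only a sketch under stated assumptions: the names stop_with_escalation and pid_alive and the 5-second grace period are hypothetical choices, not anything the LDM provides, and the PIDs must still be collected by hand from the ps listing after "ldmadmin stop" has run.

```python
import os
import signal
import time

# Escalation order from the advice above: SIGTERM, then SIGINT, then SIGKILL.
ESCALATION = (signal.SIGTERM, signal.SIGINT, signal.SIGKILL)


def pid_alive(pid):
    """Return True if a process with this PID exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stop_with_escalation(pids, grace=5.0, send=os.kill, alive=pid_alive,
                         sleep=time.sleep):
    """Signal the given PIDs with increasing force.

    Returns a dict mapping each PID to the name of the signal after which
    it was gone, or to "not running" / "survived".
    """
    outcome = {}
    remaining = []
    for pid in pids:
        if alive(pid):
            remaining.append(pid)
        else:
            outcome[pid] = "not running"
    for sig in ESCALATION:
        if not remaining:
            break
        for pid in remaining:
            try:
                send(pid, sig)
            except ProcessLookupError:
                pass  # exited between the liveness check and the signal
        sleep(grace)  # assumed grace period before checking again
        still_running = []
        for pid in remaining:
            if alive(pid):
                still_running.append(pid)
            else:
                outcome[pid] = sig.name
        remaining = still_running
    for pid in remaining:
        outcome[pid] = "survived"
    return outcome
```

Run as the ldm user, a call such as stop_with_escalation([32091, 32145]) would report which signal each process needed. The send, alive, and sleep parameters exist only so the logic can be exercised without touching real processes.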
Once that's done, do an "ldmadmin clean" to clean up. Then execute a "pqcheck
-v" to check the product-queue for corruption. Recreate the product-queue if
necessary.
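The clean-then-check step can be wrapped in a few lines of Python. This is a hedged sketch: the helper name clean_and_check is hypothetical, and treating any non-zero pqcheck exit status as "the queue needs attention" is an assumption to be verified against the pqcheck documentation for your LDM version.

```python
import subprocess


def clean_and_check(run=subprocess.run):
    """Run `ldmadmin clean`, then `pqcheck -v`.

    Returns True if pqcheck exits with status 0. Assumption: any other
    status means the product-queue should be inspected or recreated.
    """
    # check=True raises CalledProcessError if the clean step itself fails.
    run(["ldmadmin", "clean"], check=True)
    result = run(["pqcheck", "-v"])
    return result.returncode == 0
```

A False result is only a prompt to look at the pqcheck output before deciding whether to recreate the queue.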
Then, very carefully, execute an "ldmadmin start" and see if it creates
multiple top-level LDM servers (it shouldn't).
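One way to make that check after the restart is to count the rpc.ldmd processes whose parent PID is 1 in "ps -fu ldm" output. The sketch below assumes that a top-level server is one with parent PID 1 and that the columns are laid out as in the listing quoted above (UID PID PPID C STIME TTY TIME CMD, with STIME as a single token such as "Apr11"). The function name top_level_servers is hypothetical.

```python
import os


def top_level_servers(ps_text, command="rpc.ldmd"):
    """Return PIDs of `command` processes whose parent PID is 1."""
    pids = []
    for line in ps_text.splitlines():
        fields = line.split()
        # Need all eight columns, with numeric PID and PPID (skips the header).
        if len(fields) < 8:
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        # Assumption: the command name is the eighth column.
        if os.path.basename(fields[7]) == command and fields[2] == "1":
            pids.append(int(fields[1]))
    return pids
```

More than one PID in the result right after an "ldmadmin start" would indicate that multiple top-level servers were created.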
Is there an EXEC entry in the LDM configuration-file (etc/ldmd.conf) that
starts another LDM?
Keep me apprised.
Regards,
Steve Emmerson
Ticket Details
===================
Ticket ID: CUA-629523
Department: Support IDD TIGGE
Priority: Normal
Status: On Hold