[TIGGE #CUA-629523]: Re: dataportal not receiving data from tigge-ldm.ecmwf.int
- Date: Fri, 14 Apr 2006 15:12:21 -0600
Manuel,
> This is the output of "lsof | egrep 'PID|unidata'" on tigge-portal:
> COMMAND    PID USER  FD TYPE  DEVICE SIZE NODE NAME
> rpc.ldmd 29317 ldm   0u IPv4 1017103      TCP  *:unidata-ldm (LISTEN)
> rpc.ldmd 29317 ldm   1u IPv4 1168838      TCP  tigge-portal.ecmwf.int:unidata-ldm->tigge-ldm.ecmwf.int:45328 (CLOSE_WAIT)
> rpc.ldmd 29321 ldm   0u IPv4 1017103      TCP  *:unidata-ldm (LISTEN)
> rpc.ldmd 29321 ldm   4u IPv4 2421682      TCP  tigge-portal.ecmwf.int:48653->tigge-ldm.ecmwf.int:unidata-ldm (ESTABLISHED)
> rpc.ldmd 29322 ldm   0u IPv4 1017103      TCP  *:unidata-ldm (LISTEN)
> rpc.ldmd 29322 ldm   3u IPv4 3064475      TCP  tigge-portal.ecmwf.int:55991->tigge-ldm.ecmwf.int:unidata-ldm (SYN_SENT)
> rpc.ldmd 29322 ldm   4u IPv4 2421860      TCP  tigge-portal.ecmwf.int:48659->tigge-ldm.ecmwf.int:unidata-ldm (ESTABLISHED)
> rpc.ldmd 29323 ldm   0u IPv4 1017103      TCP  *:unidata-ldm (LISTEN)
> rpc.ldmd 29323 ldm   3u IPv4 3064477      TCP  tigge-portal.ecmwf.int:55992->tigge-ldm.ecmwf.int:unidata-ldm (ESTABLISHED)
> rpc.ldmd 29325 ldm   0u IPv4 1017103      TCP  *:unidata-ldm (LISTEN)
> rpc.ldmd 29325 ldm   3u IPv4 3064474      TCP  tigge-portal.ecmwf.int:55990->tigge-ldm.ecmwf.int:unidata-ldm (SYN_SENT)
> rpc.ldmd 29326 ldm   0u IPv4 1017103      TCP  *:unidata-ldm (LISTEN)
> rpc.ldmd 29326 ldm   4u IPv4 2421808      TCP  tigge-portal.ecmwf.int:48657->tigge-ldm.ecmwf.int:unidata-ldm (ESTABLISHED)
>
> All rpc.ldmd processes listen on port 388. This is because when a process
> calls fork(2), the child inherits the open file descriptors of the parent
> process. This is normal behaviour.
One of the very first things a child LDM process does is to close the listening
socket (see "server/ldmd.c"; search for "fork()"). Therefore, you should never
see what you saw unless something is very wrong, in my opinion.
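The inherit-then-close pattern can be sketched in shell. This is only an analogy, not the LDM's code (which is C): descriptor 9 and /dev/null stand in for the listening socket, and fd_state is a hypothetical helper. A subshell plays the role of the forked child.

```shell
# fd_state N: print "open" if descriptor N is open in the calling shell,
# "closed" otherwise. (Hypothetical helper, for illustration only.)
fd_state() {
    if { true 1>&"$1"; } 2>/dev/null; then echo open; else echo closed; fi
}

exec 9>/dev/null                        # "parent" opens descriptor 9

# Each $( ... ) runs in a forked subshell that inherits the parent's descriptors.
inherited=$(fd_state 9)                 # child sees the inherited descriptor
after_close=$(exec 9>&-; fd_state 9)    # child closes only its own copy
parent=$(fd_state 9)                    # parent's descriptor is unaffected

echo "child, before close: $inherited"
echo "child, after close:  $after_close"
echo "parent afterwards:   $parent"
```

A child that closes its copy no longer shows up as a holder of that descriptor, which is why a child still listed with the listening socket is suspicious.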
Also, the ps(1) output you sent showed multiple, top-level LDM servers. While
not impossible, this also shouldn't happen.
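A rough way to pick out top-level LDM servers from ps(1) output is sketched below. It assumes rows of the form "PID PPID COMMAND" and that every LDM process reports the command name rpc.ldmd; top_level_ldmd is a hypothetical helper name, not part of the LDM.

```shell
# top_level_ldmd: read "PID PPID COMMAND" rows on stdin and print the PID of
# every rpc.ldmd process whose parent is not itself an rpc.ldmd process.
top_level_ldmd() {
    awk '$3 == "rpc.ldmd" { ppid[$1] = $2 }
         END { for (p in ppid) if (!(ppid[p] in ppid)) print p }' | sort -n
}

# Usage:
#   ps -eo pid,ppid,comm | top_level_ldmd
```

More than one PID in the output would correspond to more than one top-level server.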
The netstat(1) utility on one of our Linux systems has a "-p" option that
prints the PID. Can you verify multiple LDM listeners using that utility?
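As a sketch of that check, assuming the Linux net-tools netstat(1) and its usual column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name), the filter below keeps only sockets listening on port 388. The function name ldm_listener_pids is made up for illustration.

```shell
# ldm_listener_pids: read `netstat -tlnp` output on stdin and print the
# "PID/Program name" column for every TCP socket listening on port 388.
ldm_listener_pids() {
    awk '$6 == "LISTEN" && $4 ~ /:388$/ { print $7 }' | sort -u
}

# Usage (run as the ldm user or root so the PID column is filled in;
# -n keeps the port numeric instead of printing "unidata-ldm"):
#   netstat -tlnp | ldm_listener_pids
```

Each output line is one PID/program that netstat associates with a listening socket on the LDM port.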
> I suppose the one with the lowest PID. I have been digging in logs, and
> this is the extract of the logfiles when I last started LDM:
>
> Apr 11 08:46:04 tigge-ldm rpc.ldmd[31899] NOTE: Starting Up (version:
> 6.4.5.1; built: Jan 23 2006 22:38:02)
> Apr 11 08:46:04 tigge-ldm rpc.ldmd[31899] NOTE: Using local address
> 0.0.0.0:388
> Apr 11 08:46:04 tigge-ldm pqact[31903] NOTE: Starting Up
> Apr 11 08:46:04 tigge-ldm rtstats[31904] NOTE: Starting Up (31899)
> Apr 11 08:46:04 tigge-ldm tigge-portal[31907] NOTE: Starting
> Up(6.4.5.1): tigge-portal.ecmwf.int:388 20060411074604.938 TS_ENDT
> {{ANY, "\.missing$"}}
> Apr 11 08:46:04 tigge-ldm dataportal[31906] NOTE: Starting Up(6.4.5.1):
> dataportal.ucar.edu:388 20060411074604.938 TS_ENDT {{ANY,
> "\.missing$"}}
> Apr 11 08:46:04 tigge-ldm pqact[31903] INFO: Successfully read
> configuration-file "etc/tigge_pqact.conf"
> Apr 11 08:46:05 tigge-ldm pqact[31903] INFO: TS_ZERO TS_ENDT {{ANY,
> "missing"}}
> Apr 11 08:46:05 tigge-ldm pqact[31903] INFO: 0 20060411084605.347
> ANY 000 _BEGIN_
> Apr 11 08:46:05 tigge-ldm dataportal[31906] INFO: No matching
> data-product in product-queue
> Apr 11 08:46:05 tigge-ldm tigge-portal[31907] INFO: No matching
> data-product in product-queue
> Apr 11 08:46:05 tigge-ldm dataportal[31906] NOTE: LDM-6 desired
> product-class: 20060411074605.349 TS_ENDT {{ANY, "\.missing$"}}
> Apr 11 08:46:05 tigge-ldm tigge-portal[31907] NOTE: LDM-6 desired
> product-class: 20060411074605.349 TS_ENDT {{ANY, "\.missing$"}}
> Apr 11 08:46:05 tigge-ldm dataportal[31906] INFO: Connected to upstream
> LDM-6 on host dataportal.ucar.edu using port 388
> Apr 11 08:46:05 tigge-ldm dataportal[31906] NOTE: Upstream LDM-6 on
> dataportal.ucar.edu is willing to be a primary feeder
> pqinsert INFO: 9205744 20060411084605.849 EXP 000
> z_tigge_c_ecmf_20060410120000.manifest
> Apr 11 08:46:06 tigge-ldm rpc.ldmd[31899] INFO: RPC buffer sizes for
> dataportal.ucar.edu: send=16384; recv=87380
> Apr 11 08:46:06 tigge-ldm dataportal[31913] INFO: Connection from
> dataportal.ucar.edu
> pqinsert INFO: 428963 20060411084606.439 EXP 000
> z_tigge_c_ecmf_20060410120000_0001_pf_pl_0090_002_0600_u.grib:88065
> pqinsert INFO: 428963 20060411084606.484 EXP 000
> z_tigge_c_ecmf_20060410120000_0001_pf_pl_0090_002_0600_v.grib:88066
>
>
> After all this information, what do you want me to do? Do you still
> want me to go ahead with:
> ldmadmin stop
> kill remaining
> ldmadmin clean
> pqcheck -v
> check everything is gone
> ldmadmin start
Try using netstat(1) to verify multiple listeners. Then, stop everything,
restart, and see if you get multiple top-level LDM servers again.
Regards,
Steve Emmerson
Ticket Details
===================
Ticket ID: CUA-629523
Department: Support IDD TIGGE
Priority: Normal
Status: On Hold