20050928: LDM errors
- Subject: 20050928: LDM errors
- Date: Thu, 29 Sep 2005 08:42:04 -0600
>From: "Vehorn, Robert CIV SPAWARSYSCEN Charleston SC J672" <address@hidden>
>Organization: SPAWAR
>Keywords: 200509281938.j8SJcWG7002974 IDD-Antarctica LDM
Hi Bob,
re:
>Thanks to everyone who responded with suggestions about bandwidth
>usage. The network engineers will probably end up using Packeteer to
>slow us down, and we will have to implement compression for the larger
>products.
>The Antarctic-IDD is configured such that most sites are both producers
>and consumers of data.
>Correct me if I'm wrong, but I believe that the LDM software prevents
>data loops by rejecting products from an upstream source if that
>product already exists in its queue.
Your statement is true, but I would like to expand on what can and does
happen. If your LDM queue already has a product with a particular MD5
checksum, a product from an upstream host with the same MD5 checksum will be
rejected, as you say. Where the rejection occurs, however, depends on the
feed state: if the connection is 'ALTERNATE', your LDM is first asked whether
it wants the new product, and if the product is already in the queue, the
answer is no. This option uses very little of your bandwidth. If, on the
other hand, your connection is 'PRIMARY', the product is sent to you and the
rejection occurs on your end. In this case bandwidth _is_ used, since the
product is sent before the rejection. This difference has to be taken into
account given the bandwidth constraints at McMurdo.
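To make the difference concrete, here is a minimal Python sketch of the two cases described above. It is a toy model, not LDM code: the names `ProductQueue`, `offer`, and `payload_bytes_received` are hypothetical, and the only thing it takes from the discussion is that duplicates are detected by MD5 checksum and that an 'ALTERNATE' feed asks before sending while a 'PRIMARY' feed sends first.

```python
import hashlib


class ProductQueue:
    """Toy stand-in for a product queue that rejects duplicate MD5 checksums."""

    def __init__(self):
        self._signatures = set()
        self.payload_bytes_received = 0  # product data that crossed the link

    def has(self, signature):
        """Answer a 'do you already have this?' query (checksum only)."""
        return signature in self._signatures

    def receive(self, data):
        """Accept product data; the duplicate check happens after arrival."""
        self.payload_bytes_received += len(data)
        signature = hashlib.md5(data).digest()
        if signature in self._signatures:
            return False  # rejected, but the bandwidth is already spent
        self._signatures.add(signature)
        return True


def offer(queue, data, mode):
    """Offer a product to the downstream queue in the given feed mode."""
    if mode not in ("PRIMARY", "ALTERNATE"):
        raise ValueError("unknown mode: " + mode)
    if mode == "ALTERNATE":
        # Ask first: only the small checksum is exchanged before the decision.
        if queue.has(hashlib.md5(data).digest()):
            return False
    # PRIMARY (or a wanted ALTERNATE product): the data itself is sent.
    return queue.receive(data)


queue = ProductQueue()
product = b"x" * 1000  # hypothetical 1000-byte product

first = offer(queue, product, "PRIMARY")          # new product, accepted
dup_primary = offer(queue, product, "PRIMARY")    # duplicate, sent then rejected
bytes_after_primary = queue.payload_bytes_received
dup_alternate = offer(queue, product, "ALTERNATE")  # duplicate, never sent
bytes_after_alternate = queue.payload_bytes_received
```

In this model the duplicate offered in 'PRIMARY' mode costs another full product's worth of payload, while the duplicate offered in 'ALTERNATE' mode costs none; the small cost of the query itself is ignored here.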
>Another problem with the
>configuration is that some sites are running behind very strict
>firewalls, such that incoming LDM connections are not possible. These
>sites use 'pqsend' to push products downstream. I have 2 such machines
>at SPAWAR in Charleston, SC (SSCC), that need to send data to the
>top-level server at the University of Wisconsin. Here are the
>applicable lines from my config:
## exec
exec "pqsend -h ice.ssec.wisc.edu -f EXP"
## requests
REQUEST EXP ^USAP.(SSCC|NZCM) ice.ssec.wisc.edu PRIMARY
REQUEST EXP ^USAP.NCAR.GRIB.(D1|D2) ice.ssec.wisc.edu PRIMARY
>The server at UW has an 'accept' entry for us, and 'pqsend' is able to
>connect initially. The errors occur whenever the local LDM (SSCC)
>tries to send any data to the server at UW. Here is what the log looks
>like (process 22153 is 'pqsend'):
Sep 28 19:05:28 atslab-ldm rpc.ldmd[22150] NOTE: Starting Up (version: 6.4.1; built: Aug 4 2005 22:47:06)
Sep 28 19:05:28 atslab-ldm rpc.ldmd[22150] NOTE: Using local address 0.0.0.0:388
Sep 28 19:05:28 atslab-ldm pqact[22151] NOTE: Starting Up
Sep 28 19:05:28 atslab-ldm rtstats[22152] NOTE: Starting Up (22150)
Sep 28 19:05:28 atslab-ldm ice.ssec.wisc.edu[22153] NOTE: Starting Up (22150)
Sep 28 19:05:28 atslab-ldm ice[22156] NOTE: Starting Up(6.4.1): ice.ssec.wisc.edu:388 20050928180528.218 TS_ENDT {{EXP, "^USAP.NZCM"}}
Sep 28 19:05:28 atslab-ldm ice[22156] NOTE: LDM-6 desired product-class: 20050928180528.219 TS_ENDT {{EXP, "^USAP.NZCM"}}
Sep 28 19:05:28 atslab-ldm ice[22154] NOTE: Starting Up(6.4.1): ice.ssec.wisc.edu:388 20050928180528.266 TS_ENDT {{EXP, "^USAP.NCAR.GRIB.(D1|D2)"}}
Sep 28 19:05:28 atslab-ldm ice[22154] NOTE: LDM-6 desired product-class: 20050928180528.268 TS_ENDT {{EXP, "^USAP.NCAR.GRIB.(D1|D2)"}}
Sep 28 19:05:35 atslab-ldm ice[22154] NOTE: Upstream LDM-6 on ice.ssec.wisc.edu is willing to be a primary feeder
Sep 28 19:05:35 atslab-ldm ice[22156] NOTE: Upstream LDM-6 on ice.ssec.wisc.edu is willing to be a primary feeder
Sep 28 19:05:37 atslab-ldm ice.ssec.wisc.edu[22153] ERROR: ship: RPC: Remote system error: 15780 20050928182001.257 EXP 000 USAP.NCAR.GRIB.D1.2005092812.F018.002M.MIXR
Sep 28 19:06:09 atslab-ldm ice.ssec.wisc.edu[22153] ERROR: sign_on(ice.ssec.wisc.edu): can't contact portmapper: RPC: Timed out
Sep 28 19:06:25 atslab-ldm ice.ssec.wisc.edu[22153] ERROR: ship: RPC: Remote system error: 89 20050928181601.081 EXP 000 USAP.SSCC.AWS.Z601.20050928.1814
Sep 28 19:07:05 atslab-ldm ice.ssec.wisc.edu[22153] ERROR: sign_on(ice.ssec.wisc.edu): can't contact portmapper: RPC: Timed out
>I've seen the 'sign_on' errors before and assumed they occurred because
>we were only sending data every 15 minutes and something had timed out,
>but the 'Remote system error' just started to occur after the servers
>at UW were upgraded to version 6.4.1. Note that the first product that
>'pqsend' is trying to transfer is a grib file that was just received,
>and the second product was produced locally.
>Can anyone shed any light on what is causing these errors?
I am sending this along to our LDM developer, Steve Emmerson, for
comment/elucidation. It may be the case that you will have to
downgrade to LDM-6.3.0. Let's see what Steve has to say...
>Also, a question concerning regular expressions; which is better in
>terms of efficiency on the upstream server:
REQUEST EXP ^USAP.NCAR.GRIB.(D1|D2) ice.ssec.wisc.edu PRIMARY or
REQUEST EXP "USAP.NCAR.GRIB.(D1|D2).*" ice.ssec.wisc.edu PRIMARY
The first regular expression is best. The second one's inclusion of a
'.*' at the end is not needed, as it is assumed. Steve calls this type
of regular expression pathological.
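As a rough illustration of why the trailing '.*' adds nothing, the following Python sketch compares the patterns using search semantics, where a pattern may match any part of the product identifier. Python's `re` module stands in here for the regular expression library the LDM actually uses, so treat it as an approximation. The first two identifiers are taken from the log above; the third is hypothetical, added to show that the second pattern in the question also drops the leading '^' and is therefore unanchored.

```python
import re

# The two patterns from the question, plus the first one with '.*' appended.
anchored = re.compile(r"^USAP.NCAR.GRIB.(D1|D2)")
anchored_with_tail = re.compile(r"^USAP.NCAR.GRIB.(D1|D2).*")
unanchored_with_tail = re.compile(r"USAP.NCAR.GRIB.(D1|D2).*")

product_ids = [
    "USAP.NCAR.GRIB.D1.2005092812.F018.002M.MIXR",  # from the log above
    "USAP.SSCC.AWS.Z601.20050928.1814",             # from the log above
    "TEST.USAP.NCAR.GRIB.D2.EXAMPLE",               # hypothetical identifier
]


def selected(pattern, identifiers):
    """Return the identifiers that the pattern matches anywhere (search)."""
    return [ident for ident in identifiers if pattern.search(ident)]


with_anchor = selected(anchored, product_ids)
with_anchor_and_tail = selected(anchored_with_tail, product_ids)
without_anchor = selected(unanchored_with_tail, product_ids)
```

Under these assumptions, `with_anchor` and `with_anchor_and_tail` select exactly the same identifiers, so the '.*' only adds work for the matcher. `without_anchor` additionally picks up the hypothetical identifier, because without '^' the pattern can match in the middle of a name.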
>Thanks again,
Again, let's see what Steve has to say about the pqsend problem you are seeing.
>Bob Vehorn
>Aviation Technical Services
>SPAWAR Systems Center Charleston
>Charleston, SC 29406
>843-218-6193
Cheers,
Tom
--
+-----------------------------------------------------------------------------+
* Tom Yoksas UCAR Unidata Program *
* (303) 497-8642 (last resort) P.O. Box 3000 *
* address@hidden Boulder, CO 80307 *
* Unidata WWW Service http://www.unidata.ucar.edu/*
+-----------------------------------------------------------------------------+