- Subject: [LDM #NGQ-500529]: ldm questions
- Date: Tue, 06 Oct 2009 16:04:04 -0600
Hi Paul (and Brian, Tim, and Vic),
Several of us have just finished discussing your inquiry, and this
reply is what we all agree on:
Paul asked:
> A side question. Considering the behavior I'm seeing, do you think
> changing the feedset for the grb2 data might help?
No. The feedtype used should have no effect on the problems you
are experiencing.
Paul asked:
> For our setup we do not see any data on the NMC feedset and have
> considered using it as the sole source for the grb2 data. Thoughts?
If your setup does not interact with the Unidata IDD, then you are
free to use any feedtype you find convenient. However, if there is
any possibility of your LDM interacting directly or indirectly
with the IDD, then we ask that you do not use a feedtype whose
content is already defined by existing use. The feedtypes open
for use are currently EXP and SPARE.
NB: the NMC feedtype is a "compound" feedtype: it is the union of
the AFOS, NMC2, and NMC3 feedtypes (raw feedtypes FT11, FT12, and FT13).
It is best to use a single feedtype when creating your own datastream.
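For illustration (the upstream hostname here is only a placeholder), a
REQUEST for the compound feedtype:
request NMC ".*" upstream.example.edu
is the same as requesting the union AFOS|NMC2|NMC3, whereas a REQUEST
for a single, open feedtype such as EXP is unambiguous:
request EXP ".*" upstream.example.edu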
Steve asked if there is more than one REQUEST entry in the LDM
configuration file on ldad. The ldmd_ldad.conf file you sent as
an attachment to one of your emails answered this question:
- you have multiple ldmd.conf REQUEST lines for different kinds of
data you are putting in the EXP feedtype
- all of the GFS 0.5 degree data (.grb2) is being requested on a
single ldmd.conf REQUEST line
Comment:
It is our belief that the firewall between your ldad and arps LDMs
is causing the slowdown in receipt of GRIB2 products in your EXP
data feed. The firewall could be throttling the connection based
on the high volume of data in that connection, or the firewall
could be performing "deep inspection" of each product (each
packet of each product) being sent. In either case, the serialization
of the data products would lead to monotonically increasing
latencies.
From our perspective, there are two ways of addressing this:
- change the firewall configuration to stop the slowdown/deep
inspection of the products
- split your ldmd.conf REQUEST for .grb2 products into several
REQUESTs
Comments:
- the first option may not be possible/allowed due to security
policies at JSC. I would, nonetheless, pursue this with
all possible vigor as it is the most straightforward way of
solving the problem.
Questions:
- are both ldad and arps machines internal to JSC?
- if yes, is there really a need to throttle the connection
or deeply inspect each packet?
- splitting the single REQUEST for .grb2 data would necessitate
there being something in the product IDs that would allow
the set of products to be separated into 'n' mutually exclusive
subsets.
We do this with the IDD CONDUIT feed by including a sequence number
as a field in the product ID. For example, here are the product IDs
for three successive CONDUIT products:
data/nccf/com/nam/prod/nam.20091006/nam.t18z.grbgrd03.tm00.grib2
!grib2/ncep/NAM_84/#000/200910061800F003/AVOR/500 Pa PRES! 000003
data/nccf/com/nam/prod/nam.20091006/nam.t18z.grbgrd03.tm00.grib2
!grib2/ncep/NAM_84/#000/200910061800F003/AVOR/1000 Pa PRES! 000004
data/nccf/com/nam/prod/nam.20091006/nam.t18z.grbgrd03.tm00.grib2
!grib2/ncep/NAM_84/#000/200910061800F003/CINS/0 - NONE! 000005
The final field is a 6-digit sequence number that we use to divide
the products into mutually exclusive subsets.
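For reference, the 5-way CONDUIT split that relies on that sequence
number is typically written like this in ldmd.conf (the upstream host
shown is only an example):
request CONDUIT "[09]$" idd.unidata.ucar.edu
request CONDUIT "[18]$" idd.unidata.ucar.edu
request CONDUIT "[27]$" idd.unidata.ucar.edu
request CONDUIT "[36]$" idd.unidata.ucar.edu
request CONDUIT "[45]$" idd.unidata.ucar.edu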
Since you are generating your own product IDs, it should be easy
to add a sequence number that allows for the same kind of sub-setting.
I suggest that you first try adding a sequence number that ranges from
000 to 009 and then restarts (i.e., 000, 001, ..., 009, 000, ...).
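If, for example, you are inserting the files with pqinsert, a small
wrapper along these lines would append a rolling 000-009 sequence
number to each product ID (this is only a sketch; the file glob,
feedtype, and the use of pqinsert are assumptions about your setup):
#!/bin/sh
# insert .grb2 files into the queue with a rolling 3-digit sequence
# number appended to each product ID
seq=0
for f in *.grb2; do
    pqinsert -f EXP -p "$(basename "$f") $(printf '%03d' "$seq")" "$f"
    seq=$(( (seq + 1) % 10 ))
done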
A 5-way split of your current single REQUEST could look like:
change:
request EXP ".*.grb2" 198.122.138.134 <- NB: the leading .* is not
needed
to:
request EXP "\.grb2.*[09]$" 198.122.138.134
request EXP "\.grb2.*[18]$" 198.122.138.134
request EXP "\.grb2.*[27]$" 198.122.138.134
request EXP "\.grb2.*[36]$" 198.122.138.134
request EXP "\.grb2.*[45]$" 198.122.138.134
The idea behind splitting the single REQUEST into multiple
REQUESTs is to drop the volume of each REQUEST stream, which
may be enough to fall "under the radar" of the firewall as it
is currently set up (if volume is triggering throttling by the
firewall), or to have the firewall work on more, lower-volume
streams simultaneously (if the firewall is doing deep inspection).
If a 5-way split does not drop the latencies enough, you could
do a 10-way split:
request EXP "\.grb2.*0$" 198.122.138.134
request EXP "\.grb2.*1$" 198.122.138.134
request EXP "\.grb2.*2$" 198.122.138.134
request EXP "\.grb2.*3$" 198.122.138.134
request EXP "\.grb2.*4$" 198.122.138.134
request EXP "\.grb2.*5$" 198.122.138.134
request EXP "\.grb2.*6$" 198.122.138.134
request EXP "\.grb2.*7$" 198.122.138.134
request EXP "\.grb2.*8$" 198.122.138.134
request EXP "\.grb2.*9$" 198.122.138.134
If splitting the feed helps but a 10-way split still does
not reduce latencies to acceptable levels, you will need
to create a larger sequence number (e.g., 000 to 099) and
split the feed further.
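For example, with a sequence number that runs from 000 to 099, you
could key on the last two digits; one possible 20-way partitioning
would start like this:
request EXP "\.grb2.*[0-4]0$" 198.122.138.134
request EXP "\.grb2.*[5-9]0$" 198.122.138.134
request EXP "\.grb2.*[0-4]1$" 198.122.138.134
request EXP "\.grb2.*[5-9]1$" 198.122.138.134
and so on through the remaining last digits, for 20 REQUESTs in total.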
Paul wrote:
> while watching this yesterday I did an "ls -l" on the ldm.pq
> file and it indicated it was ~2.6GB. This was the indicated
> size prior to the 12Z model run. After watching the 12Z data
> I looked at the file again and it indicated it was ~3.9GB. The
> default queue size defined in ldmadmin-pl.conf at the time was
> set to "1G".
Comments:
- we are totally at a loss to explain this behavior: it is not how
things should work, and it has _never_ been reported by any site
- a LONG time ago (early LDM-5 days), the LDM queue could grow,
but that was only on a machine running IRIX. We have run
LDM-6 on IRIX for a _long_ time (several years) and have
never seen the queue change size
Questions:
- what OS is running on your ldad machine (providing the
output of 'uname -a' would be very helpful)?
- is your LDM queue on an NFS-mounted, RAID or local file system?
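To gather that information, something like the following would help
(the queue path shown is only an example; use whatever $pq_path is
set to in your ldmadmin-pl.conf):
uname -a
ls -l /usr/local/ldm/data/ldm.pq
df -k /usr/local/ldm/data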
We strongly recommend that the LDM queue be put on a local file
system. Under Solaris, putting the queue on a RAID seems to
work fine, but we have found this to be problematic under Linux.
A few years back, I tested putting the LDM queue on both a software
and a hardware RAID under Fedora Linux. In those tests, product
latencies would rapidly grow to the maximum allowed (the default is
3600 seconds) even when feeding from internal machines at the UPC.
When I moved the queue off of the RAID and onto a local file system
(ext2 or ext3), the latencies dropped to less than one second.
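If you do end up relocating the queue, the sequence is roughly the
following (the exact steps depend on your installation; adjust
$pq_path in ldmadmin-pl.conf to point at the new, local location):
ldmadmin stop
ldmadmin delqueue     # removes the old queue at the current $pq_path
# edit ldmadmin-pl.conf so $pq_path points at a local file system
ldmadmin mkqueue      # creates the new queue at the new location
ldmadmin start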
Paul wrote:
> We recently upgraded our AWIPS to OB9.1 and since that time we
> have been having problems getting our GFS 0.5 degree grid data
> from our MIDDS/McIDAS ldm
Comment:
- our guess is that the configuration of the firewall between your
ldad and arps machines was changed at or about the same time
that your AWIPS installation was upgraded
Paul wrote:
> As for particulars we are running ldm 6.6.5 on both sides.
Comment:
- the current LDM release is 6.8.1, but we do _not_ believe
that this has anything to do with the problems you are
experiencing
Cheers,
Tom
--
****************************************************************************
Unidata User Support UCAR Unidata Program
(303) 497-8642 P.O. Box 3000
address@hidden Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage http://www.unidata.ucar.edu
****************************************************************************
Ticket Details
===================
Ticket ID: NGQ-500529
Department: Support LDM
Priority: Normal
Status: Closed