This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Hi Jeff,

re: file over 4 GB in size

> Why would a single day's file grow to be so large anyway?

For things like METARs, ship reports, etc., they shouldn't be. The unusual
size indicates a problem with the decoding.

re: doing a file clean-up and starting fresh

> I'm still in favor of doing this. That's why I said in a previous
> message that I was thinking of deleting the data directory and
> starting over.

Yes, I understood that. At the time, I did not think that starting fresh was
absolutely necessary. I now do think it is necessary, but probably not in the
way that you had originally thought.

After looking at things last night, I noticed that three of the 5.11.1
decoders were routinely producing huge output files:

  dcacft
  dcmetr
  dcmsfc

Because of this, I renamed those decoders in ~ldm/decoders, copied over the
5.10.4 versions, and then restarted the LDM. A quick look this morning shows
that the 5.10.4 decoders have not (yet) produced ridiculously sized files. My
conclusion is that the pre-compiled 5.11.1 binaries do not all run correctly
on your CentOS system.

My proposal at this point is to build the 5.11.1 distribution from source
code. This will undoubtedly uncover missing packages that GEMPAK routines
need to link against, which will, in turn, require 'root' privilege to
install.

> I'm going to talk to the Chair of the dept. and see if I can get the
> go-ahead to do it this afternoon so that it has time to build a couple of
> days worth of data over the weekend. The biggest problem that I have with
> doing this is disk space. /var/data is running at 93% full now.

I would think that a clean build from source and a clean set of decoded files
produced from the newly built decoders will result in a system that runs
correctly, and so will be desirable.

> I thought about cranking scour down to like one day on everything, so that
> it would clear out more of the old stuff at least. That might buy me enough
> space to try the new data directory out before I actually delete the .old
> directory.

I agree. Data scouring is another subject that I wanted to bring up as a
topic of discussion, but I was leaving this until after everything was
running smoothly.

re: I deleted zero-length and huge files to see if that would help

> Hopefully that will help, but I'm still more in favor of a clean slate.

Yes, with a distribution built on your machine. Our experience is that a
large fraction of the support email we get is related to inconsistencies
found in running pre-compiled binaries on systems that are different from the
platforms we use for builds here in the UPC. The incorrect functioning of
decoders on whistler fits in this category. Sites that have dropped back and
built the distribution from source have seen those weird problems go away.
It is because of this observation that we will be:

- making fewer binary distributions of GEMPAK available
- recommending that sites build GEMPAK from source (like the LDM and McIDAS)

re: the clock on whistler is in UTC

> Yeah, that's another thing that was done before I got here. I'm not a
> weather person, so I'm not sure why the fascination with UTC. :-)

Using a common time is vital in atmospheric science; it is the only way that
the times of events can be compared easily. Running the machine's clock in
UTC, however, is not needed: the OS keeps time internally in UTC regardless,
so neither the time displayed to users nor the time zone in which cron runs
needs to be UTC. Also, the example/recommended cron invocations are all
written for local time, not UTC.
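For illustration, here is a minimal sketch of what 'ldm' crontab entries
pinned to UTC times (like the changes mentioned just below) might look like;
the specific minute/hour values and the use of 'ldmadmin newlog' and
'ldmadmin scour' here are assumptions for the example, not the actual entries
on whistler:

  # rotate the LDM log files just after 0Z (whistler's clock is kept in UTC)
  1 0 * * * bin/ldmadmin newlog
  # scour old data once per day, shortly after the log rotation
  30 0 * * * bin/ldmadmin scour

If the machine's clock were kept in local time instead, only the hour fields
would need to shift so that the rotation still lands just after 0Z.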
(I did, however, change the entries in 'ldm's cron yesterday so that things
like log file rotation would happen at appropriate times, like just after
0Z.)

re: high product latencies on whistler

> All I got back so far is, "Whistler falls into its own traffic class".
> I asked for some clarification as to whether that meant that it was being
> throttled or not. I haven't heard back yet.

I separated the IDS|DDPLUS feed into its own request yesterday evening in
order to get a baseline that can be used to evaluate whether or not packet
shaping is being done. It is too early to be sure if there is packet shaping,
but the early indications are that there is some: the latencies for
IDS|DDPLUS are now essentially zero, while the latencies in HDS and CONDUIT
still show large values:

IDS|DDPLUS:
  http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?IDS|DDPLUS+whistler.creighton.edu
HDS:
  http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?HDS+whistler.creighton.edu
CONDUIT:
  http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?CONDUIT+whistler.creighton.edu

re: way to test for packet shaping

> How would we go about doing that?

I made the following change yesterday evening at 23:05 UTC:

change:

  request UNIDATA ".*" idd.unl.edu

to:

  request IDS|DDPLUS|UNIWISC ".*" idd.unl.edu
  request HDS ".*" idd.unl.edu

The IDS|DDPLUS and UNIWISC datafeeds are both low volume: IDS|DDPLUS carries
lots of products that are small, and UNIWISC carries a few products that are
larger. If/when the combined IDS|DDPLUS|UNIWISC feed shows low latencies
while other, higher-volume feeds show high latencies, the indication is that
there could be some volume-based packet shaping. The early indications are
that there might be for whistler.

> I'd be interested in that, if for no other reason than to learn a little
> more about datafeeds and possibly help with troubleshooting down the road.

Please let me know if the explanation above is clear enough.

> Along these same lines - and related to the original question that drug you
> into this whole mess - is it a good idea to have "backup" feeds available?

Yes, absolutely.

> A few weeks ago we lost the idd.unl.edu feed for a day or so, for some
> still unknown reason. I didn't have a backup feed for that data, so we went
> without a big swath of data for a while. On the positive side, it was nice
> in that by the time it came back, scour had us down to 43% disk usage on
> /var/data :-)

Right now we have the College of DuPage listed as your designated backup:

  http://www.unidata.ucar.edu/software/idd/sitelist.html

It would seem to me that you could use the University of Wisconsin machine,
f5.aos.wisc.edu, as your backup for all of your non-restricted data feeds
(NLDN is distributed point-to-point only).

So, are you game for building GEMPAK from source? I think that this is an
important step to get your system functioning properly. I don't want to
sugarcoat this, however: it will require upfront effort installing the needed
packages as 'root' (using 'yum').

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                            Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                          http://www.unidata.ucar.edu
****************************************************************************

Ticket Details
===================
Ticket ID: IZJ-689237
Department: Support IDD
Priority: Normal
Status: Closed