[Support #HDQ-517625]: Assistance requested for "Gap in packet sequence" log entries from noaaportIngester
- Date: Thu, 18 Jun 2020 13:52:50 -0600
Hi Gregg,
re:
> Glad you received the attachment. What you see listed in the log file/s
> are correct.
OK. The first thing that jumps out at me in the *.log.1 log files is
the numbers of Gap messages for the non-polarsat feeds are not terrible
while the numbers in the polarsat file ARE terrible.
What do I mean by terrible?
- the number of missed frames in each Gap message is large
This is in stark contrast to the number of missed frames in the
Gap messages from the other log files, which varies from 1 to 7 with
the majority being 1, 2 or 3.
- ignoring the polarsat log files for the time being, and concentrating
on the *.log.1 log files (which go from 0 UTC to about 14:22 UTC),
there were a small number of error "events":
01:36:12 - from goes.log.1
01:37:05 - 01:37:06 - from nwstg.log
01:39:48 - 01:39:53 - from goes.log.1, nwstg2.log.1
06:23:58 - from goes.log.1, nwstg2.log.1 and nwstg.log.1
When Gap messages are clustered in time and independent of the
channel (PID), it indicates that there was a source of noise
causing the problem. "Noise" is better referred to as Terrestrial
Interference (TI).
The number of Gap messages and associated missed frames that were
reported in the *.log.1 files for the 14 hour and 22 minute interval
represented by the files is really not terrible. It is not perfect,
but it is not that bad.
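A rough way to spot such time-clustered events is to group the Gap
messages by minute and list which log files reported them. The function
below is only a sketch: the name 'gap_events' is made up for
illustration, and it assumes syslog-style lines ("Mon DD HH:MM:SS ...")
like the ones in your logs and file names without spaces or colons.

```shell
# gap_events FILE...: group "Gap in packet sequence" messages by minute
# and list the distinct log files that reported a Gap in that minute.
# Assumes syslog-style timestamps ("Mon DD HH:MM:SS") at the start of
# each log line; this is a hypothetical helper, not part of the LDM.
gap_events() {
    grep -H 'Gap in packet sequence' "$@" |
    awk '{
        # grep -H prefixes each line with "file:", so $1 is "file:Mon".
        file = $1; sub(/:.*/, "", file)
        mon = $1;  sub(/^[^:]*:/, "", mon)
        key = mon " " $2 " " substr($3, 1, 5)   # e.g. "Jun 18 06:23"
        n[key]++
        # Record each file only once per minute.
        if (!((key, file) in seen)) {
            seen[key, file] = 1
            files[key] = files[key] (files[key] == "" ? "" : ",") file
        }
    }
    END {
        for (k in n)
            printf "%s messages=%d files=%s\n", k, n[k], files[k]
    }' | sort
}
```

Run as, e.g., 'gap_events goes.log.1 nwstg.log.1 nwstg2.log.1'. Minutes
in which several files show up together are the candidates for a common
cause like TI.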
- on to the polarsat log files
The two log files you forwarded show VERY BAD quality in this
channel for two reasons:
- there are a LOT of them
- the number of missed frames in each Gap is high
It doesn't make any sense to me (at the current moment, at least) that
the quality in one of the channels should/could be significantly worse
than all other channels. Because of this, my attention is being turned
to trying to figure out if the 'noaaportIngester' invocation for this
channel is the culprit, or, at least, if a change would result in better
ingest quality overall.
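To put numbers on "a LOT" and "high", something like the following
could be run against each set of log files. It is only a sketch: the
function name 'gap_summary' is made up, and it assumes the
"[skipped N]" suffix seen in the Gap messages in your logs.

```shell
# gap_summary FILE...: for each log file, print the number of Gap
# messages, the total number of skipped frames and the largest single
# gap. Hypothetical helper; relies on the "[skipped N]" message suffix.
gap_summary() {
    grep -H 'Gap in packet sequence' "$@" |
    awk '{
        # File name is everything before the first colon (from grep -H).
        file = $0; sub(/:.*/, "", file)
        # Pull N out of "[skipped N]".
        if (match($0, /\[skipped [0-9]+\]/)) {
            n = substr($0, RSTART + 9, RLENGTH - 10) + 0
            count[file]++
            total[file] += n
            if (n > max[file]) max[file] = n
        }
    }
    END {
        for (f in count)
            printf "%s gaps=%d skipped=%d max=%d\n", f, count[f], total[f], max[f]
    }' | sort
}
```

For example, 'gap_summary *.log.1' gives one line per file, which makes
it easy to compare the polarsat log against the others.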
re:
> As a FYI: here are the Gap errors from the SPC operational AWIPS system
> ingesting SBN data from a different NOVRA box (as you can see hardly any
> Gap errors):
What is unsaid in this statement, but what I am assuming, is that the SPC
operational Novra and "your" Novra are being fed from the same dish. Please
let me know whether or not this is correct.
> -bash-4.2$ ll *log
>
> -rw-rw----. 1 ldm fxalpha 281883 Jun 18 17:57 edexBridge.log
> -rw-rw----. 1 root fxalpha 13748712 Jun 18 17:57 goes_add.log
> -rw-rw----. 1 root fxalpha 831137533 Jun 18 17:57 ldmd.log
> -rw-rw----. 1 root fxalpha 221082126 Jun 18 17:57 nwstg2.log
> -rw-rw----. 1 root fxalpha 402928198 Jun 18 17:57 nwstg.log
> -rw-rw----. 1 root fxalpha 73177811 Jun 18 17:57 oconus.log
> -rw-rw----. 1 root fxalpha 2300083 Jun 18 17:57 polarsat.log
> -rw-r--r--. 1 ldm fxalpha 1065 Jun 18 15:03 scour.log
Hmm... There are a couple of things that are jumping out at me
in this listing:
- you are using the LDM scour utility to scour the ingest log
files
We do NOT use the LDM scouring to maintain our ingest log files.
We use the shell script 'nplog_rotate' that can be found in the
~ldm/bin directory to rotate these log files. Moreover, since
one's local setup may differ from our setup, we recommend that
users copy this script to a different directory that is in the
PATH of the user running the LDM, and execute the (modified if
necessary) script from that directory.
All of our LDM systems have two directories that we use to organize
useful executables and scripts:
~ldm/util
~ldm/decoders
We copy things like 'nplog_rotate' to our ~ldm/util directory
and adjust our cron entry to run the script from there. The reason
for making the copy is so that we don't have to re-edit the values
inside the script after each new LDM installation.
For reference, here is our crontable entry that is used
to run the script and rotate the ingest log files:
#
# Rotate NOAAPort ingest logs
#
0 0 * * * util/nplog_rotate 30 > /dev/null 2>&1
NB: our clock runs in UTC, so this entry runs at 00:00 UTC.
- the other thing that jumps out at me is the existence of the
edexBridge.log log file
Does this mean that you are doing your NOAAPort ingest on the
same machine on which you are running AWIPS/EDEX?
If yes, a red flag just started waving in front of my eyes
since this is a non-standard AWIPS use. Standard use is
a dedicated machine to do the ingest which feeds to one
or more downstream machines that are running AWIPS/EDEX or
some other data decoding.
re:
> -bash-4.2$ grep Gap g*log n*log o*log p*log
>
> nwstg2.log:Jun 18 06:23:58 cpsbn1-spcn journal: noaaportIngester[19886] WARN:
> Gap in packet sequence: 100012352 to 100012355 [skipped 2]
> nwstg2.log:Jun 18 06:23:58 cpsbn1-spcn journal: noaaportIngester[19886] WARN:
> Gap in packet sequence: 100012368 to 100012370 [skipped 1]
> nwstg2.log:Jun 18 13:45:16 cpsbn1-spcn journal: noaaportIngester[19886] WARN:
> Gap in packet sequence: 119453908 to 119453912 [skipped 3]
>
> nwstg.log:Jun 18 06:23:28 cpsbn1-spcn journal: noaaportIngester[19884] WARN:
> Gap in packet sequence: 34463218 to 34463223 [skipped 4]
> nwstg.log:Jun 18 06:23:58 cpsbn1-spcn journal: noaaportIngester[19884] WARN:
> Gap in packet sequence: 34469382 to 34469387 [skipped 4]
> nwstg.log:Jun 18 06:23:58 cpsbn1-spcn journal: noaaportIngester[19884] WARN:
> Gap in packet sequence: 34469417 to 34469421 [skipped 3]
> nwstg.log:Jun 18 06:23:58 cpsbn1-spcn journal: noaaportIngester[19884] WARN:
> Gap in packet sequence: 34469445 to 34469451 [skipped 5]
>
> polarsat.log:Jun 18 06:23:57 cpsbn1-spcn journal: noaaportIngester[19888]
> WARN: Gap in packet sequence: 3935518 to 3935525 [skipped 6]
> polarsat.log:Jun 18 06:23:57 cpsbn1-spcn journal: noaaportIngester[19888]
> WARN: Gap in packet sequence: 3935599 to 3935603 [skipped 3]
> polarsat.log:Jun 18 06:23:58 cpsbn1-spcn journal: noaaportIngester[19888]
> WARN: Gap in packet sequence: 3935762 to 3935768 [skipped 5]
> polarsat.log:Jun 18 13:42:41 cpsbn1-spcn journal: noaaportIngester[19890]
> WARN: Gap in packet sequence: 14317950 to 14317954 [skipped 3]
These all look normal in the number of Gap messages being reported and the
number of missed frames being reported in each Gap message.
re:
> I'll work with my coworkers on your suggestion and getting you more
> specifics.
OK, thanks.
Like I said above, my attention is now focused on your 'noaaportIngester'
invocation for the polarsat data since it is THE feed that is having
the BIG problems. Are you willing to change this invocation?
re:
> Here is the info from ipconfig, where em2 is connected to the NOVRA box:
>
> [ldmcp@sbn1 ~/logs]$ ifconfig -a
>
> em1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
> inet 140.90.173.123 netmask 255.255.255.0 broadcast 140.90.173.255
> ether 84:2b:2b:4e:0d:0f txqueuelen 1000 (Ethernet)
> RX packets 69037645 bytes 93187995648 (86.7 GiB)
> RX errors 0 dropped 132588 overruns 0 frame 0
> TX packets 17641981 bytes 3778381561 (3.5 GiB)
> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
>
> em2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
> inet 10.0.5.50 netmask 255.255.255.0 broadcast 10.0.5.255
> ether 84:2b:2b:4e:0d:10 txqueuelen 1000 (Ethernet)
> RX packets 1196657513 bytes 1640531604592 (1.4 TiB)
> RX errors 0 dropped 0 overruns 0 frame 0
> TX packets 9862 bytes 1039439 (1015.0 KiB)
> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
>
> lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
> inet 127.0.0.1 netmask 255.0.0.0
> loop txqueuelen 1000 (Local Loopback)
> RX packets 934 bytes 61650 (60.2 KiB)
> RX errors 0 dropped 0 overruns 0 frame 0
> TX packets 934 bytes 61650 (60.2 KiB)
> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
The output for 'em2' looks good. The lack of RX and TX errors for the
period covered by the RX 1.4 TiB indicates that there is no problem
with the 'em2' Ethernet interface.
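If you want to repeat this check periodically, the counter lines can be
filtered automatically. The function below is only a sketch: the name
'iface_problems' is made up, and it assumes the 'ifconfig' output layout
shown in your listing ("RX errors N dropped N overruns N ...").

```shell
# iface_problems: read `ifconfig -a` output on stdin and print one line
# for every RX/TX counter line whose errors, dropped or overruns value
# is non-zero. Hypothetical helper; assumes the net-tools output layout
# "RX errors N dropped N overruns N ...".
iface_problems() {
    awk '
        # Interface header lines look like "em2: flags=4163<...>".
        /^[^ \t]+: flags=/ { iface = $1; sub(/:$/, "", iface) }
        # Counter lines: "RX errors 0 dropped 0 overruns 0 frame 0".
        $2 == "errors" {
            if ($3 + 0 > 0 || $5 + 0 > 0 || $7 + 0 > 0)
                printf "%s %s errors=%s dropped=%s overruns=%s\n", iface, $1, $3, $5, $7
        }'
}
```

Usage would be 'ifconfig -a | iface_problems'. No output for an
interface means all of its error, dropped and overrun counters are zero,
which is the case for 'em2' in the listing above.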
The small number of Gap messages for all channels but polarsat indicates
that there is nothing wrong with the Ethernet cable connecting the
Novra S300N to your machine. It also indicates that your system is
working OK even though your C/N is in the 15s. This is in alignment
with the comment I made about the Northrop Grumman ingest quality being
good even though their C/N was around 11.7.
Given the above, I think we have narrowed the place to look for
problems down to the 'noaaportIngester' EXEC line in your LDM
configuration file, ~ldm/etc/ldmd.conf.
If I were you, I would try:
- remove the '-c' flag from each 'noaaportIngester' EXEC line
- remove the '-r 1' flag from each 'noaaportIngester' EXEC line
- strongly consider moving away from using the system logging
daemon for logging and use the new logging available in
current versions of the LDM
Use of the new LDM logging is the default, so to keep using the
system logging daemon your LDM must have been built with that
option explicitly specified. If I am correct in this,
you will need to rebuild your LDM using defaults:
<as 'ldm' or the user running your LDM>
cd ~ldm/ldm-6.13.11/src
make distclean
./configure --with-noaaport > configure.log 2>&1
ldmadmin stop
make install > makeinstall.log 2>&1
ldmadmin start
The 'configure' and 'make install' lines above assume that you
have 'root' or 'sudo' capability. If you do not, then you would
need to instead run:
./configure --with-noaaport --disable-root-actions > configure.log 2>&1
ldmadmin stop
make install > makeinstall.log 2>&1
Then:
<as 'root'>
cd ~ldm/ldm-6.13.11/src
make root-actions
<as 'ldm'>
ldmadmin start
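For the first two suggestions (dropping the '-c' and '-r 1' flags), the
edit can be previewed before touching ldmd.conf. This is only a sketch:
the function name 'strip_ingest_flags' is made up, I have not seen your
actual EXEC lines, and it assumes each flag is a separate
whitespace-delimited argument inside the quoted command.

```shell
# strip_ingest_flags: read ldmd.conf text on stdin and write it to
# stdout with "-c" and "-r 1" removed from EXEC lines that run
# noaaportIngester. All other lines pass through unchanged.
# Hypothetical helper; assumes each flag is a separate,
# whitespace-delimited argument inside the quoted command.
strip_ingest_flags() {
    sed -E '/^[[:space:]]*EXEC[[:space:]].*noaaportIngester/ {
        s/[[:space:]]+-c([[:space:]]|")/\1/
        s/[[:space:]]+-r[[:space:]]+1([[:space:]]|")/\1/
    }'
}
```

Something like 'strip_ingest_flags < ~ldm/etc/ldmd.conf | diff ~ldm/etc/ldmd.conf -'
shows what would change without modifying the file. A line of the form
EXEC "noaaportIngester OTHER_ARGS -c -r 1" (OTHER_ARGS being a
placeholder for whatever else you pass) would come out as
EXEC "noaaportIngester OTHER_ARGS". Make the real edit by hand, or from
the previewed output, while the LDM is stopped.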
Cheers,
Tom
--
****************************************************************************
Unidata User Support UCAR Unidata Program
(303) 497-8642 P.O. Box 3000
address@hidden Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage http://www.unidata.ucar.edu
****************************************************************************
Ticket Details
===================
Ticket ID: HDQ-517625
Department: Support NOAAPORT
Priority: Normal
Status: Closed
===================