[LDM #GUQ-669331]: Both Radar and Satellite feeds stopped downloading abruptly
- Subject: [LDM #GUQ-669331]: Both Radar and Satellite feeds stopped downloading abruptly
- Date: Thu, 26 Sep 2019 14:29:13 -0600
Hi John,
I logged onto rime this afternoon to do some more poking around.
This is what I observed, in the order that I observed it:
1) the first thing I noted was the lack of any products being
received by the LDM on rime
I verified this with both 'ldmadmin watch' and 'notifyme -vl- -o 3600'.
2) I verified that the LDM REQUESTs to port 80 on node4.unidata.ucar.edu
were still active
I did this on both the rime and node4 sides.
3) I added a REQUEST for the full IDS|DDPLUS feed to the list of REQUESTs
already in place on rime, and then restarted rime's LDM
The result of this test was dramatic: the IDS|DDPLUS products were
immediately received, and their latencies soon dropped to essentially
zero and have stayed there:
http://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?IDS|DDPLUS+rime.ttu.edu
At the same time, NO products from the other feed REQUESTs have been inserted
into the LDM queue. The reason that they have not been put in rime's LDM
queue is that their latencies all exceed the maximum latency parameter
specified in the LDM registry on rime.
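To illustrate the latency cutoff described in step 3, here is a minimal Python sketch of the comparison. It is not LDM code: the helper names are hypothetical, the timestamps are made up for illustration, and the 3600 second limit is an assumption (rime's actual registry setting is not shown in this message).

```python
from datetime import datetime, timedelta, timezone

# Assumed limit in seconds; rime's real max latency setting is not shown here.
MAX_LATENCY_S = 3600


def latency_seconds(created, received):
    """Latency is the arrival time minus the product's creation time."""
    return (received - created).total_seconds()


def would_be_inserted(created, received, max_latency_s=MAX_LATENCY_S):
    """Hypothetical helper: True if the product is young enough to be queued."""
    return latency_seconds(created, received) <= max_latency_s


# Illustrative timestamps only (not taken from rime's logs).
now = datetime(2019, 9, 26, 20, 0, 0, tzinfo=timezone.utc)
small_text_product = now - timedelta(seconds=2)      # e.g. an IDS|DDPLUS bulletin
backlogged_radar_product = now - timedelta(hours=2)  # e.g. a delayed NEXRAD3 product

print(would_be_inserted(small_text_product, now))
print(would_be_inserted(backlogged_radar_product, now))
```

A product that arrives two seconds after creation passes the check, while one that has been stuck for two hours does not, which matches what was seen for IDS|DDPLUS versus the high volume feeds.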
4) Because IDS|DDPLUS products can be received on rime, and because
the IDS|DDPLUS products are very small in comparison
to the products in the other feeds being REQUESTed
(NEXRAD3, NGRID, NIMAGE and NOTHER), I immediately started to suspect
that there was some kind of "packet shaping" going on
The reason for thinking this is that receiving low volume feeds while
high volume feeds are not received is a "classic" symptom of some sort
of bandwidth limiting.
5) Because you stated categorically that everyone there says that the
problem is not in their systems (e.g., Learn, Internet2, and local network),
I decided to take a look at the output from 'ifconfig' on rime
Bingo! The 'ifconfig' output shows that there is a problem with the
Ethernet interface that is being used on rime:
ldm@rime:~$ /sbin/ifconfig -a
em1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 129.118.105.28 netmask 255.255.255.0 broadcast 129.118.105.255
ether d0:94:66:63:ea:79 txqueuelen 1000 (Ethernet)
RX packets 407915454 bytes 611274831415 (569.2 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 213318707 bytes 15095867996 (14.0 GiB)
TX errors 211896 dropped 0 overruns 0 carrier 211896 collisions 56797704
device memory 0x91b00000-91bfffff
Note the number of collisions that are being reported in this snapshot.
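For anyone who wants to track these counters over time, the following is a rough Python sketch that pulls the TX counters out of 'ifconfig' text like the snapshot above and computes collisions per transmitted packet. The function names are hypothetical, and the sample text is just the relevant lines from the em1 output.

```python
import re

# Relevant lines copied from the em1 snapshot in this message.
SAMPLE = """\
em1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
TX packets 213318707 bytes 15095867996 (14.0 GiB)
TX errors 211896 dropped 0 overruns 0 carrier 211896 collisions
56797704
"""


def tx_counters(ifconfig_text):
    """Extract TX counters; returns None for any counter that is not found."""
    # Collapse all whitespace so a counter wrapped onto the next line still matches.
    flat = " ".join(ifconfig_text.split())
    patterns = {
        "tx_packets": r"TX packets (\d+)",
        "tx_errors": r"TX errors (\d+)",
        "carrier": r"carrier (\d+)",
        "collisions": r"collisions (\d+)",
    }
    counters = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, flat)
        counters[name] = int(match.group(1)) if match else None
    return counters


def collisions_per_tx_packet(counters):
    """Ratio of collisions to transmitted packets, or None if unavailable."""
    if not counters["tx_packets"] or counters["collisions"] is None:
        return None
    return counters["collisions"] / counters["tx_packets"]


counters = tx_counters(SAMPLE)
print(counters)
print(collisions_per_tx_packet(counters))
```

On a full duplex link the collision counter should stay at zero, so any nonzero ratio here is worth investigating.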
6) Because of the unexpected number of collisions being reported in 'ifconfig'
output, I grabbed my system administrator and asked him to take a look
Mike's comment was: If the Ethernet interface on rime is Gbps and is running
at Gbps with full duplex, there should be no collisions reported.
7) This prompted us to look through the output from 'dmesg | less'
Here is the/a smoking gun:
[ 44.904382] igb 0000:01:00.0 em1: igb: em1 NIC Link is Up 100 Mbps Half Duplex, Flow Control: RX/TX
[ 44.904389] igb 0000:01:00.0: EEE Disabled: unsupported at half duplex. Re-enable using ethtool when at full duplex.
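The same check can be scripted. Below is a small Python sketch that parses a "NIC Link is Up" message like the one above and flags a link that negotiated below gigabit or at half duplex. The regular expression is written against this one igb message, so treat the pattern and the helper names as assumptions; other drivers may word the message differently.

```python
import re

# Pattern modeled on the igb message quoted in this ticket (an assumption
# for any other driver).
LINK_RE = re.compile(
    r"(?P<iface>\S+) NIC Link is Up (?P<speed>\d+) (?P<unit>[MG]bps) "
    r"(?P<duplex>Half|Full) Duplex"
)

DMESG_LINE = (
    "[ 44.904382] igb 0000:01:00.0 em1: igb: em1 NIC Link is Up 100 Mbps Half\n"
    "Duplex, Flow Control: RX/TX"
)


def parse_link(line):
    """Return interface, speed in Mbps and duplex, or None if no match."""
    flat = " ".join(line.split())  # tolerate messages wrapped across lines
    match = LINK_RE.search(flat)
    if match is None:
        return None
    speed = int(match.group("speed"))
    if match.group("unit") == "Gbps":
        speed *= 1000
    return {
        "iface": match.group("iface"),
        "speed_mbps": speed,
        "duplex": match.group("duplex").lower(),
    }


def is_degraded(link, expected_mbps=1000):
    """Hypothetical check: slower than expected, or half duplex."""
    return link["speed_mbps"] < expected_mbps or link["duplex"] == "half"


link = parse_link(DMESG_LINE)
print(link)
print(is_degraded(link))
```

The 1000 Mbps expectation assumes the interface and the switch port are both gigabit capable, which still needs to be confirmed for rime.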
From the above, our conclusion is that one of the following is true:
- the Ethernet interface is running at 100 Mbps half duplex
This is BAD and should be corrected using 'ethtool' (run as 'root') as soon
as possible
- there is something wrong with the Ethernet port being used on rime
- there is something wrong with the Ethernet cable that is connecting rime
to the switch
- there is something wrong with the port on the switch that rime is connected
to
Since we do not have 'root' access on rime, we cannot use 'ethtool' to
reset your Ethernet interface to what we think it can/should be.
Question:
Can you use 'ethtool' to reset the em1 Ethernet interface on rime and let
us know when the job is done?
If this does not correct the problem, can you check the Ethernet cable
that is being used to connect rime to the switch?
If this does not correct the problem, can you connect rime to a different
port on the switch?
If this doesn't work, can you switch to using the em2 Ethernet interface
on rime?
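After any of these changes, the negotiated speed and duplex can be checked without 'root' by reading the interface's sysfs attributes. The following Python sketch assumes a Linux host that exposes /sys/class/net/<iface>/speed and /sys/class/net/<iface>/duplex; the helper names and the expected values (1000 and full) are assumptions, not something confirmed for rime.

```python
from pathlib import Path


def read_link_state(iface, sysfs_root="/sys/class/net"):
    """Read speed (Mbps, as text) and duplex for iface; None if unreadable."""
    base = Path(sysfs_root) / iface
    state = {}
    for attr in ("speed", "duplex"):
        try:
            state[attr] = (base / attr).read_text().strip()
        except OSError:
            # Missing interface, or the link is down and the kernel refuses the read.
            state[attr] = None
    return state


def looks_healthy(state, want_speed="1000", want_duplex="full"):
    """Hypothetical check against the expected gigabit, full duplex link."""
    return state.get("speed") == want_speed and state.get("duplex") == want_duplex


state = read_link_state("em1")
print(state, looks_healthy(state))
```

If this still reports 100 and half after the interface has been reset, that would point toward the cable or the switch port rather than the interface settings.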
Cheers,
Tom
--
****************************************************************************
Unidata User Support UCAR Unidata Program
(303) 497-8642 P.O. Box 3000
address@hidden Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage http://www.unidata.ucar.edu
****************************************************************************
Ticket Details
===================
Ticket ID: GUQ-669331
Department: Support LDM
Priority: Normal
Status: Closed
===================