Hi Jeff,

> Hey folks,
>
> Just short circuiting things a little bit, since Art won't be in until the
> morning and this all appears to be my fault.. :)

No problem -- I wasn't working late into the night as you were last night!

> The folks at PSC (John Heffner) discovered that our firewall (a Cisco ASA) was
> mangling the sequence numbers of TCP packets that passed through it.
> Unfortunately, it was present in sniffer traces that were right in front of me
> on several previous occasions, but I missed it. When there was packet loss, the
> rewritten sequence numbers were playing havoc with the Linux SACK
> implementation. We didn't see it between PSU and PSC because there was never
> any packet loss on that segment, but on the longer path between PSU and NCAR,
> the packet loss caused problems.

I'm glad you found the problem.

> Anyway, according to Cisco, this is known (and default!) behavior for the
> PIX/ASA OS, since they're "randomizing" the TCP sequence number to prevent
> attackers from hijacking a TCP session. It's a simple thing to turn off the
> sequence number rewriting, so I did that, and now we're seeing a lot of
> improvement on transfers to Art's machines.

I'm glad to see we can squeeze more than 3 Mbps through a 1 Gbps end-to-end connection.

> I'm still not sure what's up with iperf to yakov, since that hasn't improved
> very much after the changes; however, we seem to be able to get more than
> 100 Mb/s to goeswest.

yakov is having other problems which I still haven't figured out. As I mentioned
to Art yesterday, we used yakov for an LDM throughput test to Europe several
months back and pulled in 70+ Mbps for a week, and that put quite a load on the
system. That was when yakov was running FC4 (it's now FC5). Even then, it didn't
perform nearly as well as I would expect. I don't know whether the problem is
the hardware (Dell 670), the Intel PRO/1000 chipset, or something else, but we
have several of these systems with different OSes loaded, and throughput is poor
on all of them.

> On the TCP tuning issue, in my travels I found this site from LBL that was very
> helpful. It touches on some of the issues that were mentioned in some of the
> email that Art forwarded to me, so I thought I'd pass it back along in case
> it's helpful.
>
> http://www-didc.lbl.gov/TCP-tuning/linux.html
>
> The bits on the 2.6 kernel are particularly interesting, since Red Hat is still
> shipping 2.6.9 in RHEL 4. I haven't done any digging to see whether they have
> backported any of the listed patches or not.

I'll look at the information closely. The IDD cluster director and real servers
are currently all FC3 (2.6.11) systems, and all seem to work really well.

> One other silver lining of this whole event is that we seem to have built some
> steam behind finishing PSU's NLR connection, so that may happen sooner rather
> than later.

That's the best news of all! We're already sending a lot of IDD traffic over NLR.

> Thanks for the help. We'll be keeping an eye on things to make sure that this
> is truly the fix, but things look good so far..
>
> -Jeff

Glad to help...

mike

Ticket Details
===================
Ticket ID: TIG-389071
Department: Support IDD
Priority: Normal
Status: Closed
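The LBL tuning page referenced in the exchange above is largely about sizing TCP
socket buffers (subject to the kernel ceilings such as net.core.rmem_max and
net.core.wmem_max) so that a single stream can fill a high bandwidth-delay-product
path. As a minimal sketch of the application-side half of that tuning -- the
buffer size and endpoint below are illustrative assumptions, not values from this
ticket -- a sender can request larger buffers before connecting:

#!/usr/bin/env python
# Sketch: request larger TCP send/receive buffers for a bulk-transfer socket.
# The 4 MB figure and the endpoint are hypothetical; the kernel caps the
# granted size at its configured maximums.
import socket

BUF_SIZE = 4 * 1024 * 1024  # 4 MB request (hypothetical)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Ask for larger buffers before the connection is established.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF_SIZE)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF_SIZE)

# Report what the kernel actually granted (Linux typically doubles the request).
print("SO_SNDBUF:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
print("SO_RCVBUF:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))

# sock.connect(("receiver.example.edu", 5000))  # hypothetical endpoint
sock.close()

On 2.6 kernels with autotuning, explicit per-socket buffer requests like this are
often unnecessary as long as the sysctl limits (tcp_rmem/tcp_wmem) are raised,
which is the point the LBL page makes for the kernel versions discussed above.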