This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Hi Jamie,

This is a semi-quick follow-up to the reply that Steve just sent you.

Steve mused:

> This might indicate a problem with our Linux Virtual Server (LVS) implementation
> (idd.unidata.ucar.edu is actually a cluster of computers served by LVS).

I, for one, do not think that the situation you experienced has anything to do with the LVS that directs the idd.unidata.ucar.edu cluster, but that is a discussion that is ongoing here in Unidata.

We just worked through a situation at LSU that was very similar to (if not exactly the same as) what you experienced today. The LSU situation was diagnosed and fixed during a conference call that we had last Friday; the participants in the call were two of us here at the UPC, one person from LSU/SRCC, one LSU networking admin, and a representative from Juniper Networks (the company whose edge router and firewall are used at LSU). The write-up we received this afternoon about their problem is as follows:

> Technical Details of Issue:
> All traffic at LSU is subject to an IDP to block p2p. Since January 18th, the IDP
> was turned off because it was causing high CPU utilization, slowing the entire
> campus traffic. Unfortunately, there was a rule that still would point to an
> inactive IDP. The way this works, the flows that match this IDP policy will be
> asked to redirect to the IDP module. They will go into a queue waiting for IDP
> inspection. Because the IDP module is not enabled, they are momentarily stuck
> there waiting for timeout. All this waiting will begin to clog up the
> buffer-queue, which eventually triggers the log message we saw on the SRX (i.e.,
> Feb 22 14:03:24 csc-118-l7-srx5800-edgefrwl fpc1 Cobar: XLR1 flow_held_mbuf 500,
> raise above 500, 1000th time.). When the queue gets full, packets are dropped.
> Most connections will not see a problem because TCP will recover and start a new
> connection. For other applications, this may look like a DoS due to the constant
> creation of new connections, e.g. Unidata.
>
> LSU future plans:
> * Feature request to Juniper: if IDP is turned off, send a warning message or
>   error trigger that a rule is still pointing to the IDP even though it is off.

Questions:

- The first question I have for you is whether there have been any recent modifications to firewalls or any "packet shaping" systems at LL or MIT.

- The second is whether you are using Juniper network equipment.

We would like to propose that you change your ~ldm/etc/ldmd.conf REQUEST for NEXRAD2 data to point at the specific idd.unidata.ucar.edu cluster node where the problem was experienced today, uni19.unidata.ucar.edu (a sketch of this change is shown below). NB: It only makes sense to make this change if/when repeated connection attempts get denied by idd.unidata.ucar.edu. The reason for this is as follows:

- Since the outage you experienced today was transitory (i.e., data began flowing again at 21:30:05Z with no change here at the UPC or by you with your LDM configuration), it may be difficult if not impossible to determine the cause of the problem unless the problem occurs again.

For instance, if there was some maintenance being performed on a router or "packet shaper" in either LL or MIT and the existing LDM connection was through that router or "packet shaper", then the problem may not return since the work is now complete.

The situation at LSU was much easier (but still very hard) to diagnose since we could make the connection fail any time we (meaning the LSU folks or UPC staff, since we were granted login capability to the LSU LDM machines) chose.
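A minimal sketch of the proposed ~ldm/etc/ldmd.conf edit, assuming the REQUEST currently uses the ".*" pattern (substitute whatever extended regular expression your entry actually carries):

    # Existing entry, directed at the cluster front end:
    #REQUEST NEXRAD2 ".*" idd.unidata.ucar.edu
    # Temporary entry, directed at the specific node where the denial occurred:
    REQUEST NEXRAD2 ".*" uni19.unidata.ucar.edu

The change takes effect once the LDM is restarted (e.g., with 'ldmadmin restart').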
re:

> > We just experienced a full outage of all our NEXRAD Level II data that we pull
> > from Unidata via LDM. We're now trying to determine whether the problem was at
> > our end or the Unidata end.

Even if there was a problem at LL/MIT, our upgrading our cluster nodes to a new version of the LDM in which duplicate REQUESTs are rejected would magnify the effects of a problem at LL/MIT. With the previous versions of the LDM that were running on the idd.unidata.ucar.edu cluster nodes, duplicate REQUESTs were allowed, so a transient situation like yours may never have been noticed by you or us as long as the number of new connections did not cause the total number of connections to exceed the maximum we impose on each cluster node (256).

Question:

- Are you OK with leaving the LDM REQUEST on llwxldm1 as is, and only changing the REQUEST if you experience the service denial again? (A quick way to check for such a denial is sketched below.)

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                             Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************

Ticket Details
===================
Ticket ID: JLJ-308670
Department: Support LDM
Priority: Normal
Status: Closed
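A quick, non-invasive way to check whether an upstream LDM host is accepting connections is the notifyme utility that ships with the LDM. The sketch below assumes the LDM binaries are on the PATH and that NEXRAD2 is the feed of interest; the specific node name is simply the one mentioned above:

    # Ask the upstream for notifications of NEXRAD2 products received in the
    # last hour; verbose log output goes to the terminal.
    notifyme -vl- -h idd.unidata.ucar.edu -f NEXRAD2 -o 3600

    # The same check against the specific cluster node:
    notifyme -vl- -h uni19.unidata.ucar.edu -f NEXRAD2 -o 3600

If the first command is refused or hangs while the second succeeds, that would be consistent with the kind of connection denial discussed above.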