
20210830: Re: High CONDUIT latencies from vm-lnx-conduit2.ncep.noaa.gov



Hi Pete and Kevin,

I am not CCing the NCEP folks on this note...

When you say "are still losing data" what exactly do you mean:

- you are not receiving data that should be in the feed?

- something else, e.g., the residency time of data in your LDM queue(s)
  is too short, so data is not being processed out of the queue
  (e.g., FILEd, distributed to downstreams, etc.)?
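
To make the distinction concrete, here is a minimal Python sketch of how a site might sort each expected product into one of the two cases above. Everything in it is hypothetical bookkeeping for illustration (the `ProductRecord` fields, the `classify_loss` name, and the arbitrary 3600-second residency figure in the usage note); none of it is an LDM API or a measured value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProductRecord:
    """Hypothetical per-product bookkeeping; not an LDM data structure."""
    ident: str
    arrived_at: Optional[float]       # seconds since epoch; None if never received
    action_ready_at: Optional[float]  # when a local action (e.g., FILE) got to it


def classify_loss(rec: ProductRecord, residency_s: float) -> str:
    """Say which of the two cases a product falls into."""
    if rec.arrived_at is None:
        # Case 1: the product never showed up in the feed at this site.
        return "not received"
    if rec.action_ready_at is None:
        # It arrived, but no local action ever handled it.
        return "received but not processed"
    if rec.action_ready_at - rec.arrived_at > residency_s:
        # Case 2: the product left the queue before the action reached it.
        return "received but not processed"
    return "ok"
```

For example, `classify_loss(ProductRecord("gfs.f048", 0.0, 4000.0), 3600.0)` falls into the second case. If most missing products come back as "not received", the problem is upstream of the queue; otherwise queue size or local processing is the more likely place to look.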

One of the reasons I am asking is we have had support inquiries from
two sites that were not receiving all of the CONDUIT data that they
believed that they should be receiving.  In one case, the problem was
tracked down to a network problem outside of the department in question,
and in the other case, the problem appears to be partly related to the
machine doing the ingest.

Given the inquiries that we have received, I need to know exactly what
your comment means.

Thanks in advance...

Cheers,

Tom

On 8/30/21 10:58 AM, Pete Pokrandt wrote:
Monday morning update - the large lags from vm-lnx-conduit2 are still there, and we are still losing data.

FYI
Pete



-----
Pete Pokrandt - Systems Programmer
UW-Madison Dept of Atmospheric and Oceanic Sciences
608-262-3086  - address@hidden


------------------------------------------------------------------------
*From:* Anne Myckow - NOAA Federal <address@hidden>
*Sent:* Friday, August 27, 2021 9:46 AM
*To:* Pete Pokrandt <address@hidden>
*Cc:* Tyle, Kevin R <address@hidden>; address@hidden <address@hidden>; address@hidden <address@hidden>; address@hidden <address@hidden>
*Subject:* Re: High CONDUIT latencies from vm-lnx-conduit2.ncep.noaa.gov
Thanks Pete. We are engaging our networking folks on this issue now.

However, be warned that we are having a major internet outage at our Boulder data center. We are moving more apps over to College Park, so you will most likely see latency on both conduit systems today, until at least 22Z. Hopefully our networking folks can actually find a bottleneck this time around but just wanted to let you know. Will keep you posted.

Thanks,
Anne

On Thu, Aug 26, 2021 at 5:30 PM Pete Pokrandt <address@hidden> wrote:

    Anne,

    It's hard to say. To my eyes, it looks like the latency problem was
    not solved by a reboot or by moving the server to a different part
    of your infrastructure. The graphs show that there are still large
    latencies from vm-lnx-conduit2, but maybe not quite as bad as
    before? I did still lose some of the 00 UTC 26 GFS run. So the
    problem definitely is not resolved.

    Unidata folks, any ideas on things they can try to figure out what's
    going on here? Is their internal network just saturated to the point
    where it can't keep up? Or something about the vm itself that might
    cause that?

    Pete







    
    -----
    Pete Pokrandt - Systems Programmer
    UW-Madison Dept of Atmospheric and Oceanic Sciences
    608-262-3086  - address@hidden


    ------------------------------------------------------------------------
    *From:* Anne Myckow - NOAA Federal <address@hidden>
    *Sent:* Wednesday, August 25, 2021 2:46 PM
    *To:* Tyle, Kevin R <address@hidden>
    *Cc:* Pete Pokrandt <address@hidden>; address@hidden <address@hidden>; address@hidden <address@hidden>; address@hidden <address@hidden>
    *Subject:* Re: High CONDUIT latencies from vm-lnx-conduit2.ncep.noaa.gov
    We have moved vm-lnx-conduit2 to a less busy area within our
    infrastructure. Is the feed from conduit1 still good? And please let
    us know what conduit2 looks like.

    Thanks,
    Anne

    On Wed, Aug 25, 2021 at 9:18 AM Anne Myckow - NOAA Federal
    <address@hidden> wrote:

        Also, I'd like to know if there are any of you all that are
        *not* experiencing latency. Please let me know if you are in
        that camp.

        Thanks so much,
        Anne

        On Wed, Aug 25, 2021 at 9:04 AM Anne Myckow - NOAA Federal
        <address@hidden> wrote:

            Morning,

            I don't see the crazy latency from that one cycle yesterday
            but it still looks pretty bad to me - do you concur?

            Thanks,
            Anne

            On Tue, Aug 24, 2021 at 4:03 PM Anne Myckow - NOAA Federal
            <address@hidden> wrote:

                Hi everyone,

                We've tried rebooting the systems, I checked your graph
                and it looks like we won't know for a few cycles if it's
                better - can you let us know if you see something before
                we check it tomorrow morning?

                Thanks,
                Anne

                On Tue, Aug 24, 2021 at 1:59 PM Tyle, Kevin R
                <address@hidden> wrote:

                    Hi all,

                    I can state that our GFS grib file reception via LDM
                    has been extremely spotty, particularly for the
                    F48-F192 forecast hour periods, for several weeks
                    now. We feed from Pete’s LDM at UW-MSN so this is
                    consistent with what Pete has been seeing.

                    It would be really nice if NCEP’s CONDUIT feed can
                    return to the level of consistent service that we in
                    the community had been accustomed to for many years.

                    Cheers,

                    Kevin

                    _____________________________________________________

                    Kevin Tyle, M.S.; Manager of Departmental Computing
                    NSF XSEDE Campus Champion
                    Dept. of Atmospheric & Environmental Sciences
                    UAlbany ETEC Bldg – Harriman Campus
                    1220 Washington Avenue, Room 419
                    Albany, NY 12222
                    address@hidden | 518-442-4578 | @nywxguy | he/him/his

                    _____________________________________________________

                    *From:* conduit <address@hidden> *On Behalf Of* Pete Pokrandt via conduit
                    *Sent:* Tuesday, August 24, 2021 1:26 PM
                    *To:* Anne Myckow - NOAA Federal <address@hidden>
                    *Cc:* address@hidden; address@hidden; address@hidden; address@hidden
                    *Subject:* Re: [conduit] High CONDUIT latencies from vm-lnx-conduit2.ncep.noaa.gov

                    Dear Anne and all,

                    Just a note to let you know we are still
                    experiencing the high latencies. In fact, today the
                    latencies from both vm-lnx-conduit1 and
                    vm-lnx-conduit2 are high.

                    Pete

                    https://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?CONDUIT+idd-agg.aos.wisc.edu

                    https://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?CONDUIT+conduit.unidata.ucar.edu

                    -----
                    Pete Pokrandt - Systems Programmer
                    UW-Madison Dept of Atmospheric and Oceanic Sciences
                    608-262-3086  - address@hidden

                    ------------------------------------------------------------------------

                    *From:* Anne Myckow - NOAA Federal <address@hidden>
                    *Sent:* Friday, August 20, 2021 12:14 PM
                    *To:* Pete Pokrandt <address@hidden>
                    *Cc:* address@hidden <address@hidden>; address@hidden <address@hidden>; address@hidden <address@hidden>; address@hidden <address@hidden>
                    *Subject:* Re: High CONDUIT latencies from vm-lnx-conduit2.ncep.noaa.gov

                    Pete,

                    conduit.ncep.noaa.gov is a load-balanced DNS that
                    points to both conduit1 and conduit2 servers on the
                    backend. I'm going to see if we can push you all off
                    of conduit2 for now; hopefully those of you connected
                    to conduit2 will see a brief interruption and then
                    connect to conduit1 automatically.
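
One way a downstream site could check what a name like this resolves to is sketched below in Python. It assumes the load balancing is visible as multiple address records behind one name, which the thread does not confirm; `backend_addresses` and its injectable `resolver` argument are illustrative names, and 388 is the registered LDM port.

```python
import socket
from typing import Callable, List

LDM_PORT = 388  # registered port for LDM


def backend_addresses(host: str, port: int = LDM_PORT,
                      resolver: Callable = socket.getaddrinfo) -> List[str]:
    """Return the sorted, de-duplicated addresses that `host` resolves to.

    `resolver` defaults to the real socket.getaddrinfo; passing a stand-in
    makes the function usable without network access.
    """
    infos = resolver(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

Calling `backend_addresses("conduit.ncep.noaa.gov")` would list whatever addresses DNS returns for that name at the time of the call; a single address in the result would suggest the balancing happens somewhere other than DNS.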

                    More to come.
                    Anne

                    On Fri, Aug 20, 2021 at 1:12 PM Pete Pokrandt
                    <address@hidden> wrote:

                        It looks like conduit.ncep.noaa.gov is pulling data
                        from both vm-lnx-conduit1 and vm-lnx-conduit2
                        - conduit1 seems ok, it's just conduit2 that is
                        showing the large lags.

                        I don't know how things are set up exactly, but
                        it might work to have conduit.ncep.noaa.gov only
                        request CONDUIT data from vm-lnx-conduit1 until the
                        problem with feeding from conduit2 is resolved?

                        Unidata folks, any suggestions from your end?

                        Thanks, we do appreciate all your work on our
                        behalf!

                        Pete

                        -----
                        Pete Pokrandt - Systems Programmer
                        UW-Madison Dept of Atmospheric and Oceanic Sciences
                        608-262-3086  - address@hidden

                        ------------------------------------------------------------------------

                        *From:* Anne Myckow - NOAA Federal <address@hidden>
                        *Sent:* Friday, August 20, 2021 12:07 PM
                        *To:* Pete Pokrandt <address@hidden>
                        *Cc:* address@hidden <address@hidden>; address@hidden <address@hidden>; address@hidden <address@hidden>; address@hidden <address@hidden>
                        *Subject:* Re: High CONDUIT latencies from vm-lnx-conduit2.ncep.noaa.gov

                        Hi Pete,

                        We have a lot of systems and applications
                        running out of College Park right now, which I
                        think is part of it. But I will have someone
                        take a look at conduit2 today, see if maybe we
                        need to try and move your connections to
                        conduit1 instead.

                        Thanks,
                        Anne

                        On Fri, Aug 20, 2021 at 12:54 PM Pete Pokrandt
                        <address@hidden> wrote:

                            Dear Anne, Dustin and all,

                            Did you see this? We are still experiencing
                            high latencies of 800-1000 seconds on our
                            CONDUIT feeds during the times when the GFS
                            comes through that appear to be coming from
                            the host

                            vm-lnx-conduit2.ncep.noaa.gov

                            Here are the most recent lags. Any ideas?

                            Thanks,

                            Pete

                            https://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?CONDUIT+idd-agg.aos.wisc.edu

                            https://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?CONDUIT+conduit.unidata.ucar.edu

                            -----
                            Pete Pokrandt - Systems Programmer
                            UW-Madison Dept of Atmospheric and Oceanic
                            Sciences
                            608-262-3086  - address@hidden

                            ------------------------------------------------------------------------

                            *From:* Pete Pokrandt
                            *Sent:* Wednesday, August 18, 2021 3:02 PM
                            *To:* address@hidden <address@hidden>; address@hidden <address@hidden>; address@hidden <address@hidden>
                            *Cc:* address@hidden <address@hidden>; address@hidden <address@hidden>
                            *Subject:* High CONDUIT latencies from vm-lnx-conduit2.ncep.noaa.gov

                            Dear Anne, Dustin and all,

                            Recently we have noticed fairly high
                            latencies on the CONDUIT ldm data feed
                            originating from the machine
                            vm-lnx-conduit2.ncep.noaa.gov. The feed
                            originating from vm-lnx-conduit1.ncep.noaa.gov
                            does not have the high latencies. Unidata and
                            other top level feeds are seeing similar
                            high latencies from
                            vm-lnx-conduit2.ncep.noaa.gov.

                            Here are some graphs showing the latencies
                            that I'm seeing:

                            From
                            https://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?CONDUIT+idd-agg.aos.wisc.edu -
                            latencies for CONDUIT data arriving at our
                            UW-Madison AOS ingest machine

                            From
                            https://rtstats.unidata.ucar.edu/cgi-bin/rtstats/siteindex?conduit.unidata.ucar.edu
                            (latencies at Unidata)

                            At least here at UW-Madison, these latencies
                            are causing us to lose some data during the
                            large GFS/GEFS periods.

                            Any idea what might be causing this?

                            Pete

                            -----
                            Pete Pokrandt - Systems Programmer
                            UW-Madison Dept of Atmospheric and Oceanic
                            Sciences
                            608-262-3086  - address@hidden


                        --
                        Anne Myckow
                        Dataflow Team Lead
                        NWS/NCEP/NCO



--
+----------------------------------------------------------------------+
* Tom Yoksas                                      UCAR Unidata Program *
* (303) 497-8642 (last resort)                           P.O. Box 3000 *
* address@hidden                                    Boulder, CO 80307 *
* Unidata WWW Service                     http://www.unidata.ucar.edu/ *
+----------------------------------------------------------------------+