This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Justin, are all the data feeds getting into LDM2 and LDM1? Also, is there a way we can test the connection to the LDM system over I1 rather than I2?

Douglas Schuster wrote:
> Hi Justin,
>
> The latencies are still looking bad using ldm2. This has led to the
> continuation of large numbers of missing fields (identical on both
> receiving machines, NCAR and Unidata) from all model cycles over the
> weekend, continuing this morning.
>
> http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?CONDUIT+idd.unidata.ucar.edu+LOG
>
> Doug
>
> On Jun 18, 2007, at 6:32 AM, Justin Cooke wrote:
>
>> You're right Chi, I misread the graph, we are on ldm2 and have been
>> since Saturday. My apologies.
>>
>> Justin
>>
>> Chi Y Kang wrote:
>>> Justin Cooke wrote:
>>>> Chi,
>>>>
>>>> It looks like you switched us back to ldm1 on Saturday but
>>>> according to the graphs from Steve they experienced the same delays.
>>>
>>> Running on ldm2 right now. It looks like the send-Q on our end seems
>>> to be okay. All of these connections are going via I2, let me see
>>> what rate limit is set on the I2 connection coming out of the campus.
>>>
>>>>
>>>> Justin
>>>>
>>>> Steve Chiswell wrote:
>>>>> Chi & Justin,
>>>>>
>>>>> The latency of data today has been high like yesterday, even with
>>>>> the switch to ldm2. The throughput looks restricted either by a
>>>>> router or firewall/packet shaping, but was wondering if coincident
>>>>> with Justin's restart was that the connections had to be
>>>>> re-established, so changes took effect at that time.
>>>>>
>>>>> Thanks for all your efforts,
>>>>>
>>>>> Steve Chiswell
>>>>> Unidata User Support
>>>>>
>>>>> On Fri, 15 Jun 2007, Chi Y Kang wrote:
>>>>>
>>>>>> Wait a minute here,
>>>>>>
>>>>>> 128.117.140.208 isn't in the mix. The other hosts are.
>>>>>>
>>>>>> I updated the LDM access list. Should we just have some class C
>>>>>> ranges to have access rather than one IP at a time?
>>>>>>
>>>>>> Also, I noticed that the send-Q is pretty normal on the ldm2 server
>>>>>> right now but was pretty high on ldm1. Might be just an issue with
>>>>>> the ACL list.
>>>>>>
>>>>>> 128.117.12.2
>>>>>> 128.117.12.3
>>>>>> 128.117.130.220
>>>>>> 128.117.140.208
>>>>>> 128.117.140.220
>>>>>> 128.117.149.220
>>>>>> 128.117.156.220
>>>>>> 128.174.80.16
>>>>>> 128.174.80.47
>>>>>> 140.90.193.19
>>>>>> 140.90.193.227
>>>>>> 140.90.193.228
>>>>>> 140.90.193.99
>>>>>> 140.90.226.201
>>>>>> 140.90.226.202
>>>>>> 140.90.226.203
>>>>>> 140.90.226.204
>>>>>> 140.90.37.12
>>>>>> 140.90.37.13
>>>>>> 140.90.37.15
>>>>>> 140.90.37.16
>>>>>> 140.90.37.40
>>>>>> 144.92.130.88
>>>>>> 144.92.131.244
>>>>>> 150.9.117.128
>>>>>> 192.12.209.57
>>>>>> 192.58.3.194
>>>>>> 192.58.3.195
>>>>>> 192.58.3.196
>>>>>> 192.58.3.197
>>>>>> 193.61.196.74
>>>>>> 198.181.231.53
>>>>>> 208.64.117.128
>>>>>>
>>>>>> Justin Cooke wrote:
>>>>>>
>>>>>>> Chi,
>>>>>>>
>>>>>>> The reboot doesn't seem to have helped. Is there anything else
>>>>>>> that may be causing these issues? Network related after I
>>>>>>> performed the restart of LDM? Steve has a few possibilities:
>>>>>>>
>>>>>>> /It seems to be network related at your end, but strange that it
>>>>>>> occurred at the time when you restarted the LDM, unless there
>>>>>>> was some sort of firewall or packet filter that occurred when
>>>>>>> the LDMs reconnected./
>>>>>>>
>>>>>>> Justin
>>>>>>>
>>>>>>> Steve Chiswell wrote:
>>>>>>>
>>>>>>>> Justin,
>>>>>>>>
>>>>>>>> I haven't seen any improvement from ncepldm to the top level
>>>>>>>> relays daffy.unidata.ucar.edu (Unidata), idd.aos.wisc.edu
>>>>>>>> (U. Wisconsin), flood.atmos.uiuc.edu (U. Illinois) or
>>>>>>>> atm.cise-nsf.gov (NSF, DC).
>>>>>>>>
>>>>>>>> It seems to be network related at your end, but strange that it
>>>>>>>> occurred at the time when you restarted the LDM, unless there
>>>>>>>> was some sort of firewall or packet filter that occurred when
>>>>>>>> the LDMs reconnected.
>>>>>>>>
>>>>>>>> Thanks for your time in looking at this,
>>>>>>>>
>>>>>>>> Steve
>>>>>>>>
>>>>>>>> On Fri, 2007-06-15 at 15:31 -0400, Justin Cooke wrote:
>>>>>>>>
>>>>>>>>> Steve and Doug,
>>>>>>>>>
>>>>>>>>> I just got a call from Chi at the WOC, he rebooted LDM1 after
>>>>>>>>> noticing an unusual load on the machine. LDM is again running
>>>>>>>>> on that box and it remains primary, can you check to see how
>>>>>>>>> the latencies are now?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Justin
>>>>>>>>>
>>>>>>>>> Doug Schuster wrote:
>>>>>>>>>
>>>>>>>>>> Justin,
>>>>>>>>>>
>>>>>>>>>> 28,079 products are missing from the 12z cycle. You'll be
>>>>>>>>>> getting the automated email shortly.
>>>>>>>>>>
>>>>>>>>>> -Doug
>>>>>>>>>>
>>>>>>>>>> On Jun 15, 2007, at 12:48 PM, Justin Cooke wrote:
>>>>>>>>>>
>>>>>>>>>>> Steve,
>>>>>>>>>>>
>>>>>>>>>>> I've turned off the feed to LDM2.
>>>>>>>>>>>
>>>>>>>>>>> There is no other load on the ldm1 system except for LDM.
>>>>>>>>>>>
>>>>>>>>>>> Doug, are you missing many of the TIGGE params for 12Z?
>>>>>>>>>>>
>>>>>>>>>>> Justin
>>>>>>>>>>>
>>>>>>>>>>> Steve Chiswell wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Justin,
>>>>>>>>>>>>
>>>>>>>>>>>> That didn't change the behavior. Still seeing latency.
>>>>>>>>>>>> Perhaps try turning off the other feed. Is there any load
>>>>>>>>>>>> other than LDM on the system?
>>>>>>>>>>>>
>>>>>>>>>>>> Steve
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, 2007-06-15 at 12:56 -0400, Justin Cooke wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Steve,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've recreated the queue, let me know if you are still
>>>>>>>>>>>>> seeing issues.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If so I'll turn off the feed to ldm2 to see if that
>>>>>>>>>>>>> corrects things.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Justin
>>>>>>>>>>>>>
>>>>>>>>>>>>> Steve Chiswell wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Justin,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't know if they saw a disk space problem with
>>>>>>>>>>>>>> log files not being rotated, but it might just be
>>>>>>>>>>>>>> best today to build a new queue:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ldmadmin stop
>>>>>>>>>>>>>> ldmadmin delqueue
>>>>>>>>>>>>>> ldmadmin mkqueue
>>>>>>>>>>>>>> ldmadmin start
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That will mean some queued data would be lost, but if
>>>>>>>>>>>>>> users aren't getting it anyway, then it's best to ensure
>>>>>>>>>>>>>> that the queue isn't corrupt for the weekend.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Happy Friday....
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Steve
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, 2007-06-15 at 12:13 -0400, Justin Cooke wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Steve,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Our logs on the primary ldm system "ldm1" had not
>>>>>>>>>>>>>>> rotated for nearly a week. I sent email to the WOC
>>>>>>>>>>>>>>> support and this was the response:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Looks like the seed file was missing after we brought
>>>>>>>>>>>>>>> the system back up from the last outage. Should be good
>>>>>>>>>>>>>>> now.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Justin Cooke wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> WOC,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I noticed that our logs for LDM have not been rotated
>>>>>>>>>>>>>>>> on machine ldm1 since 06/05/2007. We have a cron entry
>>>>>>>>>>>>>>>> that runs "ldmadmin newlog" at 00Z every day.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I attempted to run the command by hand and got the
>>>>>>>>>>>>>>>> following back:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ldm@ldm1:~$ bin/ldmadmin newlog
>>>>>>>>>>>>>>>> hupsyslog: couldn't open /var/run/syslogd.pid
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I checked, and /var/run/syslogd.pid is not there, but it
>>>>>>>>>>>>>>>> is on ldm2.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Could there be a problem with syslogd on ldm1?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Justin
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also around that time I turned on our backup feed to the
>>>>>>>>>>>>>>> ldm2 system which had been off since that system had
>>>>>>>>>>>>>>> issues a few weeks ago (we were asked by WOC to turn it
>>>>>>>>>>>>>>> back on). I have sent email to their support group
>>>>>>>>>>>>>>> asking if both ldm1 and ldm2 are responding to the
>>>>>>>>>>>>>>> ncepldm.woc.noaa.gov address or if something else is
>>>>>>>>>>>>>>> going on.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Justin
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Steve Chiswell wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Justin,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yesterday just after 18Z, the data flow from
>>>>>>>>>>>>>>>> ncepldm.woc.noaa.gov to top level sites at NSF and
>>>>>>>>>>>>>>>> Unidata both began showing high latency:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?CONDUIT+atm.cise-nsf.gov
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>> http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?CONDUIT+daffy.unidata.ucar.edu
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Data volume out has dropped as a result:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_vol_nc?CONDUIT+atm.cise-nsf.gov
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Since the behavior is similar at both sites at separate
>>>>>>>>>>>>>>>> locations, the problem would appear to be near your end.
>>>>>>>>>>>>>>>> Since that coincides with your restart of the LDM, could
>>>>>>>>>>>>>>>> you fill me in on the issues you were experiencing?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Steve Chiswell
>>>>>>>>>>>>>>>> Unidata User Support
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, 2007-06-15 at 11:38 -0400, Justin Cooke wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Doug,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I had to restart our LDM yesterday right before the
>>>>>>>>>>>>>>>>> 18Z cycle, we had an issue with our logging but none
>>>>>>>>>>>>>>>>> of the configuration files changed.
>>>>>>>>>>>>>>>>> Could one of your feeds have lost the connection
>>>>>>>>>>>>>>>>> to our LDM during that restart?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Justin
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Douglas Schuster wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yes, we've received partial cycles. More than half of
>>>>>>>>>>>>>>>>>> the expected fields have been missing in each cycle
>>>>>>>>>>>>>>>>>> from June 14 18Z, to June 15, 06Z. The number of
>>>>>>>>>>>>>>>>>> missing fields varies between each cycle.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Doug
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Jun 15, 2007, at 9:11 AM, Justin Cooke wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Doug,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Have you received any GEFS data from us today? Or is
>>>>>>>>>>>>>>>>>>> it just certain fields you are missing?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Justin
>>>>>>>>>>>>>>>>>>>
>>>>>> --Chi Y. Kang
>>>>>> Contractor
>>>>>> Principal Engineer
>>>>>> Phone: 301-713-3333 x201
>>>>>> Cell: 240-338-1059
>>>>>>
>>>
>

-- Chi Y. Kang
Contractor
Principal Engineer
Phone: 301-713-3333 x201
Cell: 240-338-1059
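
Chi's question in the thread, whether to grant access by class C range rather than one IP at a time, can be explored offline before anyone edits an access list. The following is a minimal sketch, assuming Python's standard ipaddress module and treating "class C" as a /24 network. The helper name group_by_class_c and the sample subset of addresses are hypothetical choices for illustration, and how any resulting range would be entered into the LDM access list is not shown here.

```python
import ipaddress
from collections import defaultdict


def group_by_class_c(addresses):
    """Group IPv4 address strings by their enclosing /24 network."""
    groups = defaultdict(list)
    for addr in addresses:
        # strict=False accepts a host address and returns the /24 containing it
        net = ipaddress.ip_network(f"{addr}/24", strict=False)
        groups[str(net)].append(addr)
    return dict(groups)


# A small subset of the addresses quoted in the thread (illustrative only)
sample = [
    "140.90.226.201",
    "140.90.226.202",
    "140.90.226.203",
    "140.90.226.204",
    "192.58.3.194",
    "192.58.3.195",
    "128.117.140.208",
]

if __name__ == "__main__":
    for network, hosts in sorted(group_by_class_c(sample).items()):
        print(f"{network}: {len(hosts)} host(s)")
```

Networks that collect several hosts are candidates for a single range entry, while a network holding one host may be better left as an individual address.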