20010816: ldm outage
- Subject: 20010816: ldm outage
- Date: Thu, 16 Aug 2001 16:47:33 -0600
Mike,
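Your notifyme command should use -f WMO instead of -p WMO. WMO is a feed
type (-f), whereas -p takes a pattern that is matched against product
identifiers. That is why you won't see anything with your pattern.
Also, if you use the -o option to look back in time (4000 seconds below), you
will see that UCLA is currently receiving data with roughly 65 minute latency: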
ldm/bin/notifyme -vl - -h typhoon.atmos.ucla.edu -f WMO -o 4000
Aug 16 22:37:42 notifyme[3494]: Starting Up: typhoon.atmos.ucla.edu: 20010816213102.754 TS_ENDT {{WMO, ".*"}}
Aug 16 22:37:42 notifyme[3494]: NOTIFYME(typhoon.atmos.ucla.edu): OK
Aug 16 22:37:43 notifyme[3494]: 591 20010816213131.488 IDS|DDPLUS 531 NZUS23 KBYZ 162130 /pWRKCCF
Aug 16 22:37:43 notifyme[3494]: 150 20010816213131.489 IDS|DDPLUS 532 FPUS45 KBYZ 162130 /pCCFBYZ
Aug 16 22:37:43 notifyme[3494]: 1641 20010816213131.493 IDS|DDPLUS 533 SDUS86 KSTO 162121 /pDPADAX
Aug 16 22:37:43 notifyme[3494]: 3965 20010816213133.691 HDS 678 SDUS83 KOAX 162122 /pDPAOAX
Aug 16 22:37:44 notifyme[3494]: 748 20010816213136.675 IDS|DDPLUS 682 NZUS90 KGRB 162131 /pBRNSUE
Aug 16 22:37:44 notifyme[3494]: 5759 20010816213136.686 HDS 683 SDUS82 KTBW 162128 /pDPATBW
Aug 16 22:37:44 notifyme[3494]: 4819 20010816213136.790 HDS 688 SDUS85 KVEF 162128 /pDPAESX
Aug 16 22:37:44 notifyme[3494]: 529 20010816213236.688 IDS|DDPLUS 791 SXUS70 KWAL 162130
etc...
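As a rough check of the latency figure, the delay for any product line in the notifyme output can be computed from the log timestamp and the product timestamp. The sketch below is Python and not part of LDM; the helper name and the regular expression are made up for illustration, and it assumes the log clock is UTC and falls in the same year as the product time.

```python
import re
from datetime import datetime

# Hypothetical helper, not part of LDM. Assumes the syslog-style prefix is in
# UTC and falls in the same year as the product timestamp.
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+notifyme\[\d+\]:\s+"
    r"\d+\s+(\d{14}\.\d{3})\s"
)

def latency_seconds(line):
    """Return seconds between product creation and the notifyme log time."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError("not a notifyme product line")
    log_stamp, product_stamp = match.groups()
    product_time = datetime.strptime(product_stamp, "%Y%m%d%H%M%S.%f")
    # The log prefix carries no year, so borrow it from the product time.
    log_time = datetime.strptime(
        f"{product_time.year} {log_stamp}", "%Y %b %d %H:%M:%S"
    )
    return (log_time - product_time).total_seconds()

sample = ("Aug 16 22:37:43 notifyme[3494]: 591 20010816213131.488 "
          "IDS|DDPLUS 531 NZUS23 KBYZ 162130 /pWRKCCF")
print(round(latency_seconds(sample) / 60, 1))  # latency in minutes
```

For the first product line above this works out to about 66 minutes, in line with the roughly 65 minute latency.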
UCLA is receiving data, but your LDM will RECLASS to 60 minutes whenever it
receives a product more than 1.0833 hours (65 minutes) old, in an attempt to
catch up. UCLA's LDM should be doing the same with its upstream feed. So the
data that is reaching UCLA from Washington is just under the wire.
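The catch-up rule described above can be sketched as follows. This only illustrates the rule as stated in this message, not LDM's actual code; the function and constant names are invented for the example.

```python
from datetime import datetime, timedelta

# Values taken from the message; names are hypothetical.
RECLASS_THRESHOLD = timedelta(hours=1.0833)  # products older than this trigger a RECLASS
RECLASS_WINDOW = timedelta(minutes=60)       # new request starts this far back from now

def reclass_start(now, product_time):
    """Return the new request start time if the product is too old, else None."""
    if now - product_time > RECLASS_THRESHOLD:
        return now - RECLASS_WINDOW
    return None

now = datetime(2001, 8, 16, 22, 37, 43)
old_product = datetime(2001, 8, 16, 21, 31, 31, 488000)  # about 66 minutes old
print(reclass_start(now, old_product))                   # start time 60 minutes before now
print(reclass_start(now, now - timedelta(minutes=64)))   # under the threshold
```

Under this reading, a product that is 64 minutes old is still accepted, while one that is 66 minutes old makes the downstream request jump forward to one hour before the current time.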
The fact that your notifyme gets through to UCLA and you are connected shows
that there is no portmapper problem.
The problem that bears investigation is why UCLA is struggling to keep up.
A recheck shows that the data currently arriving is the 18Z AVN and is still
65 minutes behind. The 18Z AVN has been a new addition to the NOAAPORT stream
since July 30, so the added data load may be more than UCLA's connection to
Washington can handle. Or it could be that UCLA was already 1 hour behind when
it failed over, so it is now bumping along on its backup, and the large amount
of model data at that time prevents it from catching up.
Steve Chiswell
Unidata User Support
>From: Mike Voss <address@hidden>
>Organization: UCAR/Unidata
>Keywords: 200108162211.f7GMB7102462
>Hello,
>
>My ldm (rossby.met.sjsu.edu) stopped receiving data about 3 hours ago from my
>upstream site (typhoon.atmos.ucla.edu). I have spoken with James Murikami, the
>site contact there, and between the two of us we can't figure out what's going
>on. He is getting data from his failover (washington), but it's not coming
>through to me. I can ldmping fine:
>----
>rossby:~/data>ldmping typhoon.atmos.ucla.edu
>Aug 16 21:53:54 State Elapsed Port Remote_Host rpc_stat
>Aug 16 21:53:54 RESPONDING 0.063283 388 typhoon.atmos.ucla.edu
>Aug 16 21:54:19 RESPONDING 0.026459 388 typhoon.atmos.ucla.edu
>---
>I get "OK" with notifyme, but no data:
>---
>rossby:~/data>notifyme -vl - -h typhoon.atmos.ucla.edu -p WMO
>Aug 16 21:55:30 notifyme[22279]: Starting Up: typhoon.atmos.ucla.edu: 20010816215530.753 TS_ENDT {{ANY, "WMO"}}
>Aug 16 21:55:30 notifyme[22279]: NOTIFYME(typhoon.atmos.ucla.edu): reclass: 20010816215530.753 TS_ENDT {{NNEXRAD|UNIDATA, "WMO"}}
>Aug 16 21:55:30 notifyme[22279]: NOTIFYME(typhoon.atmos.ucla.edu): OK
>---
>I checked logs...
>-----
>Aug 16 21:46:25 rossby typhoon[22093]: run_requester: 20010816204625.945 TS_ENDT {{HDS|DDPLUS, ".*"}}
>Aug 16 21:46:25 rossby typhoon[22091]: run_requester: 20010816204625.965 TS_ENDT {{NNEXRAD, "^SDUS5."},{MCIDAS, "^pnga2area Q[01]"}
>Aug 16 21:46:26 rossby typhoon[22093]: FEEDME(typhoon.atmos.ucla.edu.): OK
>Aug 16 21:46:26 rossby typhoon[22091]: FEEDME(typhoon.atmos.ucla.edu): OK
>Aug 16 21:49:42 rossby pqexpire[22081]: > Recycled 410.537 kb/hr ( 14.974 prods per hour)
>Aug 16 21:51:32 rossby typhoon[22093]: RECLASS: 20010816205132.850 TS_ENDT {{HDS|DDPLUS, ".*"}}
>Aug 16 21:51:32 rossby typhoon[22093]: skipped: 20010816204722.214 (250.636 seconds)
>Aug 16 21:54:42 rossby pqexpire[22081]: > Recycled 371.707 kb/hr ( 13.330 prods per hour)
>Aug 16 21:56:12 rossby typhoon[22093]: RECLASS: 20010816205612.085 TS_ENDT {{HDS|DDPLUS, ".*"}}
>Aug 16 21:56:12 rossby typhoon[22093]: skipped: 20010816205133.765 (278.320 seconds)
>------------
>
>...and here is some info from UCLA:
>
>------------
>Mike,
>If you're going to mail Unidata support, here's some
>info from TYPHOON:
>The ldmd.log file appeared this way after I restarted
>the ldmadmin--
>Aug 16 21:41:32 typhoon rpc.ldmd[12880]: Starting Up (built: Aug 24 2000 16:14:30)
>Aug 16 21:41:33 typhoon sunny89[12883]: run_requester: Starting Up: sunny89.atmos.washington.edu
>Aug 16 21:41:33 typhoon pqact[12882]: Starting Up
>Aug 16 21:41:33 typhoon striker[12884]: run_requester: Starting Up: striker.atmos.albany.edu
>Aug 16 21:41:33 typhoon pqbinstats[12881]: Starting Up (12880)
>Aug 16 21:41:33 typhoon striker[12884]: run_requester: 20010816214237.424 TS_ENDT {{NLDN, ".*"}}
>Aug 16 21:41:34 typhoon striker[12884]: FEEDME(striker.atmos.albany.edu): OK
>Aug 16 21:41:35 typhoon localhost[12892]: Connection from localhost
>Aug 16 21:41:35 typhoon localhost[12892]: Connection reset by peer
>Aug 16 21:41:35 typhoon localhost[12892]: Exiting
>Aug 16 21:41:38 typhoon sunny89[12883]: run_requester: 20010816204133.025 TS_ENDT {{NMC|FSL2|UNIDATA, ".*"},{DIFAX, ".*"},{NNEXRAD,
>Aug 16 21:41:38 typhoon sunny89[12883]: FEEDME(sunny89.atmos.washington.edu): reclass: 20010816204133.025 TS_ENDT {{NMC3|FSL2|UNIDATA,
>Aug 16 21:41:38 typhoon sunny89[12883]: FEEDME(sunny89.atmos.washington.edu): OK
>Aug 16 21:41:38 typhoon sunny89[12883]: RECLASS: 20010816204138.329 TS_ENDT {{NMC3|FSL2|UNIDATA, ".*"},{DIFAX, ".*"},{NNEXRAD, "^SD
>Aug 16 21:41:38 typhoon sunny89[12883]: skipped: 20010816204133.095 (5.234 seconds)
>Aug 16 21:41:39 typhoon sunny89[12883]: RECLASS: 20010816204139.909 TS_ENDT {{NMC3|FSL2|UNIDATA, ".*"},{DIFAX, ".*"},{NNEXRAD, "^SD
>Aug 16 21:41:40 typhoon sunny89[12883]: RECLASS: 20010816204140.124 TS_ENDT {{NMC3|FSL2|UNIDATA, ".*"},{DIFAX, ".*"},{NNEXRAD, "^SD
>Aug 16 21:41:40 typhoon sunny89[12883]: RECLASS: 20010816204140.383 TS_ENDT {{NMC3|FSL2|UNIDATA, ".*"},{DIFAX, ".*"},{NNEXRAD, "^SD
>Aug 16 21:41:40 typhoon sunny89[12883]: RECLASS: 20010816204140.743 TS_ENDT {{NMC3|FSL2|UNIDATA, ".*"},{DIFAX, ".*"},{NNEXRAD, "^SD
>Aug 16 21:41:42 typhoon rossby[12896]: Connection from rossby.met.sjsu.edu
>Aug 16 21:41:42 typhoon rossby(noti)[12896]: Starting Up: 20010816214526.450 TS_ENDT {{NNEXRAD|UNIDATA, "HDS"}}
>Aug 16 21:41:42 typhoon rossby(noti)[12896]: topo: rossby.met.sjsu.edu NNEXRAD|UNIDATA
>Aug 16 21:41:48 typhoon rossby[12897]: Connection from rossby.met.sjsu.edu
>Aug 16 21:41:48 typhoon rossby(feed)[12897]: Starting Up: 20010816204625.945 TS_ENDT {{HDS|DDPLUS, ".*"}}
>Aug 16 21:41:48 typhoon rossby(feed)[12897]: topo: rossby.met.sjsu.edu HDS|DDPLUS
>Aug 16 21:41:48 typhoon rossby[12898]: Connection from rossby.met.sjsu.edu
>Aug 16 21:41:48 typhoon rossby(feed)[12898]: Starting Up: 20010816204625.965 TS_ENDT {{NNEXRAD, "^SDUS5."},{MCIDAS, "^pnga2area Q[01
>Aug 16 21:41:48 typhoon rossby(feed)[12898]: topo: rossby.met.sjsu.edu NNEXRAD|MCIDAS
>But later....
>Aug 16 21:52:14 typhoon sunny89[12883]: Disconnect
>Aug 16 21:52:20 typhoon rossby(noti)[13387]: nullproc(rossby.met.sjsu.edu): RPC: Unable to receive
>Aug 16 21:52:20 typhoon rossby(noti)[13387]: Exiting
>
>The ldmbinstats.upc file showed--
>TOPOLOGY
>typhoon.atmos.ucla.edu rossby.met.sjsu.edu NNEXRAD|MCIDAS
>unknown unknown
>typhoon.atmos.ucla.edu inisas02.inis.iarc.uaf.edu UNIDATA
>unknown unknown
>typhoon.atmos.ucla.edu rossby.met.sjsu.edu HDS|DDPLUS
>unknown unknown
>typhoon.atmos.ucla.edu atm23.ucdavis.edu DDPLUS
>unknown unknown
>TOPOEND
>
>I don't know if this helps.
>James
>-------------------------------------
>
>My data disk is not full. I've stopped and started a few times, I've made sure
>all the processes were dead, and I've deleted and remade the queue. I'm not sure
>what else to check at this point. UCLA says nothing looks strange on their end.
>I'm beginning to suspect someone made changes to our firewall or something;
>I'll check that out.
>Any ideas? Thanks!
>
>Mike
>
>
>
>
>
>
>
>--------------------------
>Mike Voss
>Department of Meteorology
>San Jose State University
>One Washington Square
>San Jose, CA 95192-0104
>
>408.924.5204 voice
>408.924.5191 fax
>
>From address@hidden Thu Aug 16 16:17:38 2001
>Organization: UCAR/Unidata
>Keywords: 200108162217.f7GMHb102592
>Date: Thu, 16 Aug 2001 15:19:55 -0700
>To: address@hidden
>From: Mike Voss <address@hidden>
>Subject: ldm outage (more)
>Cc: address@hidden
>
>additional info:
>
>I am receiving NLDN data from SUNY.
>
>Idea: Maybe the failover LDM at UCLA is not handling the portmapping correctly
> for some reason. My machine rossby.met.sjsu.edu is only accessible via port
> 388 through our campus firewall.
>
>Mike
>
>
>
>