This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Tom, > To: <address@hidden> > From: Tom McDermott <address@hidden> > Subject: LDM-6 Possible Problem > Organization: UCAR/Unidata > Keywords: 200309241848.h8OImak1025105 The above message contained the following: > We have been having a number of network problems here at SUNY Brockport > which may have exposed a possible problem with the LDM-6 software. > > Under certain conditions, there may be a loss of communications between > the downstream and upstream servers such that the downstream believes that > the connection between them is good but that the upstream has no data to > send, when in fact the upstream has plenty of data to send, but the > downstream does not recognize this. This situation can persist for > many minutes or even hours. The only remedy for this has been to > restart the ldm server. In other words, the lack of any > data does not cause the connection to time out and be reset. The situation you describe is definitely a problem and I thank you for bringing it to my attention. It is unlikely, however, that the downstream LDM is merely waiting. Code in the downstream LDM causes it make a connection to the top-level LDM on the upstream host and inquire as to whether or not its upstream LDM is still alive. This happens every 60 seconds if the downstream LDM hasn't received a data-product in that time. > I have observed this happening on numerous occasions with all of our feeds > over the past month. Here is an example from yesterday. On the HDS feed, > a product was last received at 1401Z. Then this sequence from the log: > > Sep 23 14:02:18 vortex atm[26947]: ERROR: requester6.c:206: Connection to > upstream LDM closed > Sep 23 14:02:48 vortex atm[26947]: Desired product class: 20030923140025.915 > TS_ENDT {{HDS, "(^[A-OQ-X])|(^[YZ].[HIQR])"}} > Sep 23 14:02:51 vortex atm[26947]: Connected to upstream LDM-6 > Sep 23 14:02:52 vortex atm[26947]: Upstream LDM is willing to feed > Sep 23 14:04:00 vortex atm[26947]: ERROR: requester6.c:206: Connection to > upstream LDM closed > Sep 23 14:04:30 vortex atm[26947]: Desired product class: 20030923140026.038 > TS_ENDT {{HDS, "(^[A-OQ-X])|(^[YZ].[HIQR])"}} > Sep 23 14:04:30 vortex atm[26947]: Connected to upstream LDM-6 > Sep 23 14:04:30 vortex atm[26947]: Upstream LDM is willing to feed > Sep 23 14:06:39 vortex atm[26947]: ERROR: requester6.c:206: Connection to > upstream LDM closed > Sep 23 14:07:09 vortex atm[26947]: Desired product class: 20030923140112.512 > TS_ENDT {{HDS, "(^[A-OQ-X])|(^[YZ].[HIQR])"}} > Sep 23 14:07:09 vortex atm[26947]: Connected to upstream LDM-6 > Sep 23 14:07:09 vortex atm[26947]: Upstream LDM is willing to feed > > After this last message, there were no furthur messages regarding the HDS > feed or HDS products received until 1456 when I recognized the situation > and restarted the server. The corresponding log-entries on the upstream host (atm.geo.nsf.gov) are Sep 23 14:02:28 atm.geo.nsf.gov vortex(feed)[8145]: up6.c:168: HEREIS: RPC: Unable to send; errno = Broken pipe Sep 23 14:02:28 atm.geo.nsf.gov vortex(feed)[8145]: up6.c:396: Product send failure: I/O error Sep 23 14:02:52 atm.geo.nsf.gov vortex(feed)[17715]: up6.c:331: Starting Up(6.0.14/6): 20030923140025.915 TS_ENDT {{HDS, "(^[A-OQ-X])|(^[YZ].[HIQR])"}} Sep 23 14:02:52 atm.geo.nsf.gov vortex(feed)[17715]: topo: vortex.esc.brockport.edu HDS Sep 23 14:04:00 atm.geo.nsf.gov vortex(feed)[17715]: up6.c:168: HEREIS: RPC: Unable to send; errno = Broken pipe Sep 23 14:04:00 atm.geo.nsf.gov vortex(feed)[17715]: up6.c:396: Product send failure: I/O error Sep 23 14:04:30 atm.geo.nsf.gov vortex(feed)[18484]: up6.c:331: Starting Up(6.0.14/6): 20030923140026.038 TS_ENDT {{HDS, "(^[A-OQ-X])|(^[YZ].[HIQR])"}} Sep 23 14:04:30 atm.geo.nsf.gov vortex(feed)[18484]: topo: vortex.esc.brockport.edu HDS Sep 23 14:06:39 atm.geo.nsf.gov vortex(feed)[18484]: up6.c:168: HEREIS: RPC: Unable to send; errno = Broken pipe Sep 23 14:06:39 atm.geo.nsf.gov vortex(feed)[18484]: up6.c:396: Product send failure: I/O error Sep 23 14:07:09 atm.geo.nsf.gov vortex(feed)[20960]: up6.c:331: Starting Up(6.0.14/6): 20030923140112.512 TS_ENDT {{HDS, "(^[A-OQ-X])|(^[YZ].[HIQR])"}} Sep 23 14:07:09 atm.geo.nsf.gov vortex(feed)[20960]: topo: vortex.esc.brockport.edu HDS Sep 23 14:56:49 atm.geo.nsf.gov vortex(feed)[20960]: up6.c:288: nullproc_6() failure to vortex.esc.brockport.edu: RPC: Unable to receive; errno = Connection reset by peer Other than the "Broken pipe" message, there's no indication as to why the connection was broken. I've seen this type of behavior before. The reason for the disconnection has always lain outside the LDM system itself (e.g., router changes, firewall changes, misbehaving firewalls, broken cables). The last entry above is particularly interesting because it indicates that the upstream LDM on Atm was perfectly happy from 14:07:09 UTC until 14:56:49 UTC when it couldn't flush the connection. Upstream LDM-s send a NULLPROC message down the connection every 30 seconds if they haven't sent a data-product in that time -- this prevents the downstream LDM-s from establishing connections to their upstream hosts more often than necessary for the ISALIVE inquiry that I mentioned earlier. If you didn't receive any data-products from Atm matching the criteria 20030923140112.512 TS_ENDT {{HDS, "(^[A-OQ-X])|(^[YZ].[HIQR])"}} between 14:07:09 UTC and 14:56:49 UTC then Atm probably didn't have any to send you. > Here is an example from today. The most recent IDS|DDPLUS product was > received at 1727 today. Then: > > Sep 24 17:27:55 vortex atm[24309]: ERROR: requester6.c:206: Connection to > upstream LDM closed > Sep 24 17:28:25 vortex atm[24309]: Desired product class: 20030924171035.238 > TS_ENDT {{DIFAX|FNEXRAD|FSL2|UNIWISC|IDS|DDPLUS, ".*"}} > Sep 24 17:28:25 vortex atm[24309]: Connected to upstream LDM-6 > Sep 24 17:28:25 vortex atm[24309]: Upstream LDM is willing to feed > Sep 24 17:40:18 vortex atm[24308]: ERROR: requester6.c:206: Connection to > upstream LDM closed > Sep 24 17:40:48 vortex atm[24308]: Desired product class: 20030924154411.333 > TS_ENDT {{HDS, "(^[A-OQ-X])|(^[YZ].[HIQR])"}} > Sep 24 17:40:51 vortex atm[24308]: Connected to upstream LDM-6 > Sep 24 17:40:51 vortex atm[24308]: Upstream LDM is willing to feed > Sep 24 17:48:41 vortex atm[24309]: ERROR: requester6.c:206: Connection to > upstream LDM closed > Sep 24 17:49:11 vortex atm[24309]: Desired product class: 20030924171035.238 > TS_ENDT {{DIFAX|FNEXRAD|FSL2|UNIWISC|IDS|DDPLUS, ".*"}} > Sep 24 17:49:11 vortex atm[24309]: Connected to upstream LDM-6 > Sep 24 17:49:11 vortex atm[24309]: Upstream LDM is willing to feed > > No more messages appear in the log regarding this feed and up to now > (1838Z), no IDS|DDPLUS products have been received. The corresponding log-entries on Atm are Sep 24 17:25:41 atm.geo.nsf.gov vortex(feed)[29743]: up6.c:331: Starting Up(6.0.14/6): 20030924170902.070 TS_ENDT {{DIFAX|FNEXRAD|FSL2|UNIWISC|IDS|DDPLUS, ".*"}} Sep 24 17:25:41 atm.geo.nsf.gov vortex(feed)[29743]: topo: vortex.esc.brockport.edu DIFAX|FNEXRAD|FSL2|UNIWISC|IDS|DDPLUS Sep 24 17:27:55 atm.geo.nsf.gov vortex(feed)[29743]: up6.c:168: HEREIS: RPC: Unable to send; errno = Broken pipe Sep 24 17:27:55 atm.geo.nsf.gov vortex(feed)[29743]: up6.c:396: Product send failure: I/O error Sep 24 17:28:25 atm.geo.nsf.gov vortex(feed)[481]: up6.c:331: Starting Up(6.0.14/6): 20030924171035.238 TS_ENDT {{DIFAX|FNEXRAD|FSL2|UNIWISC|IDS|DDPLUS, ".*"}} Sep 24 17:28:25 atm.geo.nsf.gov vortex(feed)[481]: topo: vortex.esc.brockport.edu DIFAX|FNEXRAD|FSL2|UNIWISC|IDS|DDPLUS Sep 24 17:40:18 atm.geo.nsf.gov vortex(feed)[16542]: up6.c:168: HEREIS: RPC: Unable to send; errno = Broken pipe Sep 24 17:40:18 atm.geo.nsf.gov vortex(feed)[16542]: up6.c:396: Product send failure: I/O error Sep 24 17:40:51 atm.geo.nsf.gov vortex(feed)[9494]: up6.c:331: Starting Up(6.0.14/6): 20030924154411.333 TS_ENDT {{HDS, "(^[A-OQ-X])|(^[YZ].[HIQR])"}} Sep 24 17:40:51 atm.geo.nsf.gov vortex(feed)[9494]: topo: vortex.esc.brockport.edu HDS Sep 24 17:43:07 atm.geo.nsf.gov vortex(feed)[9494]: up6.c:168: HEREIS: RPC: Unable to send; errno = Broken pipe Sep 24 17:43:07 atm.geo.nsf.gov vortex(feed)[9494]: up6.c:396: Product send failure: I/O error Sep 24 17:43:38 atm.geo.nsf.gov vortex(feed)[10229]: up6.c:331: Starting Up(6.0.14/6): 20030924155158.370 TS_ENDT {{HDS, "(^[A-OQ-X])|(^[YZ].[HIQR])"}} Sep 24 17:43:38 atm.geo.nsf.gov vortex(feed)[10229]: topo: vortex.esc.brockport.edu HDS Sep 24 17:48:41 atm.geo.nsf.gov vortex(feed)[481]: up6.c:168: HEREIS: RPC: Unable to send; errno = Broken pipe Sep 24 17:48:41 atm.geo.nsf.gov vortex(feed)[481]: up6.c:396: Product send failure: I/O error Sep 24 17:49:11 atm.geo.nsf.gov vortex(feed)[13572]: up6.c:331: Starting Up(6.0.14/6): 20030924171035.238 TS_ENDT {{DIFAX|FNEXRAD|FSL2|UNIWISC|IDS|DDPLUS, ".*"}} Sep 24 17:49:11 atm.geo.nsf.gov vortex(feed)[13572]: topo: vortex.esc.brockport.edu DIFAX|FNEXRAD|FSL2|UNIWISC|IDS|DDPLUS Sep 24 17:57:22 atm.geo.nsf.gov vortex(feed)[10229]: up6.c:168: HEREIS: RPC: Unable to send; errno = Broken pipe Sep 24 17:57:22 atm.geo.nsf.gov vortex(feed)[10229]: up6.c:396: Product send failure: I/O error It appears from this that something outside the LDM systems on Atm and Vortex is destroying the connections between the two. > From these 2 examples it appears that the condition that triggers this > problem seems to occur shortly after a new connection is established, but > I can't be sure that this is always the case. > > I want to emphasize that I am aware that it is our network problems that > are causing this, and not the ldm itself. My concern is how the ldm > handles this situation. As you can see from your log-entries, the downstream LDM closes and reconnects very soon after the upstream LDM has gone away because the connection has been somehow destroyed. > As I recall, with the ldm version 5, if there was > a problem with the connection, it would eventually be recognized and it > would time out and the connection reset. (I believe after 12 minutes of > inactivtity.) I suppose the ldm-6 should behave in the same fashion, but > this does not appear to be the case. LDM 6's reconnection strategy is a little different than LDM 5's -- but it should be no less robust and considerably more efficient. > I hope you will excuse some vagueness in my presentation of this problem. No problem. I'm glad to get this report. What we have to do now is figure out why the connections between Vortex and Atm are being destroyed. They're occurring a such a high frequency that it's a wonder you get any data at all. What are the "network problems" of which you spoke? > Tom > ----------------------------------------------------------------------------- > Tom McDermott Email: address@hidden > Systems Administrator Phone: (585) 395-5718 > Earth Sciences Dept. Fax: (585) 395-2416 > SUNY College at Brockport Regards, Steve Emmerson