This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Geoffry,

> Your clarifications help a lot. Though I still have some questions about
> machine failures and where a replacement node would 'start' at in the data.

When a downstream LDM starts, it requests products starting from some time in
the past (typically one hour) or from the last successfully received product
(which it tracks) -- whichever is more recent. Thus, if the downstream site is
offline for less than one hour or the minimum residency time of the upstream
site's product-queue (whichever is less), then no products will be lost.

> Allow me to elaborate on my use case: I'm planning to download large amounts
> of weather data, spreading the load across many 'nodes.' Nodes in this case
> are AWS instances. At the scale we're looking at, instance ('machine'
> failure) with loss of any LDM state is to be expected relatively
> frequently. If there's some piece of LDM non-memory state that needs to be
> persisted between machine failures to guarantee delivery, I need to be
> aware.
>
> I am evaluating, if we run LDM naively what are our failure conditions when
>
> - we lose a machine, all its local state, and disk
> - an LDM process dies
> - network partitions or failures between or during transfers
>
> And how we might structure our LDM cluster to avoid any related problems.
> I'm also looking to better understand the implementation of LDM so I can be
> aware of our upstream providers' (paid universities) potential failure cases
> and how they impact our cluster's ability to always successfully receive
> and process files. (ignoring network partitions > ~45mins, failure of more
> than some set number of redundant nodes, and the data being unavailable to
> the LDM network)
>
> So, basically, trying to figure out how strong the processing guarantees
> that LDM provides are so I know where I need to add extra
> monitoring/coordination between redundant nodes.
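As a rough illustration of the restart rule described earlier in this reply, the following Python sketch models where a restarting downstream node would begin and whether an outage can be ridden out without loss. It is not LDM code or an LDM API: the function names, the parameters, and the one-hour default are hypothetical, and treating a replacement node that lost its state as having no last-received product is an assumption drawn from the description above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical default: the "typically one hour" look-back mentioned above.
DEFAULT_BACKLOG = timedelta(hours=1)


def request_start_time(
    now: datetime,
    last_received: Optional[datetime] = None,
    max_backlog: timedelta = DEFAULT_BACKLOG,
) -> datetime:
    """Return the time from which a restarting downstream node would ask for data.

    The node uses whichever is more recent: (now - max_backlog) or the time of
    the last successfully received product. A node that lost its state
    (last_received is None) is assumed to fall back to the look-back window.
    """
    fallback = now - max_backlog
    if last_received is None:
        return fallback
    return max(fallback, last_received)


def outage_is_lossless(
    outage: timedelta,
    upstream_min_residency: timedelta,
    max_backlog: timedelta = DEFAULT_BACKLOG,
) -> bool:
    """True if the outage is shorter than both the look-back window and the
    minimum residency time of the upstream product-queue."""
    return outage < min(max_backlog, upstream_min_residency)


if __name__ == "__main__":
    now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    # Replacement node with no saved state: starts one look-back window ago.
    print(request_start_time(now))
    # Node that was down 10 minutes and kept its state: resumes where it left off.
    print(request_start_time(now, now - timedelta(minutes=10)))
    # A 50-minute outage against an upstream queue holding only 45 minutes of data.
    print(outage_is_lossless(timedelta(minutes=50), timedelta(minutes=45)))
```

Under this model, a replacement instance that has lost its disk behaves like a fresh start, so the time it takes to bring up the replacement counts as the outage when checking against the upstream residency time.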
Sounds like you might be interested in the section on LDM clusters in the
reference manual
<https://www.unidata.ucar.edu/software/ldm/ldm-current/basics/cluster.html>.

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: HEQ-649192
Department: Support LDM
Priority: Normal
Status: Closed