This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Hi Joey,

> NCEP/NCO is in the process of evaluating the ldm for sending large
> quantities of data to NCDC for archival purposes. Folks have
> recommended that I receive input from both of you as you have a great
> deal of experience in dealing with the ldm software.
>
> Do either of you have a feel on what the limits are for data processing
> via the ldm? Is there a line in the sand that should not be crossed?
> Are there issues with large files as compared to small files? At what
> data extreme would you not feel comfortable using the ldm? What are the
> limiting factors? (Memory, hard drive space, etc.)
>
> Thanks for your input.
>
> Joey

We've received this question a number of times. We can tell you some general things to consider and provide you with some examples.

Probably the most critical aspect of the LDM system is the size of the product-queue and the amount of physical memory. Ideally, you should have enough physical memory to hold the product-queue in memory all the time. This means that the amount of physical memory should be at least 30% greater than the size of the product-queue (with more memory being better, of course). The minimum size of the product-queue can be determined by multiplying the maximum rate at which data arrives by the minimum time that data should be available in the product-queue (typically one hour). Note that some systems cannot support product-queues larger than about 2 or 4 gigabytes because the "off_t" or "size_t" data-type, respectively, is a 32-bit integer.

Processing data-products on the same system that receives them can slow things down considerably due to contention for the CPU. Consequently, the topology of an LDM network can greatly affect performance.
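The sizing rule above can be sketched as a small calculation. This is only an illustration: the traffic figures used in the example are hypothetical, while the rule itself (queue size at least peak rate times residency, memory at least 30% larger than the queue) comes from the paragraph above.

```python
# Sketch of the product-queue sizing rule described above.
# The example numbers are hypothetical; only the rule is from the text.

def min_queue_bytes(peak_rate_bytes_per_sec: float,
                    residency_secs: float = 3600) -> float:
    """Minimum product-queue size: peak arrival rate times the minimum
    time a product should remain available (typically one hour)."""
    return peak_rate_bytes_per_sec * residency_secs

def min_memory_bytes(queue_bytes: float, headroom: float = 0.30) -> float:
    """Physical memory should be at least 30% larger than the queue."""
    return queue_bytes * (1.0 + headroom)

if __name__ == "__main__":
    gib = 1024 ** 3
    rate = 3.3 * gib / 3600            # hypothetical 3.3 GiB/hour feed
    queue = min_queue_bytes(rate)      # ~3.3 GiB of queue
    memory = min_memory_bytes(queue)   # ~4.3 GiB of physical memory
    print(f"queue >= {queue / gib:.1f} GiB, memory >= {memory / gib:.1f} GiB")
```

Note that on a system whose "off_t" or "size_t" is a 32-bit integer, the computed queue size would also have to stay under the 2 or 4 gigabyte ceiling mentioned above.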
One of the things that we recommend for systems that do a lot of processing is to either have multiple systems each receiving and processing a subset of the data, or have a single system that receives all data and multiple other systems that each process a subset of the data that they request from the front-end system.

Each LDM child process that receives data obtains an exclusive write-lock on the product-queue in order to store the data. Consequently, the more REQUEST entries you have in the LDM configuration-file, the more contention there will be for writing into the product-queue. Interestingly enough, this tendency towards poorer performance with an increased number of REQUEST entries is compensated, more or less, by multiple TCP connections between the same hosts having higher net throughput than a single TCP connection (depending on the details of the TCP implementation). Consequently, exactly how many REQUEST entries are too many depends on your exact circumstances (see the examples, below). And, of course, no LDM can receive data faster than the usable bandwidth of its network connection.

Regarding the size of data-products: the LDM can handle products ranging from 1 byte to about 2 gigabytes equally easily. I would set the size of a data-product based on some natural division within the data-product itself rather than on some arbitrary value (remember, each data-product must have a unique, textual product-identifier). For example, a natural division for NEXRAD Level-II radar data is a single, multi-parameter horizontal sweep.

And now for some examples. Our quasi-operational system at the Unidata Program Center is described at http://www.unidata.ucar.edu/committees/usercom/200604meeting/ldm-idd-status.html under the heading "IDD Relay Cluster idd.unidata.ucar.edu" (follow the links for more information). Currently, the two "accumulator" systems receive about 3.3 gigabytes of data per hour in about 155,000 data-products split over 18 or 19 REQUEST entries.
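As a rough sketch of the request-splitting idea, the REQUEST entries in the LDM configuration-file (ldmd.conf) can divide one feed by product-identifier pattern so that several TCP connections share the transfer. An entry names a feedtype, an extended-regular-expression pattern on the product identifier, and the upstream host; the feedtype, patterns, and hostname below are hypothetical:

```
# Split one feed across five connections by matching the final digit
# of the product identifier (hostname and patterns are examples only).
REQUEST CONDUIT "[05]$" upstream.example.edu
REQUEST CONDUIT "[16]$" upstream.example.edu
REQUEST CONDUIT "[27]$" upstream.example.edu
REQUEST CONDUIT "[38]$" upstream.example.edu
REQUEST CONDUIT "[49]$" upstream.example.edu
```

Each entry spawns its own receiving child process, so this is exactly the trade-off described above: more write-lock contention on the product-queue, offset by the higher aggregate throughput of multiple TCP connections.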
The outbound data-rate has been stress-tested to about 76 gigabytes per hour, aggregate, over about 270 connections.

The LDM on host ultrazone.ucar.edu, which is part of the TIGGE project <http://tigge.ucar.edu>, currently receives about 10 gigabytes per hour on 71 connections. The host is a Sun SPARC Sun-Fire-V890 with, I think, 18 GB of memory and a 14 GB product-queue, running SunOS 5.10.

If I may ask, how much data are you planning on transmitting, on what sort of network, and with what sort of hardware budget?

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: SYM-524686
Department: Support LDM
Priority: Normal
Status: Closed