Roy Mendelssohn wrote:
Hi John: Some questions on THREDDS aggregation. 1. Is there a limit to the number of files that can be aggregated over?
Nope, no limit.
2. Can aggregation occur over sub-directories of a directory structure?
Using "scan", I assume? Supposedly you can have multiple scan directives within the aggregation, but it's not well tested. I would try this if you need the feature.
The scan directive is still pretty primitive; we will continue to improve it, and adding a "recurse" tag might be one way. Remember that the aggregated files need to be pretty much homogeneous.
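For what it's worth, a joinExisting aggregation with two scan elements might look like the following NcML. This is just a sketch, not tested; the directory paths and suffix are placeholders, and the element names follow the NcML-2.2 aggregation schema:

    <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
      <aggregation dimName="time" type="joinExisting">
        <!-- one scan element per directory; each picks up every .nc file there -->
        <scan location="/data/run1/" suffix=".nc" />
        <scan location="/data/run2/" suffix=".nc" />
      </aggregation>
    </netcdf>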
3. For a lot of time periods, when aggregating fields over time, do you have any feel for the trade-off in aggregation speed between the size of the netCDF files and the number of files aggregated over? (I.e., if we have 6-hourly data, should we produce 6-hourly files, daily files, weekly files, or monthly files, and what would be the likely speed tradeoff if we want to extract a time series of a relatively small region?)
My intuition is that you want to create fewer large files, not lots of little files. It costs the same to open a big file or a little one. My current rule of thumb is to try to write files of 50-200 Mbytes. In the future we may add a feature that opens all the needed files in different threads; that might argue for smaller files, but it's theoretical at this point.
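To put rough numbers on that rule of thumb (a hypothetical example, not from any particular dataset): a single-precision global 0.25-degree field is 1440 x 720 points, about 4 Mbytes per time step. With 6-hourly data, a daily file would hold 4 steps (~17 Mbytes), a weekly file 28 steps (~116 Mbytes), and a monthly file about 120 steps (~500 Mbytes), so weekly files would land squarely in the 50-200 Mbyte range.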
TIA, -Roy