This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
> Keywords: 199404152043.AA21089

Charles,

> > > I have about 1700 files totaling about 2 Mbytes of data. If I load
> > > and process them individually, it takes about 7 seconds. If I
> > > concatenate them together on the unlimited dimension to form one big
> > > file, it takes about 35 seconds to load and process them. Is this
> > > expected? Wouldn't there be less overhead with just one file? I got
> > > the exact same results in both cases so I don't think I did anything
> > > terribly wrong.
> >
> > I'm surprised by the times you are seeing, and would expect that
> > accessing the data as one file would require slightly less time than
> > using lots of little files. If you can construct a small test case or
> > script that demonstrates this, I could try to find out the reason for
> > the counterintuitive timings.
> >
> > Are the sums of the sizes of the small files similar to the size of the
> > single large file? I can imagine that since the record dimension
> > requires padding each record out to an even 32-bit boundary, if you were
> > storing only one byte in each record, the record file would require 4
> > times as much storage and more time to access.
> >
> > Another possibility is that you are seeing an artifact of the
> > pre-filling of each record with fill values when the first data is
> > written in each record. This can be avoided by using the ncsetfill()
> > interface (or NCSFIL for Fortran) to specify that records not be
> > pre-filled. See the User's Guide section on "Set Fill Mode for Writes"
> > to find out more about this optimization and when it is appropriate.
>
> Russ,
>
> I might get a chance to construct a test case at some point, but it's
> not easy to isolate. Note that I'm only reading the files. Here are
> some more details:
>
> cnt = 1737       % this many individual files
> sum = 4882464    % total number of bytes (more than single big file)
> ave = 2810.86
> min = 1824
> max = 3872

Since the files are so small, could you send me the output of "ncdump"
on two or three of them? That might be enough information for me to
construct a small test case.

__________________________________________________________________________

Russ Rew                                        UCAR Unidata Program
address@hidden                                  P.O. Box 3000
(303)497-8645                                   Boulder, Colorado 80307-3000
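The no-fill optimization described in the quoted reply can be sketched briefly. The following is a minimal illustration, not code from the original exchange: it uses the netCDF-3 C function nc_set_fill(), the successor to the ncsetfill()/NCSFIL interface mentioned above, and the file, dimension, and variable names are hypothetical.

    /*
     * Minimal sketch: create a file with an unlimited (record) dimension
     * and disable pre-filling of records before writing any data.
     * nc_set_fill() is the netCDF-3 counterpart of the ncsetfill()/NCSFIL
     * call mentioned in the message above; names here are made up.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <netcdf.h>

    static void check(int status)
    {
        if (status != NC_NOERR) {
            fprintf(stderr, "netCDF error: %s\n", nc_strerror(status));
            exit(1);
        }
    }

    int main(void)
    {
        int ncid, timedim, valvar, old_fill_mode;
        int dimids[1];
        size_t start[1], count[1];
        float value;

        check(nc_create("example.nc", NC_CLOBBER, &ncid));

        /* Turn off pre-filling; otherwise each new record is first
         * written full of fill values and then overwritten with data. */
        check(nc_set_fill(ncid, NC_NOFILL, &old_fill_mode));

        check(nc_def_dim(ncid, "time", NC_UNLIMITED, &timedim));
        dimids[0] = timedim;
        check(nc_def_var(ncid, "value", NC_FLOAT, 1, dimids, &valvar));
        check(nc_enddef(ncid));

        /* Write one value per record; with fill mode off, only the data
         * itself is written to each record. */
        for (size_t rec = 0; rec < 10; rec++) {
            start[0] = rec;
            count[0] = 1;
            value = (float) rec;
            check(nc_put_vara_float(ncid, valvar, start, count, &value));
        }

        check(nc_close(ncid));
        return 0;
    }

With NC_NOFILL in effect, only the data actually written goes to disk rather than each record first being padded with fill values; as the "Set Fill Mode for Writes" section of the User's Guide explains, this is generally appropriate only when the application writes all the values it will later read, so fill values are not needed to detect missing data.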