Re: many little files versus one big file
- Date: Mon, 18 Apr 1994 10:13:11 -0600
Hi Charles,
> I have about 1700 files totaling about 2 Mbytes of data. If I load
> and process them individually, it takes about 7 seconds. If I
> concatenate them together on the unlimited dimension to form one big
> file, it takes about 35 seconds to load and process them. Is this
> expected? Wouldn't there be less overhead with just one file? I got
> the exact same results in both cases, so I don't think I did anything
> terribly wrong.
I'm surprised by the times you are seeing, and would expect that accessing
the data as one file would require slightly less time than using lots of
little files. If you can construct a small test case or script that
demonstrates this, I could try to find out the reason for the
counterintuitive timings.
Is the sum of the sizes of the small files similar to the size of the
single large file? I can imagine that, since the record dimension requires
padding each record out to a 4-byte (32-bit) boundary, storing only one
byte in each record would make the record file require 4 times as much
storage and more time to access.
Another possibility is that you are seeing an artifact of pre-filling:
each record is filled with fill values when the first data is written to
it. This can be avoided by using the ncsetfill() interface (or NCSFIL
from Fortran) to specify that records not be pre-filled. See the User's
Guide section on "Set Fill Mode for Writes" for more about this
optimization and when it is appropriate.
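Here is a minimal sketch of that optimization using the netCDF-2 C
interface; the file name "big.nc", the variable name "data", and the
record shape are placeholders, not taken from your setup:

    #include <stdio.h>
    #include "netcdf.h"

    /* Sketch: disable pre-filling before writing records, then
     * restore the previous fill mode. */
    int main(void) {
        long start[] = {0, 0};      /* first record */
        long count[] = {1, 100};    /* one record of 100 values */
        static double values[100];
        int ncid, varid, old_mode;

        ncid = ncopen("big.nc", NC_WRITE);
        if (ncid == -1) {
            fprintf(stderr, "ncopen failed\n");
            return 1;
        }

        /* Turn off pre-filling; the previous mode is returned. */
        old_mode = ncsetfill(ncid, NC_NOFILL);

        varid = ncvarid(ncid, "data");
        ncvarput(ncid, varid, start, count, values);

        ncsetfill(ncid, old_mode);  /* restore the previous mode */
        ncclose(ncid);
        return 0;
    }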
__________________________________________________________________________
Russ Rew UCAR Unidata Program
address@hidden P.O. Box 3000
(303)497-8645 Boulder, Colorado 80307-3000