Re: many little files versus one big file
- Subject: Re: many little files versus one big file
- Date: Tue, 26 Apr 1994 08:31:41 -0600
Charles,
> > > I have about 1700 files totaling about 2 Mbytes of data. If I load
> > > and process them individually, it takes about 7 seconds. If I
> > > concatenate them together on the unlimited dimension to form one big
> > > file, it takes about 35 seconds to load and process them. Is this
> > > expected? Wouldn't there be less overhead with just one file? I got
> > > the exact same results in both cases, so I don't think I did anything
> > > terribly wrong.
> >
> > I'm surprised by the times you are seeing, and would expect that
> > accessing the data as one file would require slightly less time than
> > using lots of little files. If you can construct a small test case or
> > script that demonstrates this, I could try to find out the reason for
> > the counterintuitive timings.
> >
> > Are the sums of the sizes of the small files similar to the size of the
> > single large file? I can imagine that since the record dimension
> > requires padding each record out to an even 32-bit boundary, if you were
> > storing only one byte in each record, the record file would require 4
> > times as much storage and more time to access.
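A quick sketch of that padding arithmetic (this is an illustration of the rounding rule described above, not a call into the netCDF library; the function name is mine):

```python
def padded_record_size(data_bytes_per_record: int) -> int:
    """Round a record's data size up to the next 4-byte (32-bit) boundary,
    as the classic netCDF format does for each record along the unlimited
    dimension."""
    return (data_bytes_per_record + 3) // 4 * 4

# One byte of data per record is stored as 4 bytes: 4x the raw storage.
print(padded_record_size(1))   # 4

# A 6-byte record pads to 8 bytes; an 8-byte record needs no padding.
print(padded_record_size(6))   # 8
print(padded_record_size(8))   # 8
```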
> >
> > Another possibility is that you are seeing an artifact of the
> > pre-filling of each record with fill values when the first data is
> > written in each record. This can be avoided by using the ncsetfill()
> > interface (or NCSFIL for Fortran) to specify that records not be
> > pre-filled. See the User's Guide section on "Set Fill Mode for Writes"
> > to find out more about this optimization and when it is appropriate.
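A rough way to see the cost of pre-filling: with fill mode on, each new record is first written full of fill values and then overwritten with the real data, roughly doubling the bytes written. A minimal stdlib sketch of that write traffic (the names and fill byte here are illustrative only, not the netCDF API or its actual fill values):

```python
import io

FILL_BYTE = b"\x80"  # illustrative fill value, not netCDF's NC_FILL_* constants

def write_records(num_records: int, record_size: int, prefill: bool) -> int:
    """Write records to an in-memory 'file' and return total bytes written."""
    f = io.BytesIO()
    written = 0
    for _ in range(num_records):
        if prefill:
            # Fill mode on: the record is first padded with fill values...
            written += f.write(FILL_BYTE * record_size)
            f.seek(-record_size, io.SEEK_CUR)
        # ...then the real data is written over the same bytes.
        written += f.write(b"\x01" * record_size)
    return written

# Pre-filling roughly doubles the bytes written per record.
print(write_records(100, 64, prefill=True))   # 12800
print(write_records(100, 64, prefill=False))  # 6400
```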
>
> Russ,
>
> I might get a chance to construct a test case at some point, but it's
> not easy to isolate. Note that I'm only reading the files. Here are
> some more details:
>
> cnt = 1737 % this many individual files
> sum = 4882464 % total number of bytes (more than single big file)
> ave = 2810.86
> min = 1824
> max = 3872
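For what it's worth, the reported average is consistent with the count and total (a quick arithmetic check on the figures above):

```python
cnt = 1737        # number of individual files
total = 4882464   # total bytes across all files
ave = total / cnt
print(round(ave, 2))  # 2810.86, matching the reported average
```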
Since the files are so small, could you send me the output of "ncdump" on
two or three of them? That might be enough information for me to construct
a small test case.
__________________________________________________________________________
Russ Rew UCAR Unidata Program
address@hidden P.O. Box 3000
(303)497-8645 Boulder, Colorado 80307-3000