[netCDF #AIQ-275071]: [netcdf-hdf] Unexpected overall file size jump
- Date: Wed, 22 Dec 2010 10:58:09 -0700
> Thanks for the reply. If the difference were metadata, wouldn't we
> expect to see the greatest difference between the netcdf-3 format
> and HDF with smaller data files? In fact, we're finding the
> opposite.
Yes. If you have only a moderate amount of metadata, the HDF5-based
netCDF-4 files would be much larger than netCDF-3 files when there is
only a small amount of data, but similar in size when there is a large
amount of data.
If, however, you had lots of metadata (for example 5000 variables and
5000 dimensions), then the HDF5 files might appear significantly larger
even with lots of data.
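The workshop file sizes in the earlier reply quoted below bear this out. A small Python calculation over those published sizes (the script is only an illustration; it makes no new measurements) compares each format against classic format:

```python
# File sizes in bytes, copied from the workshop example quoted below:
# one variable over one dimension, with dimension size 2 vs. 10000.
SIZES = {
    "classic": (88, 40080),
    "64-bit offset": (92, 40084),
    "netCDF-4": (5072, 45064),
    "netCDF-4 classic model": (5108, 45101),
}


def overhead_vs_classic(fmt: str) -> tuple:
    """Extra bytes relative to classic format, for the small and the large file."""
    small, large = SIZES[fmt]
    base_small, base_large = SIZES["classic"]
    return small - base_small, large - base_large


def relative_size(fmt: str) -> tuple:
    """Size as a multiple of the classic-format size, small and large file."""
    small, large = SIZES[fmt]
    base_small, base_large = SIZES["classic"]
    return small / base_small, large / base_large


for fmt in SIZES:
    extra_small, extra_large = overhead_vs_classic(fmt)
    ratio_small, ratio_large = relative_size(fmt)
    print(f"{fmt:24s} extra: {extra_small:5d} / {extra_large:5d} bytes  "
          f"ratio: {ratio_small:6.2f}x / {ratio_large:5.3f}x")
```

The extra bytes come out as 4984 for both the 2-element and the 10000-element netCDF-4 file (5020 vs. 5021 for the classic model variant), so the overhead behaves like a fixed cost: the netCDF-4 file is roughly 58 times the size of the tiny classic file, but only about 12% larger than the 10000-element one. That relative weight keeps shrinking as the data grows, unless the metadata itself grows too.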
> We would like to share some larger data files with you guys in
> order to better understand the situation. Would you be willing to
> pick some data up from our ftp site?
Yes, that would be useful.
--Russ
> > Hi James,
> >
> >> We recently began working on a transition from netcdf 3.6.2 to 4.1.1.
> >>
> >> The process was trouble free and things seem to be working, but we have
> >> been surprised to find the HDF variant producing extremely large files
> >> relative to the old netcdf native form. Our measurement files are
> >> already enormous, and further growth would be deadly.
> >>
> >> Has anyone else encountered this?
> >
> > There is a larger fixed-size overhead for metadata (names and
> > properties of variables, dimensions, and attributes) in the HDF5-based
> > netCDF-4 format, but in our experience, it's not significant for files
> > with lots of data and only a moderate amount of metadata. And use of
> > compression can make equivalent netCDF-4 files significantly smaller
> > than netCDF-3 classic format files.
> >
> > As an example that we use in our netCDF training workshop, a small
> > netCDF file with only one dimension of size 2 and one variable that
> > uses that dimension is very small in the netCDF classic or 64-bit
> > offset formats, but much larger in the netCDF-4 formats (sizes in
> > bytes):
> >
> >    88 test.nc1 # classic format
> >    92 test.nc2 # 64-bit offset format
> >  5072 test.nc3 # netCDF-4 format
> >  5108 test.nc4 # netCDF-4 classic model format
> >
> > However, if you change the dimension size to 10000, the sizes are much
> > closer:
> >
> > 40080 test.nc1 # classic format
> > 40084 test.nc2 # 64-bit offset format
> > 45064 test.nc3 # netCDF-4 format
> > 45101 test.nc4 # netCDF-4 classic model format
> >
> > And if you apply level-1 compression to the variable in the netCDF-4
> > format, the netCDF-4 file is significantly smaller for this
> > (artificial) data:
> >
> > 40080 test.nc1 # classic format
> > 40084 test.nc2 # 64-bit offset format
> > 21055 test.nc3 # netCDF-4 format
> > 21092 test.nc4 # netCDF-4 classic model format
> >
> > Finally, if you apply the shuffle filter along with compression for
> > this test file, the result is significantly better compression:
> >
> > 40080 test.nc1 # classic format
> > 40084 test.nc2 # 64-bit offset format
> >  7777 test.nc3 # netCDF-4 format
> >  7814 test.nc4 # netCDF-4 classic model format
> >
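The shuffle filter helps because it regroups bytes so that the similar high-order bytes of neighbouring values sit next to each other, giving deflate long runs to work with. A minimal Python sketch of the idea, using only the standard library: it assumes hypothetical test data (10000 little-endian 32-bit integers 0..9999, since the workshop file's actual values aren't given) and uses `zlib` level 1 as a stand-in for HDF5's deflate filter. The `shuffle`/`unshuffle` helpers are written here for illustration only; they are not the HDF5 or netCDF API.

```python
import struct
import zlib


def shuffle(data: bytes, itemsize: int) -> bytes:
    """Regroup bytes: byte 0 of every element, then byte 1 of every element, etc."""
    if len(data) % itemsize:
        raise ValueError("data length must be a multiple of itemsize")
    return b"".join(data[k::itemsize] for k in range(itemsize))


def unshuffle(data: bytes, itemsize: int) -> bytes:
    """Invert shuffle() so the original element bytes are restored."""
    if len(data) % itemsize:
        raise ValueError("data length must be a multiple of itemsize")
    n = len(data) // itemsize
    out = bytearray(len(data))
    for k in range(itemsize):
        # Byte plane k goes back into every itemsize-th position, offset k.
        out[k::itemsize] = data[k * n:(k + 1) * n]
    return bytes(out)


# Hypothetical stand-in data: 10000 little-endian 32-bit ints, 0..9999.
values = range(10000)
raw = struct.pack(f"<{len(values)}i", *values)

# zlib level 1 plays the role of "level-1 compression" (deflate) here.
plain = zlib.compress(raw, 1)
shuffled = zlib.compress(shuffle(raw, 4), 1)

print("uncompressed:         ", len(raw))
print("deflate only:         ", len(plain))
print("shuffle, then deflate:", len(shuffled))
```

On smoothly varying integers like these, the shuffled stream compresses far better than the unshuffled one; for real measurement data the gain depends on how the values vary, so trying the compression options on actual files remains the real test.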
> > It's easy to run little experiments like this with the "nccopy"
> > utility in the latest netCDF snapshot release (soon to be in version
> > 4.1.2), as you can specify conversions and compression on the command
> > line:
> >
> >
> > http://www.unidata.ucar.edu/netcdf/workshops/2010/utilities/NccopyExamples.html
> >
> > This is a very artificial example, and it's unlikely you'll get
> > results as good with your real data, but experimenting with nccopy's
> > compression options on some of your real data could show what you can
> > expect from using netCDF-4 for your data.
> >
> > --Russ
> >
> > Russ Rew UCAR Unidata Program
> > address@hidden http://www.unidata.ucar.edu
> >
> >
> >
> > Ticket Details
> > ===================
> > Ticket ID: AIQ-275071
> > Department: Support netCDF
> > Priority: Normal
> > Status: Closed
> >