[netCDF #AIQ-275071]: [netcdf-hdf] Unexpected overall file size jump
- Date: Wed, 22 Dec 2010 13:41:17 -0700
James,
> I am advised that you should be able to get the following via
> anonymous ftp:
>
> ftp://ftp.exa.com/outgoing/netcdf/Fluid_Meas.fnc
> ftp://ftp.exa.com/outgoing/netcdf/Fluid_Meas.fnc-nccopy-k3
> ftp://ftp.exa.com/outgoing/netcdf/Fluid_Meas.snc
> ftp://ftp.exa.com/outgoing/netcdf/Fluid_Meas.snc-nccopy-k3
Thanks, I see what you mean! We'll have to investigate why the
netCDF-4 copies of these netCDF classic format files are so much
larger than expected (e.g. a 42 MB classic file becomes a 96 MB
netCDF-4 file, even though ncdump shows little metadata). I don't
currently have an explanation, but it could be a bug.
--Russ
> >> Thanks for the reply. If the difference were metadata, wouldn't we
> >> expect to see the greatest difference between the netcdf-3 format
> >> and HDF with smaller data files? In fact, we're finding the
> >> opposite.
> >
> > Yes, if you only have a moderate amount of metadata, HDF5 files
> > would be much larger than classic files with a small amount of data
> > but similar in size with a large amount of data.
> >
> > If, however, you had lots of metadata (for example 5000 variables and
> > 5000 dimensions), then the HDF5 files might appear significantly larger
> > even with lots of data.
> >
> >> We would like to share some larger data files with you guys in
> >> order to better understand the situation. Would you be willing to
> >> pick some data up from our ftp site?
> >
> > Yes, that would be useful.
> >
> > --Russ
> >
> >> > Hi James,
> >> >
> >> >> We recently began working on a transition from netcdf 3.6.2 to 4.1.1.
> >> >>
> >> >> The process was trouble free and things seem to be working, but we
> >> >> have been surprised to find the HDF variant producing extremely
> >> >> large files relative to the old netcdf native form. Our measurement
> >> >> files are already enormous, and further growth would be deadly.
> >> >>
> >> >> Has anyone else encountered this?
> >> >
> >> > There is a larger fixed-size overhead for metadata (names and
> >> > properties of variables, dimensions, and attributes) in the HDF5-based
> >> > netCDF-4 format, but in our experience, it's not significant for files
> >> > with lots of data and only a moderate amount of metadata. And use of
> >> > compression can make equivalent netCDF-4 files significantly smaller
> >> > than netCDF-3 classic format files.
> >> >
> >> > As an example we use in our netCDF training workshop, a file with
> >> > only one dimension of size 2 and one variable that uses that
> >> > dimension is very small in the netCDF classic or 64-bit offset
> >> > formats, but much larger in the netCDF-4 formats:
> >> >
> >> >    88 test.nc1 # classic format
> >> >    92 test.nc2 # 64-bit offset format
> >> >  5072 test.nc3 # netCDF-4 format
> >> >  5108 test.nc4 # netCDF-4 classic model format
> >> >
> >> > However, if you change the dimension size to 10000, the sizes are much
> >> > closer:
> >> >
> >> > 40080 test.nc1 # classic format
> >> > 40084 test.nc2 # 64-bit offset format
> >> > 45064 test.nc3 # netCDF-4 format
> >> > 45101 test.nc4 # netCDF-4 classic model format
> >> >
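The two listings above are consistent with a fixed per-format overhead plus 4 bytes per element. The following Python sketch checks that arithmetic. It assumes the listed sizes are in bytes and that the test variable uses a 4-byte type; neither is stated in the message.

```python
# Sizes copied from the listings above, keyed by dimension length.
# Assumption: sizes are in bytes and each element occupies 4 bytes.
BYTES_PER_ELEMENT = 4

sizes = {
    "classic":  {2: 88,   10000: 40080},
    "netCDF-4": {2: 5072, 10000: 45064},
}

def fixed_overhead(file_size, n_elements):
    """Bytes left over after subtracting the raw data payload."""
    return file_size - n_elements * BYTES_PER_ELEMENT

overheads = {
    fmt: {n: fixed_overhead(size, n) for n, size in by_n.items()}
    for fmt, by_n in sizes.items()
}

for fmt, by_n in overheads.items():
    for n, extra in by_n.items():
        share = extra / sizes[fmt][n]
        print(f"{fmt:9s} n={n:<6d} overhead={extra:5d} bytes ({share:.1%} of file)")
```

Under those assumptions the overhead is the same at both dimension lengths (80 bytes for classic, 5064 bytes for netCDF-4), so it dominates the size-2 file and shrinks to about 11% of the netCDF-4 file at 10000 elements.
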
> >> > And if you apply level-1 compression to the variable in the netCDF-4
> >> > format, the netCDF-4 file is significantly smaller for this
> >> > (artificial) data:
> >> >
> >> > 40080 test.nc1 # classic format
> >> > 40084 test.nc2 # 64-bit offset format
> >> > 21055 test.nc3 # netCDF-4 format
> >> > 21092 test.nc4 # netCDF-4 classic model format
> >> >
> >> > Finally, if you apply the shuffle filter along with compression for
> >> > this test file, the result is significantly better compression:
> >> >
> >> > 40080 test.nc1 # classic format
> >> > 40084 test.nc2 # 64-bit offset format
> >> >  7777 test.nc3 # netCDF-4 format
> >> >  7814 test.nc4 # netCDF-4 classic model format
> >> >
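To see why the shuffle filter can help, the sketch below reimplements byte shuffling in plain Python and compresses the result with zlib at level 1. It is only an illustration: the data (consecutive 32-bit integers) and the helper name `shuffle_bytes` are assumptions, it does not use the netCDF or HDF5 libraries, and the sizes it prints will not match the listing above.

```python
import struct
import zlib

def shuffle_bytes(raw, element_size):
    """Group byte 0 of every element, then byte 1, and so on."""
    return b"".join(raw[i::element_size] for i in range(element_size))

# Hypothetical test data: 10000 consecutive little-endian 32-bit integers.
values = range(10000)
raw = struct.pack(f"<{len(values)}i", *values)

# Compress the same bytes with and without shuffling, both at level 1.
plain = zlib.compress(raw, 1)
shuffled = zlib.compress(shuffle_bytes(raw, 4), 1)

print(len(raw), len(plain), len(shuffled))
```

Shuffling places the slowly changing high-order bytes of neighbouring values next to each other, which gives the compressor long runs of repeated bytes. Real measurement data may benefit far less than this artificial sequence.
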
> >> > It's easy to run little experiments like this with the "nccopy"
> >> > utility in the latest netCDF snapshot release (soon to be in version
> >> > 4.1.2), as you can specify conversions and compression on the command
> >> > line:
> >> >
> >> >
> >> > http://www.unidata.ucar.edu/netcdf/workshops/2010/utilities/NccopyExamples.html
> >> >
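The conversions described above might be scripted along these lines. This is a sketch that only builds the command lines and runs nothing; it assumes an nccopy build that accepts `-k` (output kind), `-d` (deflate level) and `-s` (shuffle), and the helper name `nccopy_command` and the file names are placeholders.

```python
def nccopy_command(src, dst, kind="3", deflate=None, shuffle=False):
    """Return an argv list for nccopy; nothing is executed here."""
    cmd = ["nccopy", "-k", kind]          # kind "3" is assumed to select netCDF-4
    if deflate is not None:
        cmd += ["-d", str(deflate)]       # compression level
    if shuffle:
        cmd.append("-s")                  # enable the shuffle filter
    return cmd + [src, dst]

commands = [
    nccopy_command("test.nc1", "test.nc3"),                           # plain copy
    nccopy_command("test.nc1", "test.nc3", deflate=1),                # level-1 compression
    nccopy_command("test.nc1", "test.nc3", deflate=1, shuffle=True),  # shuffle + compression
]

for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the flags have been checked against the installed nccopy's own help output.
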
> >> > This is a very artificial example and it's unlikely you'll get
> >> > results as good with your real data, but experimenting with nccopy's
> >> > compression options on some real data could determine what you can
> >> > expect in using netCDF 4 for your data.
> >> >
> >> > --Russ
> >> >
> >> > Russ Rew UCAR Unidata Program
> >> > address@hidden http://www.unidata.ucar.edu
> >> >
> >> >
> >> >
> >> > Ticket Details
> >> > ===================
> >> > Ticket ID: AIQ-275071
> >> > Department: Support netCDF
> >> > Priority: Normal
> >> > Status: Closed
Russ Rew UCAR Unidata Program
address@hidden http://www.unidata.ucar.edu