[netCDF #AIQ-275071]: [netcdf-hdf] Unexpected overall file size jump
- Subject: [netCDF #AIQ-275071]: [netcdf-hdf] Unexpected overall file size jump
- Date: Tue, 21 Dec 2010 10:35:05 -0700
Hi James,
> We recently began working on a transition from netcdf 3.6.2 to 4.1.1.
>
> The process was trouble free and things seem to be working, but we have
> been surprised to find the HDF variant producing extremely large files
> relative to the old netcdf native form. Our measurement files are already
> enormous, and further growth would be deadly.
>
> Has anyone else encountered this?
There is a larger fixed-size overhead for metadata (names and
properties of variables, dimensions, and attributes) in the HDF5-based
netCDF-4 format, but in our experience, it's not significant for files
with lots of data and only a moderate amount of metadata. And use of
compression can make equivalent netCDF-4 files significantly smaller
than netCDF-3 classic format files.
As an example we use in our netCDF training workshop, a netCDF file
with only one dimension of size 2 and one variable that uses that
dimension is very small in the netCDF classic or 64-bit offset formats,
but much larger in the netCDF-4 formats (sizes in bytes):
88 test.nc1 # classic format
92 test.nc2 # 64-bit offset format
5072 test.nc3 # netCDF-4 format
5108 test.nc4 # netCDF-4 classic model format
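A rough breakdown of these sizes, assuming the variable has a 4-byte type such as int or float (the message does not say which), so that its 2 values occupy only 8 bytes. Everything else in each file is header and metadata overhead for that format:

```python
# Sizes (bytes) reported above for the file with one dimension of size 2.
sizes = {
    "classic": 88,
    "64-bit offset": 92,
    "netCDF-4": 5072,
    "netCDF-4 classic model": 5108,
}

# Assumption: the variable has a 4-byte type (e.g. int or float).
BYTES_PER_VALUE = 4
data_bytes = 2 * BYTES_PER_VALUE

# Whatever is not raw data is header/metadata overhead for that format.
overhead = {fmt: size - data_bytes for fmt, size in sizes.items()}

for fmt, extra in overhead.items():
    print(f"{fmt:24s} {extra:5d} bytes of overhead")
```

Under that assumption the classic header is 80 bytes, while the netCDF-4 file carries about 5 KB of fixed overhead.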
However, if you change the dimension size to 10000, the sizes are much
closer:
40080 test.nc1 # classic format
40084 test.nc2 # 64-bit offset format
45064 test.nc3 # netCDF-4 format
45101 test.nc4 # netCDF-4 classic model format
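Comparing the two listings shows that the extra space netCDF-4 needs is a fixed amount rather than a fraction of the data. The short calculation below uses only the classic and netCDF-4 sizes reported above:

```python
# File sizes in bytes from the two listings above: (classic, netCDF-4).
listings = {
    2: (88, 5072),          # dimension size 2
    10000: (40080, 45064),  # dimension size 10000
}

gap = {}
growth_pct = {}
for dim_size, (classic, nc4) in listings.items():
    gap[dim_size] = nc4 - classic  # extra bytes needed by netCDF-4
    growth_pct[dim_size] = 100.0 * gap[dim_size] / classic
    print(f"dim={dim_size:6d}  extra={gap[dim_size]} bytes  "
          f"(+{growth_pct[dim_size]:.1f}% over classic)")
```

The gap is 4984 bytes in both cases; only its size relative to the data shrinks as the data grows, from more than 50 times the classic file to about 12%.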
And if you apply level-1 compression to the variable in the netCDF-4
format, the netCDF-4 file is significantly smaller for this
(artificial) data:
40080 test.nc1 # classic format
40084 test.nc2 # 64-bit offset format
21055 test.nc3 # netCDF-4 format
21092 test.nc4 # netCDF-4 classic model format
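NetCDF-4 compression is built on the deflate (zlib) algorithm used by HDF5. As a stand-alone illustration only (not netCDF or HDF5 code), Python's standard zlib module can apply level-1 deflate to a hypothetical variable of 10000 sequential 4-byte integers; the data values are an assumption, since the message does not say what the test variable holds:

```python
import struct
import zlib

# Hypothetical variable: 10000 little-endian 4-byte ints, 0..9999.
values = range(10000)
raw = struct.pack(f"<{len(values)}i", *values)

# Level 1 is the fastest, lightest deflate setting (levels run 0-9).
compressed = zlib.compress(raw, 1)

# Deflate is lossless: decompressing returns the original bytes exactly.
assert zlib.decompress(compressed) == raw
print(len(raw), "->", len(compressed), "bytes")
```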
Finally, if you apply the shuffle filter along with compression for
this test file, compression is significantly better still:
40080 test.nc1 # classic format
40084 test.nc2 # 64-bit offset format
7777 test.nc3 # netCDF-4 format
7814 test.nc4 # netCDF-4 classic model format
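The shuffle filter regroups the bytes of multi-byte values, storing byte 0 of every element, then byte 1 of every element, and so on. For slowly varying data this produces long runs of similar bytes that deflate handles well. Below is a minimal pure-Python sketch of that byte-shuffle idea (not HDF5's own implementation), again on hypothetical data of 10000 sequential 4-byte ints:

```python
import struct
import zlib


def shuffle(raw: bytes, itemsize: int) -> bytes:
    """Group byte k of every element together, for k = 0..itemsize-1."""
    return b"".join(raw[k::itemsize] for k in range(itemsize))


def unshuffle(data: bytes, itemsize: int) -> bytes:
    """Invert shuffle(): interleave the byte planes back into elements."""
    n = len(data) // itemsize
    planes = [data[k * n:(k + 1) * n] for k in range(itemsize)]
    out = bytearray(len(data))
    for k, plane in enumerate(planes):
        out[k::itemsize] = plane
    return bytes(out)


# Hypothetical variable: 10000 little-endian 4-byte ints, 0..9999.
raw = struct.pack("<10000i", *range(10000))
shuffled = shuffle(raw, 4)
assert unshuffle(shuffled, 4) == raw  # shuffling is lossless

plain_size = len(zlib.compress(raw, 1))
shuffled_size = len(zlib.compress(shuffled, 1))
print("deflate only:", plain_size, "bytes;",
      "shuffle + deflate:", shuffled_size, "bytes")
```

Here the two high-order byte planes are all zeros and the second plane changes only every 256 elements, so the shuffled bytes compress far better than the interleaved original.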
It's easy to run little experiments like this with the "nccopy"
utility in the latest netCDF snapshot release (soon to be in version
4.1.2), as you can specify conversions and compression on the command
line:
http://www.unidata.ucar.edu/netcdf/workshops/2010/utilities/NccopyExamples.html
This is a very artificial example, and it's unlikely you'll get
results as good with your real data, but experimenting with nccopy's
compression options on some of your real data could show what you can
expect from using netCDF-4 for your data.
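One way to run such experiments is to sweep a few deflate levels with and without shuffling and compare the resulting file sizes. The sketch below only builds and runs nccopy command lines. It assumes the `-k` (output kind), `-d` (deflate level) and `-s` (shuffle) options documented for current nccopy releases, which may not all be available in older snapshots; the output file names are hypothetical:

```python
import os
import shutil
import subprocess


def nccopy_command(src, dst, level, shuffle):
    """Build an nccopy argument list for netCDF-4 output with deflate
    level `level` (0-9) and optional byte shuffling."""
    cmd = ["nccopy", "-k", "netCDF-4", "-d", str(level)]
    if shuffle:
        cmd.append("-s")
    return cmd + [src, dst]


def sweep(src, levels=(1, 5, 9)):
    """Run nccopy for each setting and return {(level, shuffle): size}.
    Requires nccopy on PATH; `src` is one of your own netCDF files."""
    if shutil.which("nccopy") is None:
        raise RuntimeError("nccopy not found on PATH")
    sizes = {}
    for level in levels:
        for shuffle in (False, True):
            # Hypothetical output naming scheme, e.g. out_d1_s.nc
            dst = f"out_d{level}{'_s' if shuffle else ''}.nc"
            subprocess.run(nccopy_command(src, dst, level, shuffle), check=True)
            sizes[(level, shuffle)] = os.path.getsize(dst)
    return sizes
```

For example, `sweep("measurements.nc")` (a hypothetical file name) returns the output size in bytes for each level/shuffle combination, which can be compared with the size of the original netCDF-3 file.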
--Russ
Russ Rew UCAR Unidata Program
address@hidden http://www.unidata.ucar.edu
Ticket Details
===================
Ticket ID: AIQ-275071
Department: Support netCDF
Priority: Normal
Status: Closed