This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Oscar,

> This settles this issue for me, I have no further questions. Thanks a
> lot for your clear explanation.

I just want to let you know first that netcdf-4 is not a solution to this problem, in that it won't save you the wasted space, at least for a small example I just tried. The underlying HDF5 format uses a B-tree structure for unlimited variables to keep track of the resulting chunks, and that uses a lot of space. I'll post a table I'm constructing of the space used by all the cases you are considering, and I think it will be clear from that that netcdf-4 would not be suitable to save space in this case, unless you used compression.

--Russ

> I posted this question on the MathWorks-forum as well (no reply so far);
> do you mind if I reply to my own message copy-pasting your answer? I'll
> leave the credit to the unidata-support of course.
>
> cheers,.................Oscar
>
> -----Original Message-----
> From: Unidata netCDF Support [mailto:address@hidden]
> Sent: Wednesday, 24 March, 2010 19:38
> To: Hartogensis, Oscar
> Cc: address@hidden
> Subject: [netCDF #AMR-714212]: netcdf file size for limited vs unlimited
>
> Hi Oscar,
>
> > Writing multiple 1-dimensional variables (a time-series) to a
> > netcdf-file formatted as nc_type "short", I noticed that the file
> > becomes twice as large when using an unlimited versus a limited
> > dimension definition.
> >
> > However:
> > 1. Writing one variable of nc_type 'short' only, both the limited and
> > unlimited dimension files are of the same size...
> > 2. Writing all data as floats the limited and unlimited dimension
> > nc-files are of equal size (double the size of the limited dimension
> > file of type short; as expected). It seems that using multiple
> > variables of unlimited dimension means that the data is always written
> > as a float?, or am I doing something wrong?
>
> Dennis's answer was close, in that you need to know something about the
> underlying netCDF-classic format to explain this.
> The reason is that the space for each variable's data in a record is
> padded up to the next multiple of 4 bytes. This makes sure each
> variable's data starts on a 4-byte boundary, which is an optimization
> for disk seeks on some platforms.
>
> There is a special case if there is only one record variable, in which
> case no padding is used for byte or short variables. These padding
> rules are documented in the format specification:
>
>   http://www.unidata.ucar.edu/netcdf/docs/netcdf.html#NetCDF-Classic-Format
>
> and specifically in the description of the "varslab", which is a
> record's worth of data for a single variable, along with the special
> note at the end of the specification on padding:
>
>   Note on padding: In the special case of only a single record variable
>   of character, byte, or short type, no padding is used between data
>   values.
>
> As for a way to get around this problem, all I can think of is to use an
> extra artificial dimension to make the short variables 2-dimensional,
> such as:
>
>   netcdf unlim2 {
>   dimensions:
>       time = unlimited;
>       two = 2;
>   variables:
>       short var1(time, two);
>       short var2(time, two);
>   data:
>    var1 =
>      0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19;
>    var2 =
>      0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19;
>   }
>
> You can still read these values all at once in a contiguous block, and a
> small layer of software would let you write the values two at a time,
> using a function you would call for each value that would save odd
> values and write to the file when it had 2 values.
>
> --Russ
>
> > The files I write are quite large and I need to use an unlimited
> > dimension as I don't know the record length in advance (I join
> > multiple files into one netcdf file) but I don't like to waste double
> > the disk space on my nc-files.
> >
> > I use Matlab to write nc-files and I tried the Matlab-native netcdf
> > commands (example below), but also the mexcdf-toolbox and snctools.
> > All give the same result. This seems to be more a netcdf than a Matlab
> > issue. Any help is much appreciated though.
> >
> > EXAMPLE1 to illustrate this issue (Matlab native commands):
> > %%%%%%%%%%%%%%%%%%%%%%%%
> > N=80000;
> >
> > % LIMITED dimension
> > % creating a netcdf file
> > nc = netcdf.create('testfile_lim.nc', 'NC_CLOBBER');
> > % define dimension
> > time_dim = netcdf.defDim(nc, 'time', N);
> > % define variables
> > var1_id = netcdf.defVar(nc, 'var1', 'short', time_dim);
> > var2_id = netcdf.defVar(nc, 'var2', 'short', time_dim);
> > netcdf.endDef(nc);
> > % write data
> > netcdf.putVar(nc, var1_id, int16([1:N]));
> > netcdf.putVar(nc, var2_id, int16([1:N]));
> > % close nc-file
> > netcdf.close(nc)
> >
> > % UNLIMITED dimension
> > % creating a netcdf file
> > nc = netcdf.create('testfile_unlim.nc', 'NC_CLOBBER');
> > % define dimension
> > time_dim = netcdf.defDim(nc, 'time', netcdf.getConstant('NC_UNLIMITED'));
> > % define variables
> > var1_id = netcdf.defVar(nc, 'var1', 'short', time_dim);
> > var2_id = netcdf.defVar(nc, 'var2', 'short', time_dim);
> > netcdf.endDef(nc);
> > % write data
> > netcdf.putVar(nc, var1_id, 0, N, int16([1:N]));
> > netcdf.putVar(nc, var2_id, 0, N, int16([1:N]));
> > % close nc-file
> > netcdf.close(nc)
> > %%%%%%%%%%%%%%%%%%%%%%%%
> >
> > testfile_lim.nc => 312kB
> > testfile_unlim.nc => 625kB
> >
> > EXAMPLE2 to illustrate this issue (mexcdf commands):
> > %%%%%%%%%%%%%%%%%%%%%%%%
> > N=80000;
> >
> > nc_lim = netcdf('test_lim.nc', 'clobber');
> > nc_unlim = netcdf('test_unlim.nc', 'clobber');
> >
> > nc_lim('time') = N;
> > nc_unlim('time') = 0;
> >
> > nc_lim{'var1'} = ncshort('time');
> > nc_lim{'var2'} = ncshort('time');
> > nc_unlim{'var1'} = ncshort('time');
> > nc_unlim{'var2'} = ncshort('time');
> >
> > nc_unlim{'var1'}([1:N]) = int16([1:N]); % Store data
> > nc_unlim{'var2'}([1:N]) = int16([1:N]); % Store data
> >
> > nc_lim{'var1'}(:) = int16([1:N]); % Store data
> > nc_lim{'var2'}(:) = int16([1:N]); % Store data
> >
> > close(nc_lim);
> > close(nc_unlim);
> > %%%%%%%%%%%%%%%%%%%%%%%%
> >
> > test_lim.nc => 312kB
> > test_unlim.nc => 625kB
> >
> > thanks,............................Oscar Hartogensis
> >
> > ---------------------------------------------------------
> > Oscar K Hartogensis
> > Meteorology and Air Quality Group
> > Wageningen University
> > mail: PO Box 47, 6700 AA Wageningen, the Netherlands
> > visit: Atlas, building 104, Droevendaalsesteeg 4,
> >        6708 PB Wageningen, the Netherlands
> > tel: +31 (0)317 482109
> > fax: +31 (0)317 419000
> > email: address@hidden
> > url: www.met.wau.nl
> > ---------------------------------------------------------
>
> Russ Rew                                         UCAR Unidata Program
> address@hidden                      http://www.unidata.ucar.edu
>
> Ticket Details
> ===================
> Ticket ID: AMR-714212
> Department: Support netCDF
> Priority: Normal
> Status: Closed
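One way to check Russ's padding explanation against Oscar's numbers is to redo the arithmetic. The Python sketch below is a rough model, not netCDF library code: it counts only the data section (the small header is ignored), it assumes the classic-format padding rules quoted in the reply (each record slab rounded up to 4 bytes, except for a lone byte/char/short record variable), and the function names are made up for illustration.

```python
# Rough model of the netCDF-classic *data* section size (header ignored).
# Assumes the 4-byte padding rules from the classic format specification.
EXTERNAL_SIZES = {"byte": 1, "char": 1, "short": 2, "int": 4, "float": 4, "double": 8}


def pad4(nbytes):
    """Round nbytes up to the next multiple of 4."""
    return (nbytes + 3) // 4 * 4


def fixed_data_bytes(var_types, n_values):
    """Bytes for fixed-size 1-D variables, each holding n_values values.

    Each variable's data is contiguous and padded only once, at its end.
    """
    return sum(pad4(EXTERNAL_SIZES[t] * n_values) for t in var_types)


def record_data_bytes(var_types, n_records, values_per_record=1):
    """Bytes for record variables along the unlimited dimension.

    Every variable contributes one slab per record, and each slab is padded
    to 4 bytes, except when there is a single byte/char/short record variable.
    """
    slabs = [EXTERNAL_SIZES[t] * values_per_record for t in var_types]
    if len(var_types) == 1 and var_types[0] in ("byte", "char", "short"):
        record_size = slabs[0]  # special case: no padding between values
    else:
        record_size = sum(pad4(s) for s in slabs)
    return record_size * n_records


if __name__ == "__main__":
    N = 80000
    cases = {
        "two shorts, fixed": fixed_data_bytes(["short", "short"], N),
        "two shorts, unlimited": record_data_bytes(["short", "short"], N),
        "one short, unlimited": record_data_bytes(["short"], N),
        "two floats, unlimited": record_data_bytes(["float", "float"], N),
        "two shorts (time, two), unlimited": record_data_bytes(
            ["short", "short"], N // 2, values_per_record=2),
    }
    for name, nbytes in cases.items():
        print(f"{name:36s} {nbytes:7d} bytes = {nbytes / 1024:6.1f} KiB")
```

With N = 80000 the model gives 320000 bytes (312.5 KiB) for two fixed-size shorts and 640000 bytes (625 KiB) for two unlimited shorts, in line with the 312kB and 625kB files Oscar reported. A single short record variable and the two-float layouts come out the same size either way, and the extra (time, two) dimension brings the unlimited case back to 320000 bytes, because each 4-byte slab then holds two shorts with no padding.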
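The "small layer of software" Russ mentions for the (time, two) workaround could look roughly like the Python sketch below. It is hypothetical: `PairWriter` and the `write_pair(record, pair)` callback are invented names, and the callback stands in for whatever netCDF call actually writes one record of a (time, two) variable. Padding an odd final value with -32767 (netCDF's default fill value for short) is an assumption; the reply does not say how a leftover value should be handled.

```python
from typing import Callable, List, Optional


class PairWriter:
    """Buffer single values and hand them on two at a time.

    Hypothetical helper: write_pair(record, pair) is a stand-in for the call
    that writes one record of a short var(time, two) variable to the file.
    """

    def __init__(self, write_pair: Callable[[int, List[int]], None],
                 fill_value: int = -32767):
        self.write_pair = write_pair
        self.fill_value = fill_value  # assumed pad value for an odd final count
        self.pending: Optional[int] = None  # saved "odd" value, if any
        self.record = 0  # index of the next record along 'time'

    def put(self, value: int) -> None:
        """Save an odd value; write both once the second one arrives."""
        if self.pending is None:
            self.pending = value
        else:
            self.write_pair(self.record, [self.pending, value])
            self.record += 1
            self.pending = None

    def close(self) -> None:
        """Flush a leftover odd value, padding the pair with fill_value."""
        if self.pending is not None:
            self.write_pair(self.record, [self.pending, self.fill_value])
            self.record += 1
            self.pending = None


if __name__ == "__main__":
    records = {}
    writer = PairWriter(lambda rec, pair: records.__setitem__(rec, pair))
    for v in range(5):
        writer.put(v)
    writer.close()
    print(records)  # {0: [0, 1], 1: [2, 3], 2: [4, -32767]}
```

Keeping the buffering in a tiny wrapper like this means the calling code can still hand over one value at a time, while the file only ever sees whole (time, two) records.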