This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
> Date: Tue, 10 Dec 1996 12:38:50 -0800
> To: address@hidden
> From: Roy Mendelssohn <address@hidden>
> Subject: Allocation of Space in NetCDF

Hi Roy,

> I have a question about how and when NetCDF allocates space. Perhaps it
> is answered in the documentation, but I couldn't find it. Suppose I have
> a data array, say dimensioned by lat, lon, time, and a fourth dimension
> that is an unlimited dimension. So we have real numbers, 32 bits each;
> in FORTRAN terms, say it is dimensioned (2,2,2,*).
>
> 1) How big a file would be created by default (i.e., I create the
> netCDF file, define all the dimensions etc., define a variable with
> those dimensions, but don't actually write any data to the file)?

The size of the netCDF file specified by the CDL:

    netcdf r {
    dimensions:
        lon = 2;
        lat = 2;
        time = 2;
        rec = unlimited;
    variables:
        float var(rec,time,lat,lon);
    }

is 128 bytes, as you can verify by running "ncgen -b" on it. It could be
larger if the names of the variables and dimensions were longer. It's
possible to glean this from the User's Guide chapter on File Structure
and Performance, but it's easier to just run ncgen on the CDL file and
look at the size of the generated netCDF file.

> 2) This is really the question I have. Suppose now I have one
> observation, for convenience at location (1,1,1,1). How big a file do
> I have now? If it is what Fortran does, I would have an array that is
> (2,2,2,1), so the file would grow by 8 x 4 bytes = 32 bytes. What would
> be ideal is if the file size increased by only 4 bytes.

Sorry, but adding one data value increases the size of the file by
lat*lon*time*4 bytes, in this case 32 bytes. The smallest increment by
which a netCDF file grows is one record's worth of data, which is the
amount of space for one slice along the unlimited dimension of all the
variables that use the unlimited dimension.
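The record-growth arithmetic above can be sketched in a few lines of plain Python (no netCDF library needed). This is an illustrative model of the classic netCDF format only; the helper names `record_size` and `file_size` are made up for this sketch, not part of any netCDF API.

```python
def record_size(fixed_dims, type_size=4):
    """Bytes added per record: one slice along the unlimited dimension,
    i.e. the product of the fixed dimension sizes times the element size
    (rounded up to a 4-byte boundary in the classic format)."""
    n = type_size
    for d in fixed_dims:
        n *= d
    return (n + 3) // 4 * 4  # classic format pads each record to 4 bytes

# var(rec, time, lat, lon) with time = lat = lon = 2 and 4-byte floats:
# writing even a single value allocates a whole 32-byte record.
assert record_size((2, 2, 2)) == 32

def file_size(header_bytes, fixed_dims, num_records, type_size=4):
    """Approximate classic-format file size: header plus whole records."""
    return header_bytes + num_records * record_size(fixed_dims, type_size)

# With the 128-byte header reported by ncgen and one record written:
assert file_size(128, (2, 2, 2), 1) == 160
```

The key point the sketch captures is that the growth increment is a whole record, never a single value.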
> What brings this up is I am thinking of using netCDF for a dataset where
> lat, lon, and time are very large, and the unlimited dimension represents
> separate observations for that (lat, lon, time) coordinate. But the
> number of observations will vary greatly depending on the particular
> (lat, lon, time) combination, with some having no observations at all
> (i.e., a grid with a varying number of obs at any grid point). If the
> file size were determined the way Fortran dimensions arrays, it would be
> huge, many locations would just be missing data, and it wouldn't be
> practical. If the storage were the other way, then it would be very
> practical.
>
> Any help, advice, etc. would be greatly appreciated.

There are various ways to represent such data without wasting space, but
you trade off ease of access by location and time. For example, you could
let the unlimited dimension be "obsnum", representing observation number,
and use something like:

    netcdf sparse {
    dimensions:
        obsnum = unlimited;    // observation number
    variables:
        float lat(obsnum);
        float lon(obsnum);
        float time(obsnum);
        float var(obsnum);
    }

Now you can have as many (lat, lon, time) tuples as you want, with each
observation adding only 16 bytes to the file (four floats), but without
an index it may be costly to find all the observations corresponding to
any particular (lat, lon, time) interval.

Another approach uses "ragged arrays", similar to what is described in
the "Data Structures" section of the manual, available on-line at

    http://www.unidata.ucar.edu/packages/netcdf/guide_5.html#SEC31

Hope this helps.

--Russ

_____________________________________________________________________
Russ Rew                                         UCAR Unidata Program
address@hidden                     http://www.unidata.ucar.edu
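To make the dense-versus-sparse trade-off concrete, here is a back-of-the-envelope comparison in plain Python. The grid sizes and observation counts are invented for illustration, the header size is ignored, and 4-byte floats in the classic format are assumed throughout; `dense_bytes` and `sparse_bytes` are hypothetical helper names, not netCDF functions.

```python
def dense_bytes(nlat, nlon, ntime, max_obs_per_cell, type_size=4):
    """Dense gridded layout: every (lat, lon, time) cell reserves space
    for the maximum number of observations, mostly holding missing
    values when the data are sparse."""
    return nlat * nlon * ntime * max_obs_per_cell * type_size

def sparse_bytes(num_obs, vars_per_obs=4, type_size=4):
    """Sparse "obsnum" layout: one record of four floats
    (lat, lon, time, var) per actual observation, 16 bytes each."""
    return num_obs * vars_per_obs * type_size

# Each observation in the sparse layout costs exactly 16 bytes:
assert sparse_bytes(1) == 16

# Illustrative case: a 100 x 100 x 100 grid, up to 10 obs per cell, but
# only 50,000 actual observations -- the sparse file is 50x smaller.
assert dense_bytes(100, 100, 100, 10) == 40_000_000
assert sparse_bytes(50_000) == 800_000
```

This is exactly the trade Russ describes: the sparse layout wastes no space on empty cells, at the cost of needing a scan or an external index to locate the observations for a given (lat, lon, time) interval.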