This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Hi Coy,

> I am working on a NetCDF file format using NetCDF-4 libraries but
> writing in NetCDF-3 format. In my scenario my URI is based on time, I
> believe I must use the same dimensions for each of my data arrays. But
> the amount of data I need to store for each time block changes. I have
> set up my code to set the dimensions to the largest needed size and
> write 'empty' values when data is missing. I believe I have done this
> correctly as ncdump shows a dash instead of a number for these missing
> values. The problem is that my files are the same size whether I have
> lots of missing values or none at all. Is there something I can do to
> reduce the size of the file when it contains a fair number of missing
> values? I have attached two netcdf files to illustrate this problem.
> They are both the same file size, but one contains a fair number of
> missing values towards the end.

If you can write netCDF-4 classic model format, then you can use compression on variables that have missing values and get smaller files. These files can be read by netCDF-3 applications that have been relinked with a netCDF-4 library, and the decompression will occur transparently, without any changes to the reading program. Reading compressed data will be slower than reading uncompressed data, but in many cases the smaller size is worth the time, and if enough chunk cache is allocated, the decompression will only occur once, on the first read of each chunk of data. See this FAQ for more information:

http://www.unidata.ucar.edu/netcdf/docs/faq.html#fv9

You may also want to look at the other FAQs on format variants:

http://www.unidata.ucar.edu/software/netcdf/docs/faq.html#formats-and-versions

Alternatively, you can use multiple time dimensions. For example, if some variables have data once per second and others have data once per minute, it would be best to use two different time dimensions, with associated time coordinate variables, for the two data rates.
Another alternative would be to use an index dimension for observations and store the time of each observation as data, rather than trying to share a single time dimension. Some examples of this are illustrated in the proposed CF point observation conventions:

https://cf-pcmdi.llnl.gov/trac/wiki/PointObservationConventions

> On a separate note, I don't always know how big my data array will need
> to be when I first create my NetCDF files. Is there a way to expand an
> array once it has been created or add a new array?

Yes, that's the purpose of declaring the size of a dimension to be "unlimited" when you create it. Data can be efficiently appended along an unlimited dimension for any variable whose shape uses that dimension. NetCDF-3 classic files may have only one unlimited dimension, but that restriction is removed for netCDF-4 files.

You can also add a new variable to an existing netCDF file, but unless you have planned for this in advance by reserving extra header space when you first create the file, it can result in an expensive operation that moves all the data to make room. In netCDF-4 files this is not a problem: you can efficiently add variables, dimensions, or attributes at any time without the library moving or copying data. For more on this, see the user's guide chapter on file structure and performance:

http://www.unidata.ucar.edu/netcdf/docs/netcdf.html#Structure

--Russ

Russ Rew                                UCAR Unidata Program
address@hidden                          http://www.unidata.ucar.edu

Ticket Details
===================
Ticket ID: API-226991
Department: Support netCDF
Priority: Normal
Status: Closed