This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Jeff,

> I made the changes you suggested with the following result:
>
> 10000 records, 8 bytes / record = 80000 bytes raw data
>
> original program (netCDF-4, no chunking): 537880 bytes (6.7x)
> file size with chunk size of 2000 = 457852 bytes (5.7x)
>
> So a little better, but still not good. I then tried different chunk sizes
> of 10000, 5000, 200, and even 1, which I would've thought would give me the
> original size, but all gave the same resulting file size of 457852.

Changing the chunking in a C program using the netCDF C library, version
4.3.2, shows the expected improvement using larger chunk sizes:

  file size with chunk size of 1:     457869 bytes
  file size with chunk size of 2000:   82685 bytes
  file size using classic format:      80140 bytes

The netCDF-Java library uses the netCDF C library when writing netCDF-4
files, but I'm not sure which netCDF C version it uses. All versions of the
netCDF C library after 4.2.0 (released July 2012) show the same file size
using a chunk size of 2000, 82685 bytes. So it appears that there's
something wrong with the way the chunk size is being set from the Java test
program you're using ...

--Russ

> Finally, I tried writing more records to see if it's just a symptom of a
> small data set. With 1M records:
>
> 8MB raw data, chunk size = 2000
> 45.4MB file (5.7x)
>
> This is starting to seem like a lost cause given our small data records.
> I'm wondering if you have information I could use to go back to the archive
> group and try to convince them to use netCDF-3 instead.
>
> jeff
>
> address@hidden> wrote:
>
> > Great, thanks Ethan, I'll give that a try. We have an external requirement
> > being imposed on us to use netCDF-4, but I don't know the reasoning behind
> > it.
> >
> > jeff
> >
> > address@hidden> wrote:
> >
> >> Hi Jeff,
> >>
> >> An alternate approach would be to avoid the whole chunking issue by
> >> writing netCDF-3 files instead of netCDF-4 files.
> >> But, if you do that, you don't get to take advantage of compression.
> >>
> >> If you want to stick with netCDF-4, I included a few details and pointers
> >> to the needed methods and classes in my response on the netcdf-java list:
> >>
> >> http://www.unidata.ucar.edu/mailing_lists/archives/netcdf-java/2014/msg00055.html
> >>
> >> But I think it boils down to a few lines of code. I haven't tested this,
> >> but interspersed in your code below are a few lines that I think should
> >> get you going.
> >>
> >> Ethan
> >>
> >> On 5/2/2014 8:54 AM, Jeff Johnson - NOAA Affiliate wrote:
> >> > New Ticket: chunking in Java
> >> >
> >> > How do you set the chunk size via the Java API? I'm trying to get my
> >> > file size down and was told by Ethan from Unidata to change the chunk
> >> > size from 1 to 2000 on the unlimited dimension, but I don't see an API
> >> > to do that.
> >> >
> >> > Below is my sample code.
> >> >
> >> > import ucar.ma2.ArrayDouble;
> >> > import ucar.ma2.ArrayLong;
> >> > import ucar.ma2.DataType;
> >> > import ucar.ma2.InvalidRangeException;
> >> > import ucar.nc2.*;
> >> >
> >> > import java.io.IOException;
> >> > import java.nio.file.FileSystems;
> >> > import java.nio.file.Files;
> >> > import java.nio.file.Path;
> >> > import java.util.ArrayList;
> >> > import java.util.List;
> >> >
> >> > public class TestGenFile2 {
> >> >     public static void main(String[] args) {
> >> >         NetcdfFileWriter dataFile = null;
> >> >
> >> >         try {
> >> >             try {
> >> >
> >> >                 // define the file
> >> >                 String filePathName = "output.nc";
> >> >
> >> >                 // delete the file if it already exists
> >> >                 Path path = FileSystems.getDefault().getPath(filePathName);
> >> >                 Files.deleteIfExists(path);
> >> >
> >> >                 // enter definition mode for this netCDF-4 file
> >> >                 dataFile = NetcdfFileWriter.createNew(
> >> >                     NetcdfFileWriter.Version.netcdf4, filePathName);
> >>
> >> replace the above line with
> >>
> >> Nc4Chunking chunkingStrategy =
> >>     Nc4ChunkingStrategyImpl.factory(Nc4Chunking.Strategy.standard, 0, false);
> >> dataFile = NetcdfFileWriter.createNew(NetcdfFileWriter.Version.netcdf4,
> >>     filePathName, chunkingStrategy);
> >>
> >> >                 // create the root group
> >> >                 Group rootGroup = dataFile.addGroup(null, null);
> >> >
> >> >                 // define dimensions, in this case only one: time
> >> >                 Dimension timeDim = dataFile.addUnlimitedDimension("time");
> >> >                 List<Dimension> dimList = new ArrayList<>();
> >> >                 dimList.add(timeDim);
> >> >
> >> >                 // define variables
> >> >                 Variable time = dataFile.addVariable(rootGroup, "time",
> >> >                     DataType.DOUBLE, dimList);
> >> >                 dataFile.addVariableAttribute(time, new Attribute("units",
> >> >                     "milliseconds since 1970-01-01T00:00:00Z"));
> >>
> >> Add the following line here
> >>
> >> dataFile.addVariableAttribute(time,
> >>     new Attribute("_ChunkSize", new Integer(2000)));
> >>
> >> >                 // create the file
> >> >                 dataFile.create();
> >> >
> >> >                 // create 1-D arrays to hold data values (time is the dimension)
> >> >                 ArrayDouble.D1 timeArray = new ArrayDouble.D1(1);
> >> >
> >> >                 int[] origin = new int[]{0};
> >> >                 long startTime = 1398978611132L;
> >> >
> >> >                 // write the records to the file
> >> >                 for (int i = 0; i < 10000; i++) {
> >> >                     // load data into array variables
> >> >                     double value = startTime++;
> >> >                     timeArray.set(timeArray.getIndex(), value);
> >> >
> >> >                     origin[0] = i;
> >> >
> >> >                     // write a record
> >> >                     dataFile.write(time, origin, timeArray);
> >> >                 }
> >> >             } finally {
> >> >                 if (null != dataFile) {
> >> >                     // close the file
> >> >                     dataFile.close();
> >> >                 }
> >> >             }
> >> >         } catch (IOException | InvalidRangeException e) {
> >> >             e.printStackTrace();
> >> >         }
> >> >     }
> >> > }
> >> >
> >> > thanks-
> >> > jeff
> >> >
> >>
> >> Ticket Details
> >> ===================
> >> Ticket ID: BNA-191717
> >> Department: Support netCDF Java
> >> Priority: Normal
> >> Status: Closed
> >>
> >
> > --
> > Jeff Johnson
> > DSCOVR Ground System
> > Development
> > Space Weather Prediction Center
> > address@hidden
> > 303-497-6260
>
> --
> Jeff Johnson
> DSCOVR Ground System Development
> Space Weather Prediction Center
> address@hidden
> 303-497-6260

Russ Rew
UCAR Unidata Program
address@hidden
http://www.unidata.ucar.edu

Ticket Details
===================
Ticket ID: BNA-191717
Department: Support netCDF
Priority: Normal
Status: Closed
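[Editorial note] The file sizes quoted in this thread can be sanity-checked with a little arithmetic. The sketch below is not part of the original exchange and does not call the netCDF libraries; the class name `ChunkOverhead` is invented for illustration, and every byte count comes directly from the messages above. The per-chunk overhead figure is a rough estimate derived from the C results, not a documented HDF5 constant.

```java
// Sanity-check the file sizes reported in the thread above.
// All byte counts are copied from the messages; no netCDF code is used.
public class ChunkOverhead {

    // File-size inflation relative to the raw data.
    static double ratio(long fileBytes, long rawBytes) {
        return (double) fileBytes / rawBytes;
    }

    public static void main(String[] args) {
        long rawBytes = 10000L * 8; // 10000 records, 8 bytes/record = 80000 bytes

        System.out.printf("no chunking (Java):        %.1fx%n", ratio(537880, rawBytes)); // 6.7x
        System.out.printf("chunk size 2000 (Java):    %.1fx%n", ratio(457852, rawBytes)); // 5.7x
        System.out.printf("chunk size 2000 (C 4.3.2): %.1fx%n", ratio(82685, rawBytes));  // 1.0x
        System.out.printf("classic format (C):        %.1fx%n", ratio(80140, rawBytes));  // 1.0x

        // Rough per-chunk overhead implied by the C results: a chunk size of 1
        // over 10000 records means 10000 chunks, so
        // (457869 - 80140) / 10000 ~= 37.8 bytes of HDF5 bookkeeping per chunk.
        double perChunk = (457869.0 - 80140.0) / 10000;
        System.out.printf("approx. overhead per chunk: %.1f bytes%n", perChunk);
    }
}
```

This makes the pattern in the thread concrete: once the chunk size is actually applied (as in the C tests), the per-chunk overhead is amortized over 2000 records and the netCDF-4 file is nearly as small as the classic-format file, which is why the unchanged 457852-byte result pointed to the chunk size not being set from Java.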