This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Benno,

> MODIS is kept in 317 tiles per timestep, so to make 1 global image, I need
> to read 1 tile each from 317 different files, a common structure for GIS
> community data (there are 8 or so tiles per file corresponding to different
> variables).
>
> netcdf4 is reading all the metadata on open, which means in this case that
> all the disk blocks are touched (my guess is that at least some of the
> metadata is spread amongst the data blocks). I am not using the metadata at
> all (they are tiles, when I read the tiles I already know the structure from
> a previous analysis/read of the metadata), so this is a bit of a waste, and
> very slow if the metadata is spread throughout the file, as it seems to be.

I'm surprised that every disk block is read. That's not typical for
netCDF-4 files that use HDF5 as a storage layer. Typically there is only
a small amount of metadata, which comprises

  - the name and size of each dimension
  - the name, type, and values for each attribute
  - the name, type, and shape for each variable
  - the association information linking variables and attributes
  - other variable properties, such as compression level and layout
  - information about group names and group links
  - information about definitions of user-defined types
  - the B-trees of chunks for each chunked variable

Although this information is scattered around the file, it doesn't involve
every disk block. Each variable chunk is typically one or more disk blocks
that are entirely data, perhaps compressed. Compressed data is not
uncompressed until it's read, and reading a single tile should not access
any of the other tiles, if a tile is one or more chunks.

> Since I am reading the entire tile once for a given variable (just not all
> the different variables), chunking within the file does not really matter,
> and reuse does not happen.

Right, chunking is irrelevant in that case.
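To illustrate why a single-tile read need not touch other tiles' data blocks, here is a minimal pure-Python sketch of chunked addressing. The chunk and variable shapes are hypothetical, not the actual MODIS layout; the point is that a chunk index (analogous to HDF5's per-variable chunk B-tree) lets a reader map a hyperslab request to just the chunks it overlaps:

```python
# Minimal sketch of chunked-variable addressing (hypothetical sizes,
# not the actual MODIS layout). A 2-D variable is split into fixed-size
# chunks; an index maps each chunk to its own file offset, so reading
# one tile touches only the chunks the tile overlaps.

CHUNK = (100, 100)          # chunk shape (rows, cols)
VAR_SHAPE = (1000, 1000)    # whole-variable shape

def chunks_for_region(start, count):
    """Return the (row, col) chunk coordinates a hyperslab read touches."""
    touched = set()
    for r in range(start[0] // CHUNK[0], (start[0] + count[0] - 1) // CHUNK[0] + 1):
        for c in range(start[1] // CHUNK[1], (start[1] + count[1] - 1) // CHUNK[1] + 1):
            touched.add((r, c))
    return touched

# Reading one 200x200 "tile" aligned to chunk boundaries touches only
# 4 of the 100 chunks; the other 96 chunks (and the data blocks behind
# them) are never read or uncompressed.
tile = chunks_for_region((200, 400), (200, 200))
print(sorted(tile))   # → [(2, 4), (2, 5), (3, 4), (3, 5)]
```

If the tile boundaries line up with the chunk boundaries, as above, no chunk outside the tile is ever decompressed, which is the behavior described in the reply.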
> Aggregation like this is a common use case for netcdf -- not necessarily
> common yet for tiles, but certainly common in time. So would you consider
> improving the performance in this case where the metadata is not read for
> use along with the data?

I've discussed this previously with Ed Hartnett, who implemented the HDF5
reading code. It apparently would require some extensive changes to the
current code. However, I've created a Jira issue for this, so it's on our
list to investigate, and you can follow or comment on the issue here:

  https://www.unidata.ucar.edu/jira/browse/NCF-132

For now, you would probably be better off reading the MODIS data through
the HDF5 library than through the netCDF API. If you try that, I'd be
interested in how much the performance improves.

--Russ

Russ Rew                                     UCAR Unidata Program
address@hidden                               http://www.unidata.ucar.edu

Ticket Details
===================
Ticket ID: IDT-559068
Department: Support netCDF
Priority: Normal
Status: Closed
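The performance difference at issue can be illustrated with a toy, stdlib-only sketch (this is not real netCDF-4 or HDF5 code, and the file layout below is invented for the demonstration). An "eager" reader that parses every per-tile metadata record on open, as the netCDF-4 library did at the time, touches the whole file; a "lazy" reader that consults an offset index seeks straight to the one tile it needs:

```python
# Toy illustration of eager vs. lazy metadata reading (stdlib only;
# the on-disk layout here is invented, not netCDF-4's or HDF5's).
# The file holds many "tiles", each preceded by a small metadata
# record, plus a trailing offset index and a 12-byte trailer.
import io
import struct

def write_file(n_tiles=317, tile_bytes=64):
    buf = io.BytesIO()
    offsets = []
    for i in range(n_tiles):
        offsets.append(buf.tell())
        buf.write(struct.pack("<ii", i, tile_bytes))  # per-tile metadata
        buf.write(bytes([i % 256]) * tile_bytes)      # tile payload
    index_pos = buf.tell()
    for off in offsets:
        buf.write(struct.pack("<q", off))             # offset index
    buf.write(struct.pack("<qi", index_pos, n_tiles)) # trailer
    return buf

def eager_read_tile(buf, want):
    """Walk every metadata record from the start (touches the whole file)."""
    buf.seek(0)
    records_visited = 0
    while True:
        tile_id, size = struct.unpack("<ii", buf.read(8))
        records_visited += 1
        data = buf.read(size)
        if tile_id == want:
            return data, records_visited

def lazy_read_tile(buf, want):
    """Use the trailing index to seek directly to the one tile needed."""
    buf.seek(-12, io.SEEK_END)
    index_pos, n_tiles = struct.unpack("<qi", buf.read(12))
    buf.seek(index_pos + 8 * want)
    (off,) = struct.unpack("<q", buf.read(8))
    buf.seek(off)
    tile_id, size = struct.unpack("<ii", buf.read(8))
    return buf.read(size), 1

buf = write_file()
data_eager, visits_eager = eager_read_tile(buf, 300)
data_lazy, visits_lazy = lazy_read_tile(buf, 300)
print(visits_eager, visits_lazy)   # → 301 1
```

Multiplied by 317 files per timestep, avoiding the eager scan is where the suggested HDF5-direct approach would save time.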