This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Hi Don,

Actually I think the former (a method to read raw data) is better than the latter (not setting missing-data metadata), because I still need a way to do the unpacking etc. on the data points of interest. This needs the invalidDataMissing and fillValueMissing attributes to be set, but I choose when to apply them, rather than having them applied to every single data point that is read.

Regards,
Jon

On 27/10/06, Don Murray <address@hidden> wrote:
Hi Jon-

Thanks for the explanation. It sounds like a method to read the raw data would be useful, or better yet a constructor for GeoGrid that would take a boolean for not setting missing data (akin to all the setInvalidDataMissing() and setFillValueMissing() methods), but still allow the coordinate system enhancements.

Don

Jon Blower wrote:
> Hi Don,
>
> The problem is caused by my use of the nj22 library. In my application
> I need to create an image from a NetCDF file as quickly as possible.
> The image will often be of much lower resolution than the source data,
> but will not necessarily be in the same coordinate reference system.
>
> If I want to create a 100x100 image, I need to read at least 10,000
> data points. However, reading 10,000 individual points appears to be
> very slow (especially for an NcML aggregation), so I am compromising by
> reading chunks of contiguous data at a time. This means that I often
> end up reading considerably more data than I need to make the image.
> I perform the necessary interpolation in my application and throw away
> the unwanted data.
>
> If I read packed data using an "enhanced" variable, then every single
> point is internally checked to see if it is a missing value, and every
> single point is unpacked (scale and offset applied). Through profiling,
> I established this to be an expensive operation, because it is being
> applied to many more data points than I need. Therefore I employed a
> method whereby data are read in their packed form, without being
> checked for missing values. I then perform the check just for the
> 10,000 points that I need to plot in my image. This is considerably and
> demonstrably faster, although as with all optimisation problems, it's a
> compromise.
>
> Does this clear things up? As far as changes to the libraries go, it
> would be handy to have a method in GeoGrid for reading "raw" (packed)
> data as fast as possible, giving the user the opportunity to unpack
> the data later.
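[Archive note: the deferred-unpacking strategy Jon describes can be sketched in plain Java, independent of the nj22 API. The fill value, scale_factor and add_offset below are invented for illustration; the point is that the missing-value check and the scale/offset arithmetic run only for the points actually plotted, not for every element of the chunk read from disk.]

```java
// Sketch of deferred unpacking of packed data, assuming a short-typed
// variable with hypothetical scale_factor, add_offset and _FillValue.
public class DeferredUnpack {
    static final short FILL_VALUE = -32768; // assumed _FillValue
    static final double SCALE = 0.01;       // assumed scale_factor
    static final double OFFSET = 273.15;    // assumed add_offset

    // Unpack a single packed value, checking for the fill value first.
    // This is the per-element work an "enhanced" read performs for the
    // whole array; here it runs only on demand.
    static double convert(short packed) {
        if (packed == FILL_VALUE) return Double.NaN;
        return packed * SCALE + OFFSET;
    }

    public static void main(String[] args) {
        // Pretend this is a large chunk read in raw (packed) form...
        short[] rawChunk = new short[1_000_000];
        java.util.Arrays.fill(rawChunk, (short) 1500);
        rawChunk[42] = FILL_VALUE;

        // ...but unpack only the few points we actually need to plot.
        int[] needed = {0, 42, 999_999};
        for (int i : needed) {
            System.out.println(convert(rawChunk[i]));
        }
    }
}
```

The trade-off is exactly the one Jon names: the chunk read may pull in far more packed values than the image needs, but the expensive per-element conversion is paid only for the pixels drawn.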
>
> Best wishes,
> Jon
>
> On 27/10/06, Don Murray <address@hidden> wrote:
>> Jon and John-
>>
>> Why is it so much slower using the GeoGrid directly? Perhaps there can
>> be some performance tuning on the GeoGrid side to avoid people having
>> to jump through the hoops that Jon is. Is it because the GeoGrid
>> scales and offsets the entire grid before subsetting, instead of
>> subsetting and then applying scale and offset (which seems to be what
>> Jon ends up doing)? Jon, when you say you are scaling and offsetting
>> only the individual values, is this all the values in the subset, or
>> if not, what percentage of the subset are you doing this on?
>>
>> We've been doing some profiling of the netcdf-java reading in the IDV,
>> and if this is an area where we could get some performance
>> enhancements, I'd like to implement something in the IDV.
>>
>> Don
>>
>> Jon Blower wrote:
>> > Hi John (cc list),
>> >
>> > Thanks for your help - I found a solution that works well in my app.
>> > As you suggested, I open the dataset without enhancement, then add
>> > the coordinate systems:
>> >
>> >   nc = NetcdfDataset.openDataset(location, false, null);
>> >   // Add the coordinate systems
>> >   CoordSysBuilder.addCoordinateSystems(nc, null);
>> >   GridDataset gd = new GridDataset(nc);
>> >   GeoGrid geogrid = gd.findGridByName(varID);
>> >
>> > I then create an EnhanceScaleMissingImpl:
>> >
>> >   EnhanceScaleMissingImpl enhanced = new
>> >       EnhanceScaleMissingImpl((VariableDS) geogrid.getVariable());
>> >
>> > (Unfortunately this class is package-private, so I made a copy from
>> > the source code in my local directory. Could this class be made
>> > public please?)
>> >
>> > This means that when I read data using geogrid.subset() it does not
>> > check for missing values or unpack the data, and is therefore
>> > quicker. I then call enhanced.convertScaleOffsetMissing() only on
>> > the individual values I need to work with. Seems to work well and is
>> > pretty quick.
>> > Is there anything dangerous in the above?
>> >
>> > Thanks again,
>> > Jon
>> >
>> > On 26/10/06, John Caron <address@hidden> wrote:
>> >> Hi Jon:
>> >>
>> >> Jon Blower wrote:
>> >> > Hi John,
>> >> >
>> >> > I need some of the functionality of a GridDataset to allow me to
>> >> > read coordinate system information. Also, I might be opening an
>> >> > NcML aggregation. Is it sensible to use
>> >> > NetcdfDataset.getReferencedFile()? In the case of an NcML
>> >> > aggregation, is it possible to get a handle to a specific
>> >> > NetcdfFile (given relevant information such as the timestep)?
>> >>
>> >> You are getting into the internals, so it's a bit dangerous.
>> >>
>> >> I think this will work:
>> >>
>> >>   NetcdfDataset ncd = openDataset(location, false, null); // don't enhance
>> >>   ucar.nc2.dataset.CoordSysBuilder.addCoordinateSystems(ncd, null); // add coord info
>> >>   GridDataset gds = new GridDataset(ncd); // make into a grid
>> >>
>> >> BTW, you will want to switch to the new GridDataset in
>> >> ucar.nc2.dt.grid when you start using 2.2.17. It should be
>> >> compatible; let me know.
>> >>
>> >> > On a related note, is it efficient to read data from a NetcdfFile
>> >> > (or NetcdfDataset) point-by-point? I have been assuming that
>> >> > reading contiguous chunks of data is more efficient than reading
>> >> > individual points, even if it means reading more data than I
>> >> > actually need, but perhaps this is not the case? Unfortunately I'm
>> >> > not at my usual computer so I can't do a quick check myself. If
>> >> > reading data point-by-point is efficient (enough), my problem goes
>> >> > away.
>> >>
>> >> It depends on data locality. If the points are close together on
>> >> disk, then they will likely already be in the random access file
>> >> buffer.
>> >> The bigger the buffer, the more likely that is. You can try
>> >> different buffer sizes with:
>> >>
>> >>   NetcdfDataset openDataset(String location, boolean enhance,
>> >>       int buffer_size, ucar.nc2.util.CancelTask cancelTask,
>> >>       Object spiObject);
>> >>
>> >> > Thanks, Jon
>> >> >
>> >> > On 26/10/06, John Caron <address@hidden> wrote:
>> >> >
>> >> >> Hi Jon:
>> >> >>
>> >> >> One obvious thing would be to open it as a NetcdfFile, not a
>> >> >> GridDataset. Is that a possibility?
>> >> >>
>> >> >> Jon Blower wrote:
>> >> >> > Hi,
>> >> >> >
>> >> >> > I'm writing an application that reads data from NetCDF files
>> >> >> > and produces images. I've noticed (through profiling) that a
>> >> >> > slow point in the data reading process is the unpacking of
>> >> >> > packed data (i.e. applying scale and offset) and checking for
>> >> >> > missing values. I would like to minimize the use of these
>> >> >> > calls.
>> >> >> >
>> >> >> > To cut a long post short, I would like to find a low-level
>> >> >> > function that allows me to read the packed data exactly as they
>> >> >> > appear in the file. I can then "manually" apply the unpacking
>> >> >> > and missing-value checks only to those data points that I need
>> >> >> > to display.
>> >> >> >
>> >> >> > I'm using nj22, version 2.2.16. I've tried reading data from
>> >> >> > GeoGrid.subset() but this (of course) performs the unpacking.
>> >> >> > I then tried getting the "unenhanced" variable object through
>> >> >> > GeoGrid.getVariable().getOriginalVariable(), but (unexpectedly)
>> >> >> > this also seems to perform unpacking and missing-value checks -
>> >> >> > I expected it to give raw data.
>> >> >> >
>> >> >> > Can anyone help me to find a function for reading raw (packed)
>> >> >> > data without performing missing-value checks?
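[Archive note: John's point about data locality can be demonstrated without the netCDF library at all. A single contiguous read into an in-memory buffer serves many nearby points, whereas point-by-point access pays one seek per value. The flat file of big-endian shorts below is an invented stand-in for one row of a packed variable; both strategies must, of course, return the same values.]

```java
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;

public class LocalityDemo {
    // One seek plus one 2-byte read per wanted point.
    static short[] readPointByPoint(Path file, int[] wanted) throws Exception {
        short[] out = new short[wanted.length];
        try (RandomAccessFile f = new RandomAccessFile(file.toFile(), "r")) {
            for (int k = 0; k < wanted.length; k++) {
                f.seek(wanted[k] * 2L);
                out[k] = f.readShort();
            }
        }
        return out;
    }

    // One contiguous read covering all the points, then pick the
    // values out of the in-memory buffer.
    static short[] readViaChunk(Path file, int[] wanted, int totalShorts) throws Exception {
        byte[] chunk = new byte[totalShorts * 2];
        try (RandomAccessFile f = new RandomAccessFile(file.toFile(), "r")) {
            f.readFully(chunk);
        }
        ByteBuffer bb = ByteBuffer.wrap(chunk); // big-endian by default
        short[] out = new short[wanted.length];
        for (int k = 0; k < wanted.length; k++) {
            out[k] = bb.getShort(wanted[k] * 2);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Build a scratch file of 10,000 big-endian shorts (value == index).
        Path tmp = Files.createTempFile("grid", ".bin");
        try (RandomAccessFile f = new RandomAccessFile(tmp.toFile(), "rw")) {
            for (int i = 0; i < 10_000; i++) f.writeShort(i);
        }

        int[] wanted = {10, 500, 9_999}; // scattered points we actually need
        short[] byPoint = readPointByPoint(tmp, wanted);
        short[] byChunk = readViaChunk(tmp, wanted, 10_000);
        for (int k = 0; k < wanted.length; k++) {
            if (byPoint[k] != byChunk[k]) throw new AssertionError();
        }
        Files.delete(tmp);
    }
}
```

When the wanted points span a small region of the file, the chunked path touches the disk once; the per-point path issues a seek for every value, which is the cost Jon observed when reading 10,000 individual points from an NcML aggregation.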
>> >> >> > Thanks in advance,
>> >> >> > Jon

>> --
>> *************************************************************
>> Don Murray                               UCAR Unidata Program
>> address@hidden                                  P.O. Box 3000
>> (303) 497-8628                             Boulder, CO 80307
>> http://www.unidata.ucar.edu/staff/donm
>> *************************************************************
--
--------------------------------------------------------------
Dr Jon Blower              Tel: +44 118 378 5213 (direct line)
Technical Director         Tel: +44 118 378 8741 (ESSC)
Reading e-Science Centre   Fax: +44 118 378 6413
ESSC                       Email: address@hidden
University of Reading
3 Earley Gate
Reading RG6 6AL, UK
--------------------------------------------------------------

===============================================================================
To unsubscribe netcdf-java, visit:
http://www.unidata.ucar.edu/mailing-list-delete-form.html
===============================================================================