This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
> Just to give you an example of what it's doing, I logged the file size for
> the command below as a function of time; see the attached figure. It
> continues at this low speed until I grow impatient.... :-)

That doesn't look like O(n**2) behavior from a bad algorithm; it looks like
thrashing when close to resource exhaustion (in this case, memory). I'll be
interested to see whether you can reproduce the examples I tried with
"nccopy -m 1G -h 2G ...", which should use enough less memory than your 8 GB
machine has to avoid thrashing, unless your machine is doing lots of other
processing ...

--Russ

> Hi Mark,
>
> > After a holiday and a break from this work, I was finally able to have a
> > look at it again. Unfortunately, the fix doesn't seem to work for me :-(
> > It is still the same problem as previously - the copy starts out fine,
> > but gets progressively slower and slower, and ultimately "hangs". Here
> > is the command that I am using:
> >
> >     ./nccopy -u -k3 -d1 -m 2G -h 18G -e 10001 \
> >         -c time/1698,longitude/6,latitude/7 \
> >         combined_snapshots.nc temporal_read_optimised.nc
> >
> > I am wondering whether I am setting the -h and -e options correctly? How
> > should these be set? I'm not sure I understand the difference between
> > them.
>
> It looks to me like you are setting -h correctly. Since you are using
> smaller sizes for the longitude and latitude dimensions than I used in my
> tests (6 and 7 versus 24 and 25), you will have about 14.3 times as many
> chunks as I had (I was aiming at each chunk being about 4 MB), so you will
> need to set the number of elements in the chunk cache higher (61446
> instead of 4301). I had used 10001 elements in the chunk cache to be
> generously larger than 4301, but the exact value is not critical as long
> as the number of elements in the chunk cache is at least the number of
> chunks you need held at once. Since you are compressing the data and
> reordering it in a way that requires *all* the chunks to be in memory at
> once, you need to use at least "-e 61446", and to be generous should
> probably use something like "-e 61500". The HDF5 documentation recommends
> that the number of elements in the chunk cache be prime, but I don't see
> the necessity for that and haven't noticed any difference whether it's
> prime or composite. With your current setting of "-e 10001", chunks that
> are only partly written have to be ejected from the cache to make room for
> new chunks, which leads to lots of unnecessary recompressing of ejected
> chunks before writing them to disk, as well as uncompressing partially
> written chunks when reading them back into the cache.
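>
> As a quick sanity check on that chunk count (a sketch in the shell; the
> formula is the general rule, and the dimension sizes are taken from the
> ncdump output quoted below):
>
>     # Chunks per variable = product over dimensions of
>     # ceiling(dimension length / chunk length).  Here each chunk spans
>     # all of time, and the latitude and longitude chunk sizes divide
>     # their dimension lengths evenly, so integer division is exact:
>     echo $(( (1698/1698) * (1617/7) * (1596/6) ))   # 1 * 231 * 266 = 61446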
>
> You also need to make sure that your computer has enough memory to hold
> the chunk cache. You've specified a 2 GB input buffer and 18 GB of chunk
> cache memory, so you should have at least 20 GB of memory available for
> nccopy to run, since it keeps the data in the chunk cache uncompressed
> while reordering it. You might get by with a smaller input buffer, say
> 11 MB (one time step of 1617*1596*4 bytes), and a somewhat smaller chunk
> cache, "-h 17.53G", if you're close to the maximum.
>
> > The combined_snapshots.nc file is 630MB - a dump of the header is given
> > below:
>
> My tests have been with simulated data of the same size as you're using,
> but my simulated data may compress better than yours. If you could
> possibly make your actual combined_snapshots.nc file available somewhere
> for me to test nccopy on, I could make sure I can reproduce something
> like the 15 minute times I'm seeing for the copy and rechunking. It may
> be that your use of 1698x7x6 chunks requires more time than the larger
> 1698x25x24 chunks I was writing, so I could try that as well.
>
> > Any ideas?
>
> I really can't explain what looks like O(n**2) behavior in writing the
> output, unless it's something in the HDF5 layer involving a performance
> bug in the B-trees that index the chunks. You can't really judge the
> progress of the copy by the size of the output file, as none of the
> chunks is complete until the end of the copy, so the output file should
> stay fairly small until all of the chunks are compressed and flushed to
> disk at the end of the rechunking.
>
> Also, the -h and -e options to nccopy have only been minimally tested,
> and there could still be bugs ...
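>
> To put rough numbers on those sizes (a back-of-the-envelope sketch; the
> suffix spellings like "11M" are my shorthand for the values above, since
> nccopy accepts K/M/G size suffixes):
>
>     # One uncompressed time step of chl_oc5 (1617 x 1596 4-byte floats),
>     # the minimum useful input buffer:
>     echo $(( 1617 * 1596 * 4 ))           # 10322928 bytes, so ~11 MB
>     # All 61446 chunks held uncompressed at once (1698*7*6 floats each),
>     # the chunk cache this rechunking needs:
>     echo $(( 61446 * 1698 * 7 * 6 * 4 ))  # 17528331744 bytes, ~17.53 GB
>
> so the whole invocation might look something like this (untested against
> your actual file):
>
>     ./nccopy -u -k3 -d1 -m 11M -h 17.53G -e 61500 \
>         -c time/1698,longitude/6,latitude/7 \
>         combined_snapshots.nc temporal_read_optimised.nc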
>
> --Russ
>
> > [mpayne@oleander compiler]$ ncdump combined_snapshots.nc -h -c
> > netcdf combined_snapshots {
> > dimensions:
> >     latitude = 1617 ;
> >     longitude = 1596 ;
> >     time = UNLIMITED ; // (1698 currently)
> > variables:
> >     float chl_oc5(time, latitude, longitude) ;
> >         chl_oc5:_FillValue = 0.f ;
> >         chl_oc5:long_name = "Chlorophyll-a concentration in sea water
> >             using the OC5 algorithm" ;
> >         chl_oc5:standard_name =
> >             "mass_concentration_of_chlorophyll_a_in_sea_water" ;
> >         chl_oc5:grid_mapping = "mercator" ;
> >         chl_oc5:units = "milligram m-3" ;
> >         chl_oc5:missing_value = 0.f ;
> >         chl_oc5:units_nonstandard = "mg m^-3" ;
> >     float latitude(latitude) ;
> >         latitude:_FillValue = -999.f ;
> >         latitude:standard_name = "latitude" ;
> >         latitude:long_name = "latitude" ;
> >         latitude:valid_min = -90. ;
> >         latitude:units = "degrees_north" ;
> >         latitude:valid_max = 90. ;
> >         latitude:axis = "Y" ;
> >     float longitude(longitude) ;
> >         longitude:_FillValue = -999.f ;
> >         longitude:standard_name = "longitude" ;
> >         longitude:long_name = "longitude" ;
> >         longitude:valid_min = -180. ;
> >         longitude:units = "degrees_east" ;
> >         longitude:valid_max = 180. ;
> >         longitude:axis = "X" ;
> >     int mercator ;
> >         mercator:false_easting = 0L ;
> >         mercator:standard_parallel = 0L ;
> >         mercator:grid_mapping_name = "mercator" ;
> >         mercator:false_northing = 0L ;
> >         mercator:longitude_of_projection_origin = 0L ;
> >     double time(time) ;
> >         time:_FillValue = -1. ;
> >         time:time_origin = "1970-01-01 00:00:00" ;
> >         time:valid_min = 0. ;
> >         time:long_name = "time" ;
> >         time:standard_name = "time" ;
> >         time:units = "seconds since 1970-01-01 00:00:00" ;
> >         time:calendar = "gregorian" ;
> >         time:axis = "T" ;
> >
> > // global attributes:
> >         :site_name = "UK Shelf Seas" ;
> >         :citation = "If you use this data towards any publication,
> >             please acknowledge this using: \'The authors thank the NERC
> >             Earth Observation Data Acquisition and Analysis Service
> >             (NEODAAS) for supplying data for this study\' and then email
> >             NEODAAS (address@hidden) with the details. The service
> >             relies on users\' publications as one measure of success." ;
> >         :creation_date = "Thu Jun 02 10:51:37 2011" ;
> >         :easternmost_longitude = 13. ;
> >         :creator_url = "http://rsg.pml.ac.uk" ;
> >         :references = "See NEODAAS webpages at http://www.neodaas.ac.uk/
> >             or RSG pages at http://rsg.pml.ac.uk/" ;
> >         :Metadata_Conventions = "Unidata Dataset Discovery v1.0" ;
> >         :keywords = "satellite,observation,ocean" ;
> >         :summary = "This data is Level-3 satellite observation data
> >             (Level 3 meaning raw observations processedto geophysical
> >             quantities, and placed onto a regular grid)." ;
> >         :id = "M2010001.1235.uk.postproc_products.MYO.01jan101235.v1.20111530951.data.nc" ;
> >         :naming_authority = "uk.ac.pml" ;
> >         :geospatial_lat_max = 62.999108 ;
> >         :title = "Level-3 satellite data from Moderate Resolution
> >             Imaging Spectroradiometer sensor" ;
> >         :source = "Moderate Resolution Imaging Spectroradiometer" ;
> >         :northernmost_latitude = 62.999108 ;
> >         :creator_name = "Plymouth Marine Laboratory Remote Sensing Group" ;
> >         :processing_level = "Level-3 (NASA EOS Conventions)" ;
> >         :creator_email = "address@hidden" ;
> >         :netcdf_library_version = "4.0.1 of Sep 3 2010 11:27:29 $" ;
> >         :date_issued = "Thu Jun 02 10:51:37 2011" ;
> >         :geospatial_lat_min = 47. ;
> >         :date_created = "Thu Jun 02 10:51:37 2011" ;
> >         :institution = "Plymouth Marine Laboratory Remote Sensing Group" ;
> >         :geospatial_lon_max = 13. ;
> >         :geospatial_lon_min = -15. ;
> >         :contact1 = "email: address@hidden" ;
> >         :license = "If you use this data towards any publication,
> >             please acknowledge this using: \'The authors thank the NERC
> >             Earth Observation Data Acquisition and Analysis Service
> >             (NEODAAS) for supplying data for this study\' and then email
> >             NEODAAS (address@hidden) with the details. The service
> >             relies on users\' publications as one measure of success." ;
> >         :Conventions = "CF-1.4" ;
> >         :project = "NEODAAS (NERC Earth Observation Data Acquisition and
> >             Analysis Service)" ;
> >         :cdm_data_type = "Grid" ;
> >         :RSG_sensor = "MODIS" ;
> >         :westernmost_longitude = -15. ;
> >         :RSG_areacode = "uk" ;
> >         :southernmost_latitude = 47. ;
> >         :netcdf_file_type = "NETCDF4_CLASSIC" ;
> >         :history = "Created during RSG Standard Mapping (Mapper) [SGE
> >             Job Number: 2577153]" ;
> >         :NCO = "4.0.7" ;
> > }
> > [mpayne@oleander compiler]$

Russ Rew                                            UCAR Unidata Program
address@hidden                                      http://www.unidata.ucar.edu

Ticket Details
===================
Ticket ID: CUV-251255
Department: Support netCDF
Priority: Normal
Status: Closed