
[Support #CUV-251255]: Nccopy extremely slow / hangs



> Just to give you an example of what it's doing, I logged the file size for
> the command below as a function of time; see attached figure. It continues
> at this low speed until I grow impatient.... :-)

That doesn't look like O(n**2) behavior from a bad algorithm; it looks like
thrashing when a resource (in this case memory) is close to exhaustion.
I'll be interested to hear whether you can reproduce the examples I tried
with "nccopy -m 1G -h 2G ...", which should use enough less memory than
your 8 GB machine has to avoid thrashing, unless your machine is doing lots
of other processing ...
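
If it would help to confirm the thrashing hypothesis, here is a minimal
sketch of a memory monitor (my sketch, not part of nccopy; it assumes the
third-party psutil Python package, and the 5-second interval is arbitrary)
that samples nccopy's resident set size and the system swap counters while
the copy runs:

    # thrash_check.py - watch a running process for signs of thrashing.
    # Assumes the third-party "psutil" package (pip install psutil).
    import sys
    import time

    import psutil

    def watch(pid, interval=5.0):
        """Print RSS and swap counters until the process exits."""
        try:
            proc = psutil.Process(pid)
            while proc.is_running():
                rss_gb = proc.memory_info().rss / 1e9
                swap = psutil.swap_memory()
                print(f"rss={rss_gb:.2f} GB  "
                      f"swap_used={swap.used / 1e9:.2f} GB  "
                      f"swapped_in={swap.sin}  swapped_out={swap.sout}")
                time.sleep(interval)
        except psutil.NoSuchProcess:
            pass  # process exited between samples

    if __name__ == "__main__":
        watch(int(sys.argv[1]))

Run it as "python thrash_check.py <pid of nccopy>".  A resident set pinned
near physical memory while the swap counters keep climbing would be
consistent with thrashing rather than an O(n**2) algorithm.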

--Russ

> 
> Hi Mark,
> 
> > After a holiday and a break from this work, I was finally able to have a
> > look at it again. Unfortunately, the fix doesn't seem to work for me :-(
> > It is still the same problem as before - the copy starts out fine, but
> > gets progressively slower and slower, and ultimately "hangs". Here is
> > the command that I am using:
> >
> > ./nccopy -u -k3 -d1 -m 2G -h 18G -e 10001 \
> >     -c time/1698,longitude/6,latitude/7 \
> >     combined_snapshots.nc temporal_read_optimised.nc
> >
> > I am wondering whether I am setting the -h and -e options correctly.
> > How should these be set? I'm not sure I understand the difference
> > between them.
> 
> It looks to me like you are setting -h correctly.  Since you are using
> smaller sizes for the longitude and latitude dimensions than I used in my
> tests (6 and 7 versus 24 and 25), you will have about 14.3 times as many
> chunks as I had (I was aiming at each chunk being about 4 MB), so you
> should set the number of elements in the chunk cache higher (61446
> instead of 4301).  I had used 10001 elements in the chunk cache to be
> generously larger than 4301, but I think the exact value is not too
> critical as long as the number of elements in the chunk cache is larger
> than the number of chunks you need cached at once.  Since you are
> compressing the data and reordering it in a way that requires *all* the
> chunks in memory at once, you need to use at least "-e 61446", and to be
> generous should probably use something like "-e 61500"; see the chunk
> arithmetic sketched below.  The HDF5 documentation recommends that the
> number of elements in the chunk cache be prime, but I don't see the
> necessity for that and haven't noticed any difference whether it's prime
> or composite.  With the current setting of "-e 10001", chunks that are
> only partly written have to be ejected from the cache to make room for
> new chunks, which leads to lots of unnecessary compressing of partially
> written chunks as they are ejected and written to disk, as well as
> uncompressing them again when they are read back into the chunk cache.
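>
> Here is a minimal sketch of that chunk arithmetic (the dimension sizes
> come from the ncdump output below; everything else is just arithmetic):
>
>     # chunk_math.py - derive the chunk counts discussed above.
>     import math
>
>     # Dimension sizes from combined_snapshots.nc (see ncdump below).
>     TIME, LAT, LON = 1698, 1617, 1596
>
>     def n_chunks(dim_len, chunk_len):
>         """Number of chunks needed to tile one dimension."""
>         return math.ceil(dim_len / chunk_len)
>
>     # Requested chunking: time/1698, latitude/7, longitude/6.
>     total = n_chunks(TIME, 1698) * n_chunks(LAT, 7) * n_chunks(LON, 6)
>     print(total)   # 1 * 231 * 266 = 61446, hence "-e 61446" at minimum
>
>     # Each chunk holds 1698*7*6 four-byte float values:
>     print(1698 * 7 * 6 * 4)   # 285264 bytes, about 0.3 MB per chunk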
> 
> You also need to make sure that your computer has enough memory to hold
> the chunk cache.  You've specified a 2GB input buffer and 18GB of chunk
> cache, so you should have at least 20GB of memory available for nccopy to
> keep the data in the chunk cache uncompressed while reordering it.  You
> might get by with a smaller input buffer, say 11MB (one time step of
> 1617*1596 4-byte values), and a somewhat smaller chunk cache,
> "-h 17.53G", if you're close to the maximum; the sizing arithmetic is
> sketched below.
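>
> A similarly minimal sketch of that memory arithmetic (assuming 4-byte
> floats and decimal GB, consistent with the figures above):
>
>     # cache_math.py - memory needed to hold all chunks uncompressed.
>     TIME, LAT, LON = 1698, 1617, 1596
>
>     # One time step of the float variable (suggested input buffer size):
>     print(LAT * LON * 4 / 1e6)         # ~10.3 MB, hence "say 11MB"
>
>     # The whole variable, which the chunk cache must hold uncompressed
>     # while reordering:
>     print(TIME * LAT * LON * 4 / 1e9)  # ~17.53 GB, hence "-h 17.53G"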
> 
> > The combined_snapshots.nc file is 630MB - a dump of the header is given 
> > below:
> 
> My tests have been with simulated data of the same size as yours, but my
> simulated data may compress better than your real data.  If you could
> possibly make your actual combined_snapshots.nc file available somewhere
> for me to test nccopy on, I could make sure I can reproduce something
> like the 15-minute times I'm seeing for the copy and rechunking.  It may
> be that your use of 1698x7x6 chunks requires more time than the larger
> 1698x25x24 chunks I was writing, so I could try that as well.
> 
> > Any ideas?
> 
> I really can't explain the apparent O(n**2) behavior you seem to be
> seeing in writing the output, unless it's something in the HDF5 layer,
> such as a performance bug in the B-trees that index the chunks.  You
> can't really judge the progress of the copy by the size of the output
> file, as none of the chunks is complete until the end of the copy.  So
> the output file should stay fairly small until all of the chunks are
> compressed and flushed to disk at the end of the rechunking.
> 
> Also, the -h and -e options to nccopy have only been minimally tested,
> so there could still be bugs ...  If you want to sanity-check what they
> control independently of nccopy, see the sketch below.
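>
> The same HDF5 chunk-cache parameters that -h and -e control can also be
> set through other bindings.  As a minimal sketch (assuming the netCDF4
> Python module is installed; the numbers echo the suggestions above):
>
>     # cache_check.py - set and inspect the HDF5 chunk-cache defaults
>     # via the netCDF4 Python module (assumed installed).
>     import netCDF4
>
>     # size in bytes (like -h), nelems (like -e), preemption in [0, 1].
>     netCDF4.set_chunk_cache(size=18 * 10**9, nelems=61500,
>                             preemption=0.75)
>     print(netCDF4.get_chunk_cache())   # (size, nelems, preemption)
>
>     # The settings take effect for files opened afterwards:
>     ds = netCDF4.Dataset("combined_snapshots.nc")
>     print(ds.variables["chl_oc5"].chunking())  # chunks or 'contiguous'
>     ds.close()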
> 
> --Russ
> 
> > [mpayne@oleander compiler]$ ncdump combined_snapshots.nc -h -c
> > netcdf combined_snapshots {
> > dimensions:
> > latitude = 1617 ;
> > longitude = 1596 ;
> > time = UNLIMITED ; // (1698 currently)
> > variables:
> > float chl_oc5(time, latitude, longitude) ;
> > chl_oc5:_FillValue = 0.f ;
> > chl_oc5:long_name = "Chlorophyll-a concentration in sea water using the OC5 
> > algorithm" ;
> > chl_oc5:standard_name = "mass_concentration_of_chlorophyll_a_in_sea_water" ;
> > chl_oc5:grid_mapping = "mercator" ;
> > chl_oc5:units = "milligram m-3" ;
> > chl_oc5:missing_value = 0.f ;
> > chl_oc5:units_nonstandard = "mg m^-3" ;
> > float latitude(latitude) ;
> > latitude:_FillValue = -999.f ;
> > latitude:standard_name = "latitude" ;
> > latitude:long_name = "latitude" ;
> > latitude:valid_min = -90. ;
> > latitude:units = "degrees_north" ;
> > latitude:valid_max = 90. ;
> > latitude:axis = "Y" ;
> > float longitude(longitude) ;
> > longitude:_FillValue = -999.f ;
> > longitude:standard_name = "longitude" ;
> > longitude:long_name = "longitude" ;
> > longitude:valid_min = -180. ;
> > longitude:units = "degrees_east" ;
> > longitude:valid_max = 180. ;
> > longitude:axis = "X" ;
> > int mercator ;
> > mercator:false_easting = 0L ;
> > mercator:standard_parallel = 0L ;
> > mercator:grid_mapping_name = "mercator" ;
> > mercator:false_northing = 0L ;
> > mercator:longitude_of_projection_origin = 0L ;
> > double time(time) ;
> > time:_FillValue = -1. ;
> > time:time_origin = "1970-01-01 00:00:00" ;
> > time:valid_min = 0. ;
> > time:long_name = "time" ;
> > time:standard_name = "time" ;
> > time:units = "seconds since 1970-01-01 00:00:00" ;
> > time:calendar = "gregorian" ;
> > time:axis = "T" ;
> >
> > // global attributes:
> > :site_name = "UK Shelf Seas" ;
> > :citation = "If you use this data towards any publication, please 
> > acknowledge this using: \'The authors thank the NERC Earth Observation Data 
> > Acquisition and Analysis Service (NEODAAS) for supplying data for this 
> > study\' and then email NEODAAS (address@hidden) with the details. The 
> > service relies on users\' publications as one measure of success." ;
> > :creation_date = "Thu Jun 02 10:51:37 2011" ;
> > :easternmost_longitude = 13. ;
> > :creator_url = "http://rsg.pml.ac.uk" ;
> > :references = "See NEODAAS webpages at http://www.neodaas.ac.uk/ or RSG 
> > pages at http://rsg.pml.ac.uk/" ;
> > :Metadata_Conventions = "Unidata Dataset Discovery v1.0" ;
> > :keywords = "satellite,observation,ocean" ;
> > :summary = "This data is Level-3 satellite observation data (Level 3 
> > meaning raw observations processedto geophysical quantities, and placed 
> > onto a regular grid)." ;
> > :id = 
> > "M2010001.1235.uk.postproc_products.MYO.01jan101235.v1.20111530951.data.nc" 
> > ;
> > :naming_authority = "uk.ac.pml" ;
> > :geospatial_lat_max = 62.999108 ;
> > :title = "Level-3 satellite data from Moderate Resolution Imaging 
> > Spectroradiometer sensor" ;
> > :source = "Moderate Resolution Imaging Spectroradiometer" ;
> > :northernmost_latitude = 62.999108 ;
> > :creator_name = "Plymouth Marine Laboratory Remote Sensing Group" ;
> > :processing_level = "Level-3 (NASA EOS Conventions)" ;
> > :creator_email = "address@hidden" ;
> > :netcdf_library_version = "4.0.1 of Sep  3 2010 11:27:29 $" ;
> > :date_issued = "Thu Jun 02 10:51:37 2011" ;
> > :geospatial_lat_min = 47. ;
> > :date_created = "Thu Jun 02 10:51:37 2011" ;
> > :institution = "Plymouth Marine Laboratory Remote Sensing Group" ;
> > :geospatial_lon_max = 13. ;
> > :geospatial_lon_min = -15. ;
> > :contact1 = "email: address@hidden" ;
> > :license = "If you use this data towards any publication, please 
> > acknowledge this using: \'The authors thank the NERC Earth Observation Data 
> > Acquisition and Analysis Service (NEODAAS) for supplying data for this 
> > study\' and then email NEODAAS (address@hidden) with the details. The 
> > service relies on users\' publications as one measure of success." ;
> > :Conventions = "CF-1.4" ;
> > :project = "NEODAAS (NERC Earth Observation Data Acquisition and Analysis 
> > Service)" ;
> > :cdm_data_type = "Grid" ;
> > :RSG_sensor = "MODIS" ;
> > :westernmost_longitude = -15. ;
> > :RSG_areacode = "uk" ;
> > :southernmost_latitude = 47. ;
> > :netcdf_file_type = "NETCDF4_CLASSIC" ;
> > :history = "Created during RSG Standard Mapping (Mapper) [SGE Job Number: 
> > 2577153]" ;
> > :NCO = "4.0.7" ;
> > }
> > [mpayne@oleander compiler]$

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: CUV-251255
Department: Support netCDF
Priority: Normal
Status: Closed