[netCDF #PKT-462504]: Re: [netcdfgroup] netcdf-4 open/close memory leak
- Subject: [netCDF #PKT-462504]: Re: [netcdfgroup] netcdf-4 open/close memory leak
- Date: Mon, 25 Jan 2010 17:14:59 -0700
> Howdy, Ed et al.
>
> I got snapshot2010011908 and reran some tests, which unfortunately
> don't show much change. I'll start with the simple open/close memory
> issue that Jeff Whitaker brought up. The code was short enough that I
> just added the get_mem_used calls to it directly:
>
Howdy Ted!
I have added this test to libsrc4/tst_files2.c, which is only built with
--enable-benchmarks, or "make tst_files2". (But you must wait for tomorrow
morning's snapshot to see this.)
But I get different results than you do.
Furthermore, I have come up with three different get_mem_used functions. All
should be correct, as far as I can tell, but all give different answers. Sigh.
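For concreteness, here is the sort of thing I mean - a minimal sketch of one
possible get_mem_used, assuming Linux, that reports the process data size in
kB from /proc/self/statm. Versions based on getrusage or mallinfo will report
somewhat different numbers.

   /* Minimal sketch of a get_mem_used function, assuming Linux: report
      the data segment size, in kB, from /proc/self/statm. */
   #include <stdio.h>
   #include <unistd.h>

   void
   get_mem_used(int *mem_used)
   {
      long size, res, share, text, lib, data, dt;
      FILE *fp;

      *mem_used = -1;
      if (!(fp = fopen("/proc/self/statm", "r")))
         return;
      if (fscanf(fp, "%ld %ld %ld %ld %ld %ld %ld",
                 &size, &res, &share, &text, &lib, &data, &dt) == 7)
         *mem_used = (int)(data * getpagesize() / 1024);
      fclose(fp);
   }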
> [I mangled the name of get_mem_used_ to use it with fortran.]
>
> With a netcdf3 file, the output is (last column is net change per
> iteration)
>
> start: memuse= 440 440 440
> memuse,delta= 452 1568 1576 1116 1124
> memuse,delta= 1576 1584 1576 8 0
> memuse,delta= 1576 1584 1576 8 0
> memuse,delta= 1576 1584 1576 8 0
> memuse,delta= 1576 1584 1576 8 0
> memuse,delta= 1576 1584 1576 8 0
>
Isn't netCDF classic so nice and well-behaved? ;-)
What input file are you using? The ones created by tst_files2.c?
>
> But with a netcdf4 file:
>
> start: memuse= 440 440 440
> memuse,delta= 452 2804 2316 2352 1864
> memuse,delta= 2316 2852 2320 536 4
> memuse,delta= 2320 2856 2324 536 4
> memuse,delta= 2324 2860 2328 536 4
> memuse,delta= 2328 2864 2332 536 4
> memuse,delta= 2332 2868 2336 536 4
> memuse,delta= 2336 2868 2336 532 0
> memuse,delta= 2336 2872 2340 536 4
> memuse,delta= 2340 2876 2344 536 4
>
> Oddly, there is an occasional 0 increase for netcdf4, and I can't tell
> if there is a pattern to it. But in general there is a net 4 kB
> increase in memory use for just opening and closing a file, so I guess
> something is not being freed. I assume the initial jump from 440 to
> 1568 (or 2804) has to do with opening the first file.
But I *know* everything is being freed, because valgrind would tell me if it
weren't. So perhaps what is happening here is that HDF5 is allocating memory
and not freeing it until the library exits. That would produce the problem we
are seeing, but I don't think they would be crazy enough to do that. I have
sent them an email asking about it.
I know what they will ask in turn: that you upgrade to their latest snapshot
too, and also make sure that you build HDF5 with --enable-using-memchecker.
>
> Now, on to the other memory problem I have:
>
> I managed to hack the get_mem_used into my Fortran90 code to combine
> data from multiple files, and it seems to shed light on what is going
> on. I have two methods reading the data: 1) Use netcdf4 to read each
> file variable at a given time and 2) use the HDF5 interface to read
> the data from each file. Both cases are using the fortran interfaces
> and both use the netCDF4 interface to write out the combined data.
>
> The individual files have chunk sizes for the 3D variables that are
> equal to the spatial dimensions x,y,z (40x60x80) with time being the
> unlimited dimension.
Did you set these chunk sizes yourself, or are these the defaults? And what
chunk size are you using for the unlimited dimension? (The default is 1.)
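For reference, if you did set them yourself, it would have been with something
like the following (a C sketch; the file and variable names, the dimension
order, and the mapping of 40x60x80 onto x, y, z are only guesses on my part,
and the Fortran interface spells this differently):

   /* C sketch of defining a variable with explicit chunk sizes: chunk
      length 1 along the unlimited time dimension, and the full spatial
      extent (assumed here to be z=80, y=60, x=40) along the others.
      Error checking is omitted. */
   #include <netcdf.h>

   int
   create_example_file(void)
   {
      int ncid, dimids[4], varid;
      size_t chunks[4] = {1, 80, 60, 40};

      nc_create("example.nc", NC_NETCDF4 | NC_CLOBBER, &ncid);
      nc_def_dim(ncid, "time", NC_UNLIMITED, &dimids[0]);
      nc_def_dim(ncid, "z", 80, &dimids[1]);
      nc_def_dim(ncid, "y", 60, &dimids[2]);
      nc_def_dim(ncid, "x", 40, &dimids[3]);
      nc_def_var(ncid, "var1", NC_FLOAT, 4, dimids, &varid);
      nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunks);
      return nc_close(ncid);
   }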
>
> So, when reading a variable from 64 files, the total chunk cache would
> be 40x60x80 gridpoints x 4 bytes per point x 64 files = 48000 kB = 47
> MB, and this is almost exactly the average increase in memory that I
> see when using netcdf4 to read the files. (There is an additional 8
> MB per variable added by the NF90_DEF_VAR process.) If all the input
> files are closed and reopened after reading a variable, about half of
> the memory (27 MB or so) gets freed up. The closing/opening process
> takes a lot of time, too, much more than the time to read a variable.
Wait, how do you get that this is the size of the total chunk cache?
Make sure you are using a recent snapshot of netCDF - in fact the very latest;
I just put one out this afternoon, and I think the automatic cache sizing is
working well now.
Here's what happens with the cache size: whatever you set for the file-level
cache (with nc_set_chunk_cache) will be used by default for each variable in
the file. That is, if the file-level cache is set to 100 MB and you open a file
with 10 variables, it should consume 100 MB * 10, or 1 GB, of memory for the file.
The current netCDF default for the file-level cache size is 4 MB; HDF5's own
default is 1 MB, but netCDF overrides that.
Now, when a file is opened, the cache size is adjusted for each variable: the
chunk cache will be sized to hold 10 chunks of that variable, up to a 64 MB
maximum per variable. The per-variable chunk cache can be modified with
nc_set_var_chunk_cache. (And Russ has wondered if calling that with a setting
of zero would change your results.)
So the cache size is a little bit of a complicated topic.
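To make the calls concrete, here is a rough sketch (C API; the sizes and the
file and variable names are only for illustration) of setting the file-level
cache before opening a file, and then the per-variable cache - including the
zero-size setting Russ suggested trying:

   /* Rough sketch of the chunk cache calls. Error checking is omitted
      and the names are placeholders. */
   #include <netcdf.h>

   int
   cache_example(void)
   {
      int ncid, varid;

      /* File-level cache: applies to files opened after this call and
         becomes the default for each variable in those files.
         Here: 4 MB, 1009 chunk slots, preemption 0.75. */
      nc_set_chunk_cache(4 * 1024 * 1024, 1009, 0.75);

      nc_open("in.nc", NC_NOWRITE, &ncid);
      nc_inq_varid(ncid, "var1", &varid);

      /* Per-variable cache: setting the size to zero is what Russ
         suggested trying, to see if it changes your memory numbers. */
      nc_set_var_chunk_cache(ncid, varid, 0, 0, 0.75);

      return nc_close(ncid);
   }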
>
> For what it's worth, the chunk size for 3D variables in the combined
> file is reported by ncdump as 32, 173, 346 for 7482 kB (4-byte
> reals). The spatial dimensions are 60 x 319 x 640 (z,y,x).
>
> When I use direct HDF5 calls to read the data, on the other hand,
> there is hardly any (16 kB) increase in memory usage (I open one file
> at a time, read the current variable, and close the dataset and file
> before going to the next file). Also, the total time is the same as
> for using netcdf4, so there seems to be no performance penalty for
> deallocating the dataset memory.
But I *know* that when you close a netCDF-4 file, the library closes all the
HDF5 objects in the file. Let me run another test to demonstrate that tomorrow...
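For comparison, the per-file read pattern I understand you to be using on the
HDF5 side is roughly the following (a C sketch; your Fortran code will differ,
the names are placeholders, and the hyperslab selection of a single time level
is left out for brevity):

   /* Rough sketch of the open/read/close-per-file pattern described
      above. The dataset name is a placeholder, and the whole dataset
      is read for simplicity. */
   #include <hdf5.h>

   void
   read_one_file(const char *path, float *buf)
   {
      hid_t file_id, dset_id;

      file_id = H5Fopen(path, H5F_ACC_RDONLY, H5P_DEFAULT);
      dset_id = H5Dopen2(file_id, "/var1", H5P_DEFAULT);
      H5Dread(dset_id, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL,
              H5P_DEFAULT, buf);
      /* Close dataset and file before moving on to the next file. */
      H5Dclose(dset_id);
      H5Fclose(file_id);
   }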
>
> Now, I can understand leaving a read cache around if it might be used
> again, but here's the rub: when I read a second time level, the memory
> keeps going up by 47 MB for each variable, even though that variable
> was already read before. So it seems that the previously-allocated
> cache is not getting reused, but is being allocated again?
>
> My suggestion is to try assuming that a read cache will _not_ be used
> again, and just deallocate it as soon as the read task is finished.
> (Perhaps this would be in nc4hdf.c?) Based on my HDF5 read tests,
> there's no performance penalty from having to allocate the space
> again, so there seems to be little reason to keep it hanging around.
> Particularly since it can add up to a lot of space when reading from
> lots of files. I'd be willing to test it, too.
>
> Best,
Thanks, I will hit this again tomorrow morning.
Ticket Details
===================
Ticket ID: PKT-462504
Department: Support netCDF
Priority: Critical
Status: Open