[netCDF #KLB-596506]: apparent bug in netcdf-4.2
- Subject: [netCDF #KLB-596506]: apparent bug in netcdf-4.2
- Date: Thu, 28 Feb 2013 23:28:22 -0700
Jim,
> I have created a test program that appears to demonstrate a serious bug in
> netcdf-4.2. The program opens an existing netcdf file, reads a variable
> and closes the file.
> It prints the min, max, and sum of the variable's values.
>
> Then the program opens the file for update, goes into define mode using
> nf_redef, leaves define mode, and closes the file.
>
> Then I reopen the file and read and print the same variable as above. The
> values of min, max, and sum should be the same as before, but they are not,
> indicating that the file has been corrupted.
>
> The input file used in the test is rather large (1.4 GB), so I have put the
> test case on yellowstone in the directory
> /glade/p/work/jedwards/nfbug
> I've made a few attempts to reduce the size of the file, but this causes
> the error to go away.
>
> I have also put the test in the Makefile so you only need to run gmake to
> execute it.
I created a version of your code that uses only the C API, and the bug still occurs, so it's not in the Fortran API.
I converted your original 64-bit offset format file to a classic model format file, and the bug does not occur, so apparently it's in the code implementing the 64-bit offset format, first introduced in December 2004 with version 3.6.0.
Surprisingly, the bug occurs in netCDF version 3.6.0 and every subsequent version, up to and including the current 4.3 release candidate, so it's been there for over 8 years.
You're the first to report it, but I hope it hasn't silently corrupted the 64-bit offset files of other users who didn't notice. Perhaps the sequence of open, redef, enddef, and close calls, with no changes to the header between the redef and enddef calls, is an uncommon enough pattern that the problem is rare, but it is certainly a major bug, since it gives no indication that the file is corrupted until the wrong values are later read.
Unfortunately, the valgrind tool is of no help, as it doesn't detect any memory access problems. I'm digging into it with the gdb debugger to try to understand the bug and fix it. I've created a Jira ticket if you want to follow progress:
http://bugtracking.unidata.ucar.edu/browse/NCF-234
If you see any other conditions under which the bug occurs, I'd be very interested.
--Russ
Russ Rew UCAR Unidata Program
address@hidden http://www.unidata.ucar.edu
Ticket Details
===================
Ticket ID: KLB-596506
Department: Support netCDF
Priority: Normal
Status: Closed