[netCDF #BKA-178769]: Parallel hang in nc_put_vara_double
- Subject: [netCDF #BKA-178769]: Parallel hang in nc_put_vara_double
- Date: Tue, 11 Dec 2012 14:32:02 -0700
Hi Greg,
> Additional information on this ticket:
>
> I am able to get code to run by making a couple changes to nc4hdf.c.
>
> Near line 603, change "if (start[d2] >= (hssize_t)fdims[d2])" to
> "if (start[d2] > (hssize_t)fdims[d2])". That is, change ">=" to ">".
>
> Near line 612, remove the code block below:
>
> /* A little quirk: if any of the count values are zero, then
>    return success and forget about it. */
> for (d2 = 0; d2 < var->ndims; d2++)
>    if (count[d2] == 0)
>       goto exit;
>
> The line numbers are for netcdf-4.1.3; I get similar behavior in
> netcdf-4.2.1.1. I'm not sure how those changes affect other behavior in
> serial runs, but they get me past the current hang...
>
> --Greg
Unfortunately, in both netcdf-4.1.3 and the current snapshot, making the two
changes you suggest causes failures when running "make check" in nc_test with a
serial configuration:
*** testing nc_put_var1_text ...
FAILURE at line 1465 of test_put.c: bad index: status = -57
...
### 48 FAILURES TESTING nc_put_var1_text! ###
...
*** testing nc_put_vara ...
FAILURE at line 854 of test_write.c: bad index: status = -57
### 284 FAILURES TESTING nc_put_vara! ###
...
*** Total number of failures: 5668
*** nc_test FAILURE!!!
FAIL: nc_test
Removing just the "quirky" code block doesn't cause any test failures, but
I assume by itself that's not a fix for the problem you encountered with
parallel netCDF-4, so I would need a better justification to accept that fix.
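The failure output above hints at why the first change breaks serial behavior: in netcdf.h, -57 is NC_EEDGE ("start+count exceeds dimension bound"). Once ">=" becomes ">", a start index equal to the dimension length presumably slips past the index check and is rejected later by the edge check, rather than with the NC_EINVALCOORDS (-40) that the "bad index" tests expect. One possible middle ground is to accept a start equal to the dimension length only when the matching count is zero, so a rank with nothing to write is not turned away while non-empty out-of-range requests still fail as before. The helper `check_hyperslab` and its signature below are hypothetical; this is a sketch under that assumption, not the nc4hdf.c code and not a fix Unidata adopted. Only the two error-code values are taken from netcdf.h.

```c
#include <stddef.h>

/* Error codes with the same values as in netcdf.h. */
#define NC_NOERR         0
#define NC_EINVALCOORDS (-40) /* index exceeds dimension bound */
#define NC_EEDGE        (-57) /* start+count exceeds dimension bound */

/* Hypothetical helper (not the netCDF-4 implementation): validate a
   hyperslab request (start, count) against the dimension lengths. */
static int check_hyperslab(int ndims, const size_t *start,
                           const size_t *count, const size_t *dimlen)
{
    for (int d = 0; d < ndims; d++) {
        /* A start equal to the dimension length is accepted only for an
           empty request (count == 0); anything beyond it is a bad index. */
        if (start[d] > dimlen[d] ||
            (start[d] == dimlen[d] && count[d] != 0))
            return NC_EINVALCOORDS;

        /* start <= dimlen here, so the subtraction cannot wrap around. */
        if (count[d] > dimlen[d] - start[d])
            return NC_EEDGE;
    }
    return NC_NOERR;
}
```

With a dimension of length 10, a request with start 10 and count 0 passes, start 10 with count 1 still returns NC_EINVALCOORDS, and start 8 with count 3 returns NC_EEDGE, so the single-index cases exercised by nc_put_var1 keep their serial behavior.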
We may soon have help making up for our lack of parallel I/O expertise, but
for now I'll put this on hold, at least until we upgrade our parallel
testing environment to the latest HDF5, pnetCDF, MPI-IO, etc.
Your report of the problem and your workaround may help someone else
searching for a solution to a similar problem.
I see you've also provided a patch for additional problems with pnetcdf, and
I'll try testing it soon for incorporation into our development snapshot.
Thanks for your contributions!
--Russ
> On 11/29/12 3:48 PM, Unidata netCDF Support wrote:
> > Gregory Sjaardema,
> >
> > Your Ticket has been received, and a Unidata staff member will review it
> > and reply accordingly. Listed below are details of this new Ticket. Please
> > make sure the Ticket ID remains in the Subject: line on all correspondence
> > related to this Ticket.
> >
> > Ticket ID: BKA-178769
> > Subject: Parallel hang in nc_put_vara_double
> > Department: Support netCDF
> > Priority: Normal
> > Status: Open
Russ Rew UCAR Unidata Program
address@hidden http://www.unidata.ucar.edu
Ticket Details
===================
Ticket ID: BKA-178769
Department: Support netCDF
Priority: High
Status: Closed