[netCDF #TIR-820282]: NetCDF-4 Parallel independent access with unlimited dimension (Fortran 90)
- Subject: [netCDF #TIR-820282]: NetCDF-4 Parallel independent access with unlimited dimension (Fortran 90)
- Date: Fri, 12 Apr 2013 09:59:41 -0600
Reto,
> I've tried the following configuration
> - hdf5 1.8.11-snap16
> - netcdf-4.3.0-rc4
> - netcdf-fortran-4.2
> - openmpi-1.6.3
> - gcc/gfortran 4.6.3
>
> Same issue. If I let all processes do the write, then it works fine. If I for
> instance exclude process #0,1,2 or 3 from the writing, then the write hangs
> (all metadata/open/close is collective, only the write is independent.). It
> seems to me that somehow on my system all writes are collective by default
> and thus the write operation is not executed as independent.
>
> Do you have a configuration with openmpi on OSX somewhere around?
Yes, I had to deactivate my mpich configuration first, but I now have openmpi
1.6.4 on OSX 10.8.3. However, when I try to build hdf5 1.8.11-pre1 with it,
using
CC=/opt/local/lib/openmpi/bin/mpicc ./configure
make
make check
Some tests fail in "make check", for example testing "ph5diff
h5diff_basiccl.h5". That may be due to not having a POSIX-compliant parallel
file system installed. Also, I just noticed that the earlier
t_posix_compliant test for allwrite_allread_blocks with POSIX IO failed,
though it returned 0 so as not to stop the hdf5 testing.
Are you using a parallel file system? Do you set the environment variable
HDF5_PARAPREFIX to a directory in a parallel file system? What file system are
you using for your parallel I/O tests?
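
One way to answer these questions before running "make check" is to print where
HDF5_PARAPREFIX points and which file system backs that directory. The sketch
below is only an illustration: the function name check_paraprefix is
hypothetical and not part of HDF5 or netCDF, and it relies only on POSIX
"df -P", which reports the device and mount point but cannot tell whether the
file system is actually a parallel one.

```shell
#!/bin/sh
# Hypothetical helper: report where the HDF5 parallel tests would write.
# Assumes only that HDF5_PARAPREFIX (the variable named in the message above)
# is the directory prefix used by the parallel tests.
check_paraprefix() {
    dir="${HDF5_PARAPREFIX:-}"
    if [ -z "$dir" ]; then
        echo "HDF5_PARAPREFIX is not set"
        return 1
    fi
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        echo "HDF5_PARAPREFIX=$dir is not a writable directory"
        return 2
    fi
    echo "HDF5_PARAPREFIX=$dir"
    # POSIX output format: the last line names the device and mount point.
    df -P "$dir" | tail -n 1
    return 0
}
```

Calling check_paraprefix in the shell that will run the tests shows either the
directory and its mount point or the reason the setting cannot be used.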
I'm afraid I don't know much about parallel I/O, and the netCDF parallel I/O
expert got lured away to a different job some time ago, so we may need some
help or pointers on where to look to install a parallel file system on our
OS X platform for this kind of testing and debugging.
> I will start putting some debugging commands into the netcdf-fortran library
> and see where the process really hangs and whether the collective/independent
> write is executed correctly.
Thanks, that would be helpful ...
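
While narrowing down where the write hangs, it can help to run the test under
a deadline so that a hang is reported instead of blocking the terminal. The
sketch below is a hypothetical helper (run_with_deadline is not part of
netCDF, HDF5, or Open MPI). It treats an exit status above 128 as "killed by a
signal", which is only a heuristic for a hang, and it assumes that the launcher
stops its ranks when it receives SIGTERM.

```shell
#!/bin/sh
# Hypothetical helper: run a command, but give up after a deadline so a
# hung parallel write is reported instead of blocking the terminal.
# Usage: run_with_deadline SECONDS COMMAND [ARGS...]
run_with_deadline() {
    deadline=$1
    shift
    "$@" &
    cmd_pid=$!
    # Watchdog: send SIGTERM if the command is still running at the deadline.
    (
        sleep "$deadline"
        kill "$cmd_pid" 2>/dev/null
    ) >/dev/null 2>&1 &
    dog_pid=$!
    status=0
    wait "$cmd_pid" || status=$?
    # Stop the watchdog if the command finished on its own.
    kill "$dog_pid" 2>/dev/null || true
    wait "$dog_pid" 2>/dev/null || true
    if [ "$status" -gt 128 ]; then
        echo "command did not finish within ${deadline}s (possible hang)" >&2
    fi
    return "$status"
}
```

For example, "run_with_deadline 60 mpirun -np 4 ./simple_xy_par_wr" would stop
the run after a minute; the executable name here is an assumption based on the
simple_xy_par_wr.f90 example mentioned in this thread.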
--Russ
> Reto
>
>
> On Apr 9, 2013, at 11:01 PM, Unidata netCDF Support wrote:
>
> > Hi Reto,
> >
> > Sorry to have taken so long to respond to your question.
> >> I have been using NetCDF-4 Parallel I/O with the Fortran 90 interface for
> >> some time with success. Thank you for this great tool!
> >>
> >> However, I now have an issue with independent access:
> >>
> >> - NetCDF F90 Parallel access (NetCDF-4, MPIIO)
> >> - 3 fixed and 1 unlimited dimension
> >> - all processes open/close the file and write metadata
> >> - only a few processes write to the file (-> independent access)
> >> - the write hangs. It works fine if all processes take part.
> >>
> >> I've changed your example F90 parallel I/O file simple_xy_par_wr.f90 to
> >> include an unlimited dimension and independent access by only a subset of
> >> processes. Same issue, even if I explicitly set the access type to
> >> independent for the variable. Can you reproduce the issue on your side?
> >>
> >> The following system configuration on my side:
> >> - NetCDF 4.2.1.1 and F90 interface 4.2
> >> - hdf5 1.8.9
> >> - Openmpi 1.
> >> - OSX, gcc 4.6.3
> >
> > No, I haven't been able to reproduce the issue, but I can't exactly
> > duplicate your configuration easily, and there have been some updates and
> > bug fixes that may have made a difference.
> >
> > First I tried this configuration, which worked fine on your attached
> > example:
> >
> > - NetCDF 4.3.0-rc4 and F90 interface 4.2
> > - hdf5 1.8.11 (release candidate from svn repository)
> > - mpich2-1.3.1
> > - Linux Fedora, mpicc, mpif90 wrapping gcc, gfortran 4.5.1
> >
> > So if you can build those versions, it should work for you. I'm not sure
> > whether the fix is in netCDF-4.3.0 or in hdf5-1.8.11, but both have a fix
> > for at least one parallel I/O hanging process issue:
> >
> > https://bugtracking.unidata.ucar.edu/browse/NCF-214 (fix in netCDF-4.3.0)
> > https://bugtracking.unidata.ucar.edu/browse/NCF-240 (fix in HDF5-1.8.11)
> >
> > --Russ
> >
> > Russ Rew UCAR Unidata Program
> > address@hidden http://www.unidata.ucar.edu
> >
> >
> >
> > Ticket Details
> > ===================
> > Ticket ID: TIR-820282
> > Department: Support netCDF
> > Priority: High
> > Status: Closed
> >
>
>
Russ Rew UCAR Unidata Program
address@hidden http://www.unidata.ucar.edu