[netCDF #TIR-820282]: NetCDF-4 Parallel independent access with unlimited dimension (Fortran 90)
- Date: Fri, 12 Apr 2013 11:32:17 -0600
Reto,
> Yes, the POSIX parallel I/O tests fail on OSX with OpenMPI, but that is fine.
> OSX with OpenMPI uses MPIIO, so to my understanding the parallel tests are OK
> if either POSIX or MPIIO works and the other one fails.
>
> I am actually not using a parallel file system on OSX. I use the regular file
> system (basic OSX installation) and I think that the parallel I/O has to work
> in collective and independent mode even when using a regular file system.
I'm curious how you installed parallel HDF5, because my "make check" fails before
finishing the tests. Did you build HDF5 without --enable-parallel, or without using
CC=mpicc? Or did you build it with parallel I/O, but run "make install" even though
"make check" failed as a result of not having a parallel file system?
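One way to answer the build question without rerunning "make check" is to look at the `libhdf5.settings` summary that an HDF5 install writes, which (assuming the 1.8-series layout) includes a `Parallel HDF5: yes|no` line. The sketch below parses that text; the `<prefix>/lib/libhdf5.settings` location and the helper names are assumptions to check against your own install.

```python
import re
from pathlib import Path
from typing import Optional


def parallel_hdf5_enabled(settings_text: str) -> Optional[bool]:
    """Return True/False from a 'Parallel HDF5:' line, or None if it is absent.

    Assumes the HDF5 1.8.x libhdf5.settings format, e.g.
    '                  Parallel HDF5: yes'.
    """
    match = re.search(r"^\s*Parallel HDF5:\s*(\w+)", settings_text, re.MULTILINE)
    if match is None:
        return None
    return match.group(1).lower() == "yes"


def check_install(prefix: str) -> Optional[bool]:
    """Hypothetical helper: read <prefix>/lib/libhdf5.settings if it exists."""
    path = Path(prefix) / "lib" / "libhdf5.settings"
    if not path.exists():
        return None
    return parallel_hdf5_enabled(path.read_text())
```

For example, `check_install("/path/to/hdf5")` returns `True` only if that install's settings file reports a parallel build, and `None` if the file is not where this sketch assumes.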
--Russ
> I will test the same installation on Linux and then start debugging on OSX,
> and maybe we find out something.
>
> Btw. the netcdf-fortran 4.4 beta failed to compile altogether on OSX, so I'm
> still using netcdf-fortran 4.2.
>
> Have a great weekend,
>
> Reto
>
>
> On Apr 12, 2013, at 5:59 PM, Unidata netCDF Support wrote:
>
> > Reto,
> >
> >> I've tried the following configuration
> >> - hdf5 1.8.11-snap16
> >> - netcdf-4.3.0-rc4
> >> - netcdf-fortran-4.2
> >> - openmpi-1.6.3
> >> - gcc/gfortran 4.6.3
> >>
> > >> Same issue. If I let all processes do the write, then it works fine. If I,
> > >> for instance, exclude process #0, 1, 2, or 3 from the writing, then the write
> > >> hangs (all metadata/open/close is collective; only the write is
> > >> independent). It seems to me that somehow on my system all writes are
> > >> collective by default, and thus the write operation is not executed as
> > >> independent.
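Reto's hypothesis can be pictured with a toy model (plain Python threads, not netCDF, HDF5, or MPI): if the write is really executed as a collective operation, every rank has to enter it, so excluding even one rank leaves the others waiting. In this sketch a `threading.Barrier` stands in for the collective call and a timeout replaces the indefinite hang; `run_write` and its parameters are hypothetical.

```python
import threading
from typing import Set


def run_write(n_ranks: int, participating: Set[int], collective: bool,
              timeout: float = 0.5) -> Set[int]:
    """Toy model of one write step across n_ranks "ranks" (threads).

    In collective mode every participating rank must also meet at a barrier
    standing in for the collective call; in independent mode it just writes.
    Returns the set of ranks whose write completed.
    """
    barrier = threading.Barrier(n_ranks)
    completed: Set[int] = set()
    lock = threading.Lock()

    def rank_body(rank: int) -> None:
        if rank not in participating:
            return  # this rank is excluded from the write
        if collective:
            try:
                barrier.wait(timeout=timeout)
            except threading.BrokenBarrierError:
                return  # timed out here; a real collective call would just hang
        with lock:
            completed.add(rank)

    threads = [threading.Thread(target=rank_body, args=(r,)) for r in range(n_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return completed
```

With `collective=True`, `run_write(4, {1, 2, 3}, collective=True)` returns an empty set because rank 0 never reaches the barrier, while the same call with `collective=False` returns `{1, 2, 3}`, which is the behaviour Reto expects from independent access.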
> >>
> > >> Do you have an OpenMPI configuration on OSX available somewhere?
> >
> > Yes, I had to deactivate my mpich configuration first, but now have openmpi 1.6.4
> > on OSX 10.8.3. However, when I try to build hdf5 1.8.11-pre1 with it, using
> >
> > CC=/opt/local/lib/openmpi/bin/mpicc ./configure
> > make
> > make check
> >
> > Some tests fail in "make check", for example testing "ph5diff h5diff_basiccl.h5",
> > which may be due to not having a POSIX-compliant parallel file system installed.
> > Also, I just noticed that the earlier t_posix_compliant test for
> > allwrite_allread_blocks with POSIX IO failed, though it returned 0 so as not to
> > stop the hdf5 testing.
> >
> >
> > Are you using a parallel file system? Do you set the environment variable
> > HDF5_PARAPREFIX to a directory in a parallel file system? What file system are
> > you using for your parallel I/O tests?
> >
> > I'm afraid I don't know much about parallel I/O, and the netCDF parallel I/O
> > expert got lured away to a different job some time ago, so we may need some help
> > or pointers on where to look to install a parallel file system on our OS X
> > platform for this kind of testing and debugging.
> >
> >> I will start putting some debugging commands into the netcdf-fortran
> >> library and see where the process really hangs and whether the
> >> collective/independent write is executed correctly.
> >
> > Thanks, that would be helpful ...
> >
> > --Russ
> >
> >> Reto
> >>
> >>
> >> On Apr 9, 2013, at 11:01 PM, Unidata netCDF Support wrote:
> >>
> >>> Hi Reto,
> >>>
> >>> Sorry to have taken so long to respond to your question.
> >>>> I have been using NetCDF-4 Parallel I/O with the Fortran 90 interface
> >>>> for some time with success. Thank you for this great tool!
> >>>>
> >>>> However, I now have an issue with independent access:
> >>>>
> >>>> - NetCDF F90 Parallel access (NetCDF-4, MPIIO)
> >>>> - 3 fixed and 1 unlimited dimension
> > >>>> - all processes open/close the file and write metadata
> > >>>> - only a few processes write to the file (-> independent access)
> > >>>> - the write hangs. It works fine if all processes take part.
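The access pattern listed above (three fixed dimensions plus one unlimited dimension, with only some ranks writing) comes down to each writing rank choosing a start/count hyperslab for the current record. The sketch below is a hypothetical decomposition, not the one used in simple_xy_par_wr.f90: it splits the x extent evenly among the writers in C/Python order (time, z, y, x), and `to_fortran` converts to the 1-based, reversed dimension order used by the Fortran 90 API, where the unlimited dimension comes last.

```python
from typing import Dict, List, Tuple

# (start, count) in C/Python order: (time, z, y, x); time is the unlimited dimension.
Slab = Tuple[List[int], List[int]]


def record_slabs(writers: List[int], nz: int, ny: int, nx: int,
                 record: int) -> Dict[int, Slab]:
    """Split the x extent of one record evenly among the writing ranks.

    Ranks that are not in `writers` get no entry: in independent mode they
    simply make no write call for this record.
    """
    if not writers or nx % len(writers) != 0:
        raise ValueError("nx must be a positive multiple of the number of writers")
    chunk = nx // len(writers)
    slabs: Dict[int, Slab] = {}
    for i, rank in enumerate(sorted(writers)):
        start = [record, 0, 0, i * chunk]
        count = [1, nz, ny, chunk]
        slabs[rank] = (start, count)
    return slabs


def to_fortran(start: List[int], count: List[int]) -> Slab:
    """Convert a 0-based C-order slab to the 1-based, reversed order of the F90 API."""
    return [s + 1 for s in reversed(start)], list(reversed(count))
```

For example, `record_slabs([1, 2, 3], nz=2, ny=3, nx=6, record=0)` gives rank 3 the slab `([0, 0, 0, 4], [1, 2, 3, 2])`, which `to_fortran` turns into `([5, 1, 1, 1], [2, 3, 2, 1])`; rank 0 gets no entry and makes no write call.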
> >>>>
> > >>>> I've changed your example F90 parallel I/O file simple_xy_par_wr.f90 to
> > >>>> include an unlimited dimension and independent access by only a subset of
> > >>>> processes. Same issue, even if I explicitly set the access type to
> > >>>> independent for the variable. Can you reproduce the issue on your side?
> >>>>
> > >>>> My system configuration is:
> >>>> - NetCDF 4.2.1.1 and F90 interface 4.2
> >>>> - hdf5 1.8.9
> >>>> - Openmpi 1.
> >>>> - OSX, gcc 4.6.3
> >>>
> > >>> No, I haven't been able to reproduce the issue, but I can't exactly duplicate
> > >>> your configuration easily, and there have been some updates and bug fixes that
> > >>> may have made a difference.
> >>>
> > >>> First I tried this configuration, which worked fine on your attached example:
> >>>
> >>> - NetCDF 4.3.0-rc4 and F90 interface 4.2
> >>> - hdf5 1.8.11 (release candidate from svn repository)
> >>> - mpich2-1.3.1
> >>> - Linux Fedora, mpicc, mpif90 wrapping gcc, gfortran 4.5.1
> >>>
> > >>> So if you can build those versions, it should work for you. I'm not sure whether
> > >>> the fix is in netCDF-4.3.0 or in hdf5-1.8.11, but both have a fix for at least one
> > >>> parallel I/O hanging-process issue:
> >>>
> >>> https://bugtracking.unidata.ucar.edu/browse/NCF-214 (fix in netCDF-4.3.0)
> >>> https://bugtracking.unidata.ucar.edu/browse/NCF-240 (fix in HDF5-1.8.11)
> >>>
> >>> --Russ
> >>>
> >>> Russ Rew UCAR Unidata Program
> >>> address@hidden http://www.unidata.ucar.edu
> >>>
> >>>
> >>>
> >>> Ticket Details
> >>> ===================
> >>> Ticket ID: TIR-820282
> >>> Department: Support netCDF
> >>> Priority: High
> >>> Status: Closed
> >>>
> >>
> >>
> >
> > Russ Rew UCAR Unidata Program
> > address@hidden http://www.unidata.ucar.edu
> >
>
>
Russ Rew UCAR Unidata Program
address@hidden http://www.unidata.ucar.edu