This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Reto,

I've finally created a Jira ticket for this issue, in case you want to follow
its status:

  https://bugtracking.unidata.ucar.edu/browse/NCF-250

--Russ

> Russ,
>
> So, I've now also recompiled the whole NetCDF/HDF5 suite with MPICH 3.0.3
> instead of Openmpi. Same story.
>
> I've traced down the blocking statement to the HDF5 library, called from the
> netcdf library during nc_put_vara_int:
>
> In nc4hdf.c (around line 770) it is calling the H5D.c routine H5Dset_extent:
>
>   if (H5Dset_extent(var->hdf_datasetid, xtend_size) < 0)
>     BAIL(NC_EHDFERR);
>
> This is where the writing processes wait during an independent write
> operation involving 1 unlimited dimension (where the dataset extent needs
> to be extended) when not all processes take part in the write operation.
>
> Reto
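A minimal C sketch of the access pattern described above, assuming a netCDF-4/HDF5
stack built with parallel I/O support (MPI compilers, HDF5 configured with
--enable-parallel); the file and variable names are illustrative, not taken from
Reto's program:

  #include <stdio.h>
  #include <stdlib.h>
  #include <mpi.h>
  #include <netcdf.h>
  #include <netcdf_par.h>   /* nc_create_par, nc_var_par_access, NC_INDEPENDENT */

  #define ERR(e) do { int err_ = (e); if (err_) { \
      fprintf(stderr, "Error: %s\n", nc_strerror(err_)); \
      MPI_Abort(MPI_COMM_WORLD, 1); } } while (0)

  int main(int argc, char **argv)
  {
      int rank, nprocs, ncid, varid, dimids[2];
      size_t start[2], count[2];
      int data;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

      /* All processes create the file and define the metadata (collective). */
      ERR(nc_create_par("indep_test.nc", NC_NETCDF4 | NC_MPIIO,
                        MPI_COMM_WORLD, MPI_INFO_NULL, &ncid));
      ERR(nc_def_dim(ncid, "time", NC_UNLIMITED, &dimids[0]));  /* unlimited */
      ERR(nc_def_dim(ncid, "x", (size_t)nprocs, &dimids[1]));   /* fixed */
      ERR(nc_def_var(ncid, "data", NC_INT, 2, dimids, &varid));
      ERR(nc_enddef(ncid));

      /* Explicitly request independent access for this variable. */
      ERR(nc_var_par_access(ncid, varid, NC_INDEPENDENT));

      /* Only ranks > 0 write.  Writing record 0 grows the unlimited
         dimension, which is the code path that ends up in H5Dset_extent in
         nc4hdf.c; this is where the hang described above was observed when
         some ranks skip the write. */
      if (rank > 0) {
          start[0] = 0;  start[1] = (size_t)rank;
          count[0] = 1;  count[1] = 1;
          data = rank;
          ERR(nc_put_vara_int(ncid, varid, start, count, &data));
      }

      ERR(nc_close(ncid));   /* collective */
      MPI_Finalize();
      return 0;
  }

Built with mpicc and run on several ranks (for example, mpiexec -n 4), the
independent nc_put_vara_int above is the call reported to block.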
> On Apr 12, 2013, at 7:32 PM, Unidata netCDF Support wrote:
>
> > Reto,
> >
> >> Yes, the POSIX parallel I/O tests fail on OSX with OpenMPI, but that is
> >> fine.  OSX and OpenMPI use MPIIO.  So to my understanding the parallel
> >> tests are ok if either POSIX or MPIIO works and the other one fails.
> >>
> >> I am actually not using a parallel file system on OSX.  I use the regular
> >> file system (basic OSX installation), and I think that the parallel I/O
> >> has to work in collective and independent mode even when using a regular
> >> file system.
> >
> > I'm curious how you installed parallel HDF5, because my "make check" fails
> > before finishing the tests.  Did you build HDF5 without --enable-parallel,
> > or without using CC=mpicc?  Or did you build it with parallel I/O, but run
> > "make install" even though "make check" failed as a result of not having a
> > parallel file system?
> >
> > --Russ
> >
> >> I will test the same installation on Linux and then start debugging on
> >> OSX, and maybe we will find out something.
> >>
> >> Btw. the netcdf-fortran 4.4 beta failed to compile altogether on OSX, so
> >> I'm still using netcdf-fortran 4.2.
> >>
> >> Have a great weekend,
> >>
> >> Reto
> >>
> >> On Apr 12, 2013, at 5:59 PM, Unidata netCDF Support wrote:
> >>
> >>> Reto,
> >>>
> >>>> I've tried the following configuration:
> >>>> - hdf5 1.8.11-snap16
> >>>> - netcdf-4.3.0-rc4
> >>>> - netcdf-fortran-4.2
> >>>> - openmpi-1.6.3
> >>>> - gcc/gfortran 4.6.3
> >>>>
> >>>> Same issue.  If I let all processes do the write, then it works fine.
> >>>> If I, for instance, exclude process #0, 1, 2, or 3 from the writing,
> >>>> then the write hangs (all metadata/open/close is collective, only the
> >>>> write is independent).  It seems to me that somehow on my system all
> >>>> writes are collective by default, and thus the write operation is not
> >>>> executed as independent.
> >>>>
> >>>> Do you have a configuration with openmpi on OSX somewhere around?
> >>>
> >>> Yes, I had to deactivate my mpich configuration first, but now have
> >>> openmpi 1.6.4 on OSX 10.8.3.  However, when I try to build hdf5
> >>> 1.8.11-pre1 with it, using
> >>>
> >>>   CC=/opt/local/lib/openmpi/bin/mpicc ./configure
> >>>   make
> >>>   make check
> >>>
> >>> some tests fail in "make check", for example testing "ph5diff
> >>> h5diff_basiccl.h5", which may be due to not having a POSIX-compliant
> >>> parallel file system installed.  Also, I just noticed that the earlier
> >>> t_posix_compliant test for allwrite_allread_blocks with POSIX I/O
> >>> failed, though it returned 0 so as not to stop the hdf5 testing.
> >>>
> >>> Are you using a parallel file system?  Do you set the environment
> >>> variable HDF5_PARAPREFIX to a directory in a parallel file system?
> >>> What file system are you using for your parallel I/O tests?
> >>>
> >>> I'm afraid I don't know much about parallel I/O, and the netCDF parallel
> >>> I/O expert got lured away to a different job some time ago, so we may
> >>> need some help or pointers on where to look to install a parallel file
> >>> system on our OS X platform for this kind of testing and debugging.
> >>>
> >>>> I will start putting some debugging commands into the netcdf-fortran
> >>>> library and see where the process really hangs and whether the
> >>>> collective/independent write is executed correctly.
> >>>
> >>> Thanks, that would be helpful ...
> >>>
> >>> --Russ
> >>>
> >>>> Reto
> >>>>
> >>>> On Apr 9, 2013, at 11:01 PM, Unidata netCDF Support wrote:
> >>>>
> >>>>> Hi Reto,
> >>>>>
> >>>>> Sorry to have taken so long to respond to your question.
> >>>>>
> >>>>>> I have been using NetCDF-4 Parallel I/O with the Fortran 90 interface
> >>>>>> for some time with success.  Thank you for this great tool!
> >>>>>>
> >>>>>> However, I now have an issue with independent access:
> >>>>>>
> >>>>>> - NetCDF F90 Parallel access (NetCDF-4, MPIIO)
> >>>>>> - 3 fixed and 1 unlimited dimension
> >>>>>> - all processes open/close the file and write metadata
> >>>>>> - only a few processes write to the file (-> independent access)
> >>>>>> - the write hangs.  It works fine if all processes take part.
> >>>>>>
> >>>>>> I've changed your example F90 parallel I/O file simple_xy_par_wr.f90
> >>>>>> to include an unlimited dimension and independent access by only a
> >>>>>> subset of processes.  Same issue, even if I explicitly set the access
> >>>>>> type to independent for the variable.  Can you reproduce the issue on
> >>>>>> your side?
> >>>>>>
> >>>>>> The system configuration on my side:
> >>>>>> - NetCDF 4.2.1.1 and F90 interface 4.2
> >>>>>> - hdf5 1.8.9
> >>>>>> - Openmpi 1.
> >>>>>> - OSX, gcc 4.6.3
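For comparison, a sketch of the variant reported above as working, in which
every rank takes part in the write; it assumes the same setup as the earlier
sketch and only replaces its write section:

  /* Every rank writes one element of record 0, so every process goes through
     the code path that extends the unlimited dimension. */
  start[0] = 0;  start[1] = (size_t)rank;
  count[0] = 1;  count[1] = 1;
  data = rank;
  ERR(nc_put_vara_int(ncid, varid, start, count, &data));

With every process issuing the put, every process also reaches the
dataset-extension step, which matches the observation that the hang disappears
when all processes do the write.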
> >>>>> No, I haven't been able to reproduce the issue, but I can't exactly
> >>>>> duplicate your configuration easily, and there have been some updates
> >>>>> and bug fixes that may have made a difference.
> >>>>>
> >>>>> First I tried this configuration, which worked fine on your attached
> >>>>> example:
> >>>>>
> >>>>> - NetCDF 4.3.0-rc4 and F90 interface 4.2
> >>>>> - hdf5 1.8.11 (release candidate from svn repository)
> >>>>> - mpich2-1.3.1
> >>>>> - Linux Fedora, mpicc, mpif90 wrapping gcc, gfortran 4.5.1
> >>>>>
> >>>>> So if you can build those versions, it should work for you.  I'm not
> >>>>> sure whether the fix is in netCDF-4.3.0 or in hdf5-1.8.11, but both
> >>>>> have a fix for at least one parallel I/O hanging process issue:
> >>>>>
> >>>>> https://bugtracking.unidata.ucar.edu/browse/NCF-214 (fix in netCDF-4.3.0)
> >>>>> https://bugtracking.unidata.ucar.edu/browse/NCF-240 (fix in HDF5-1.8.11)
> >>>>>
> >>>>> --Russ

Russ Rew                                    UCAR Unidata Program
address@hidden                              http://www.unidata.ucar.edu

Ticket Details
===================
Ticket ID: TIR-820282
Department: Support netCDF
Priority: Emergency
Status: Closed