This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
> Thank you very much.
>
> I have one last question concerning the fact why my example never caused
> problems on your side: did you really run my example with 4 processes? thus
> mpirun -n 4?

No, sorry, I never saw any mention of using mpirun in your report on the
problem.  That would probably explain why it worked for me.  I guess we
should be using mpirun in our tests!

--Russ

> I had the same hang also on the NCAR yellowstone supercomputer with the
> standard NetCDF 4.2 / HDF5 1.8.9 install they have there.
>
> Reto
>
> On Apr 26, 2013, at 11:34 PM, Unidata netCDF Support wrote:
>
> > Reto,
> >
> > I've finally created a Jira ticket for this issue, in case you want to
> > follow its status:
> >
> > https://bugtracking.unidata.ucar.edu/browse/NCF-250
> >
> > --Russ
> >
> >> Russ,
> >>
> >> So, I've now also recompiled the whole NetCDF/HDF5 suite with MPICH 3.0.3
> >> instead of Openmpi.  Same story.
> >>
> >> I've traced the blocking statement down to the HDF5 library, called from
> >> the netcdf library during nc_put_vara_int:
> >>
> >> In nc4hdf.c (around line 770) it is calling the H5D.c routine
> >> H5Dset_extent:
> >>
> >>     if (H5Dset_extent(var->hdf_datasetid, xtend_size) < 0)
> >>         BAIL(NC_EHDFERR);
> >>
> >> This is where write processes wait during an independent write operation
> >> involving 1 unlimited dimension (where the dataset extent needs to be
> >> extended) when not all processes take part in the write operation.
> >>
> >> Reto
> >>
> >> On Apr 12, 2013, at 7:32 PM, Unidata netCDF Support wrote:
> >>
> >>> Reto,
> >>>
> >>>> Yes, the POSIX parallel I/O tests fail on OSX with OpenMPI, but that is
> >>>> fine.  OSX and OpenMPI use MPIIO.  So to my understanding the parallel
> >>>> tests are ok if either POSIX or MPIIO works and the other one fails.
> >>>>
> >>>> I am actually not using a parallel file system on OSX.
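The H5Dset_extent trace above suggests the usual workaround for this class of hang: extending an unlimited dimension is a collective operation in HDF5, so a variable that grows along a record dimension can be marked for collective access even when only some ranks contribute data (the non-writing ranks call nf90_put_var with a zero count).  Below is a minimal F90 sketch of that pattern; it assumes a netCDF-Fortran build with parallel I/O enabled, and all file, dimension, and variable names are illustrative, not taken from Reto's program.

```
! Sketch only, assuming netCDF-Fortran built against parallel HDF5 (MPIIO).
program unlimited_collective_sketch
  use mpi
  use netcdf
  implicit none
  integer :: ierr, rank, ncid, dimids(2), varid
  integer :: start(2), count(2)
  integer :: data(10)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  ! All ranks open the file and define metadata (collective operations).
  call check( nf90_create('sketch.nc', ior(nf90_netcdf4, nf90_mpiio), ncid, &
                          comm=MPI_COMM_WORLD, info=MPI_INFO_NULL) )
  call check( nf90_def_dim(ncid, 'x', 10, dimids(1)) )
  call check( nf90_def_dim(ncid, 'rec', nf90_unlimited, dimids(2)) )
  call check( nf90_def_var(ncid, 'data', nf90_int, dimids, varid) )
  call check( nf90_enddef(ncid) )

  ! Growing the unlimited dimension maps to H5Dset_extent, which HDF5
  ! treats as collective, so use collective access for this variable
  ! even though only some ranks contribute data.
  call check( nf90_var_par_access(ncid, varid, nf90_collective) )

  data  = rank
  start = (/ 1, 1 /)
  count = (/ 10, 1 /)
  if (rank == 0) count = (/ 0, 0 /)   ! rank 0 writes nothing, but still calls
  call check( nf90_put_var(ncid, varid, data, start=start, count=count) )

  call check( nf90_close(ncid) )
  call MPI_Finalize(ierr)
contains
  subroutine check(status)
    integer, intent(in) :: status
    if (status /= nf90_noerr) then
      print *, trim(nf90_strerror(status))
      call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
    end if
  end subroutine check
end program unlimited_collective_sketch
```

The key point is that every rank reaches the nf90_put_var call, so the collective extent update inside the library can complete; the zero-count write is the standard way to let a rank participate without contributing data.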
I use the regular
> >>>> file system (basic OSX installation) and I think that the parallel I/O
> >>>> has to work in collective and independent mode even when using a
> >>>> regular file system.
> >>>
> >>> I'm curious how you installed parallel HDF5, because my "make check"
> >>> fails before finishing the tests.  Did you build HDF5 without
> >>> --enable-parallel, or without using CC=mpicc?  Or did you build it with
> >>> parallel I/O, but run "make install" even though "make check" failed as
> >>> a result of not having a parallel file system?
> >>>
> >>> --Russ
> >>>
> >>>> I will test the same installation on Linux and then start debugging on
> >>>> OSX, and maybe we will find out something.
> >>>>
> >>>> Btw. the netcdf-fortran 4.4 beta failed to compile altogether on OSX,
> >>>> so I'm still using netcdf-fortran 4.2.
> >>>>
> >>>> Have a great weekend,
> >>>>
> >>>> Reto
> >>>>
> >>>> On Apr 12, 2013, at 5:59 PM, Unidata netCDF Support wrote:
> >>>>
> >>>>> Reto,
> >>>>>
> >>>>>> I've tried the following configuration:
> >>>>>> - hdf5 1.8.11-snap16
> >>>>>> - netcdf-4.3.0-rc4
> >>>>>> - netcdf-fortran-4.2
> >>>>>> - openmpi-1.6.3
> >>>>>> - gcc/gfortran 4.6.3
> >>>>>>
> >>>>>> Same issue.  If I let all processes do the write, then it works fine.
> >>>>>> If I for instance exclude process #0, 1, 2, or 3 from the writing,
> >>>>>> then the write hangs (all metadata/open/close is collective, only the
> >>>>>> write is independent).  It seems to me that somehow on my system all
> >>>>>> writes are collective by default and thus the write operation is not
> >>>>>> executed as independent.
> >>>>>>
> >>>>>> Do you have a configuration with openmpi on OSX somewhere around?
> >>>>>
> >>>>> Yes, I had to deactivate my mpich configuration first, but now have
> >>>>> openmpi 1.6.4 on OSX 10.8.3.
However, when I try
> >>>>> to build hdf5 1.8.11-pre1 with it, using
> >>>>>
> >>>>>     CC=/opt/local/lib/openmpi/bin/mpicc ./configure
> >>>>>     make
> >>>>>     make check
> >>>>>
> >>>>> some tests fail in "make check", for example testing "ph5diff
> >>>>> h5diff_basiccl.h5", which may be due to not having a POSIX-compliant
> >>>>> parallel file system installed.  Also, I just noticed that the earlier
> >>>>> t_posix_compliant test for allwrite_allread_blocks with POSIX I/O
> >>>>> failed, though it returned 0 so as not to stop the hdf5 testing.
> >>>>>
> >>>>> Are you using a parallel file system?  Do you set the environment
> >>>>> variable HDF5_PARAPREFIX to a directory in a parallel file system?
> >>>>> What file system are you using for your parallel I/O tests?
> >>>>>
> >>>>> I'm afraid I don't know much about parallel I/O, and the netCDF
> >>>>> parallel I/O expert got lured away to a different job some time ago,
> >>>>> so we may need some help or pointers on where to look to install a
> >>>>> parallel file system on our OS X platform for this kind of testing
> >>>>> and debugging.
> >>>>>
> >>>>>> I will start putting some debugging commands into the netcdf-fortran
> >>>>>> library and see where the process really hangs and whether the
> >>>>>> collective/independent write is executed correctly.
> >>>>>
> >>>>> Thanks, that would be helpful ...
> >>>>>
> >>>>> --Russ
> >>>>>
> >>>>>> Reto
> >>>>>>
> >>>>>> On Apr 9, 2013, at 11:01 PM, Unidata netCDF Support wrote:
> >>>>>>
> >>>>>>> Hi Reto,
> >>>>>>>
> >>>>>>> Sorry to have taken so long to respond to your question.
> >>>>>>>
> >>>>>>>> I have been using NetCDF-4 parallel I/O with the Fortran 90
> >>>>>>>> interface for some time with success.  Thank you for this great
> >>>>>>>> tool!
> >>>>>>>>
> >>>>>>>> However, I now have an issue with independent access:
> >>>>>>>>
> >>>>>>>> - NetCDF F90 parallel access (NetCDF-4, MPIIO)
> >>>>>>>> - 3 fixed and 1 unlimited dimension
> >>>>>>>> - all processes open/close the file and write metadata
> >>>>>>>> - only a few processes write to the file (-> independent access)
> >>>>>>>> - the write hangs.  It works fine if all processes take part.
> >>>>>>>>
> >>>>>>>> I've changed your example F90 parallel I/O file simple_xy_par_wr.f90
> >>>>>>>> to include an unlimited dimension and independent access by only a
> >>>>>>>> subset of processes.  Same issue, even if I explicitly set the
> >>>>>>>> access type to independent for the variable.  Can you reproduce the
> >>>>>>>> issue on your side?
> >>>>>>>>
> >>>>>>>> My system configuration:
> >>>>>>>> - NetCDF 4.2.1.1 and F90 interface 4.2
> >>>>>>>> - hdf5 1.8.9
> >>>>>>>> - Openmpi 1.
> >>>>>>>> - OSX, gcc 4.6.3
> >>>>>>>
> >>>>>>> No, I haven't been able to reproduce the issue, but I can't exactly
> >>>>>>> duplicate your configuration easily, and there have been some
> >>>>>>> updates and bug fixes that may have made a difference.
> >>>>>>>
> >>>>>>> First I tried this configuration, which worked fine on your attached
> >>>>>>> example:
> >>>>>>>
> >>>>>>> - NetCDF 4.3.0-rc4 and F90 interface 4.2
> >>>>>>> - hdf5 1.8.11 (release candidate from svn repository)
> >>>>>>> - mpich2-1.3.1
> >>>>>>> - Linux Fedora, mpicc, mpif90 wrapping gcc, gfortran 4.5.1
> >>>>>>>
> >>>>>>> So if you can build those versions, it should work for you.
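The failing pattern Reto describes condenses to a fragment like the following sketch. This is not his actual modified simple_xy_par_wr.f90; the variable names (ncid, varid, local_data, nx) are illustrative, and it assumes the file was opened in parallel with a variable defined over an unlimited record dimension.

```
! Sketch of the reported hang: independent access plus an unlimited
! dimension, with only a subset of ranks writing.  Growing the record
! dimension requires an internal H5Dset_extent, and ranks that skip
! the nf90_put_var never reach it, so the writing ranks block.
call check( nf90_var_par_access(ncid, varid, nf90_independent) )
if (rank > 0) then            ! rank 0 does not take part in the write
   call check( nf90_put_var(ncid, varid, local_data, &
                            start=(/ 1, 1, 1, rank /), &
                            count=(/ nx, 1, 1, 1 /)) )
end if
call check( nf90_close(ncid) )
```

When every rank writes (or when the write is made collective), the extent update can complete and the hang disappears, which matches the behavior reported throughout the thread.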
I'm not
> >>>>>>> sure whether the fix is in netCDF-4.3.0 or in hdf5-1.8.11, but both
> >>>>>>> have a fix for at least one parallel I/O hanging-process issue:
> >>>>>>>
> >>>>>>> https://bugtracking.unidata.ucar.edu/browse/NCF-214 (fix in netCDF-4.3.0)
> >>>>>>> https://bugtracking.unidata.ucar.edu/browse/NCF-240 (fix in HDF5-1.8.11)
> >>>>>>>
> >>>>>>> --Russ
> >>>>>>>
> >>>>>>> Russ Rew                        UCAR Unidata Program
> >>>>>>> address@hidden                  http://www.unidata.ucar.edu
> >>>>>>>
> >>>>>>> Ticket Details
> >>>>>>> ===================
> >>>>>>> Ticket ID: TIR-820282
> >>>>>>> Department: Support netCDF
> >>>>>>> Priority: High
> >>>>>>> Status: Closed

Russ Rew                        UCAR Unidata Program
address@hidden                  http://www.unidata.ucar.edu

Ticket Details
===================
Ticket ID: TIR-820282
Department: Support netCDF
Priority: Emergency
Status: Closed