This archive contains answers to questions sent to Unidata support through mid-2025. The archive is no longer updated; we provide it for reference, since many of the answers remain technically correct even if somewhat dated. For the most up-to-date information on NSF Unidata software and data services, please consult the Software Documentation first.
Hi Thomas,

> I have an MPI-parallel application with a decomposition such that each
> array is handled completely by one process for I/O purposes, and the
> arrays are distributed in round-robin fashion, i.e. task 0 holds all of
> array A, task 1 holds all of array B, and so forth.
>
> My expectation was that I could write this with netCDF-4 parallel I/O,
> so I compiled netCDF 4.2.1.1 against OpenMPI 1.4.2 and HDF5 1.8.9 on
> Debian GNU/Linux x86_64 and started testing.
>
> Unfortunately, when I call nc_create_par with NC_MPIPOSIX and
> nc_var_par_access with the flag NC_INDEPENDENT, I get only invalid
> output; when I change the nc_create_par option to NC_MPIIO, the program
> hangs in nc_close.
>
> I've reduced my use case to a small test that mostly resembles one of
> the demonstration programs. I think the most relevant part is that the
> processes not holding any elements of the array each use start and
> count values of 0 for every dimension.
>
> Please see the attached files for more information.
>
> When I run the attached program with
>
>   $ mpirun -n 5 ./nc4partest
>   mpi_name: taifun size: 5 rank: 0, isDataWriter=0
>   mpi_name: taifun size: 5 rank: 1, isDataWriter=0
>   mpi_name: taifun size: 5 rank: 2, isDataWriter=1
>   mpi_name: taifun size: 5 rank: 4, isDataWriter=0
>   mpi_name: taifun size: 5 rank: 3, isDataWriter=0
>   mpi_rank=1 start[0]=0 start[1]=0 count[0]=0 count[1]=0
>   mpi_rank=2 start[0]=0 start[1]=0 count[0]=24 count[1]=24
>   mpi_rank=0 start[0]=0 start[1]=0 count[0]=0 count[1]=0
>   mpi_rank=3 start[0]=0 start[1]=0 count[0]=0 count[1]=0
>   mpi_rank=4 start[0]=0 start[1]=0 count[0]=0 count[1]=0
>
> the program hangs from this point on.
>
> I've tried to find a hint on how to use the nc_put_vara_int call in
> this case but found nothing.
>
> Do I have to redistribute the data before writing? Are there other
> values for start/count I could use?

I just succeeded in running a test case that used count[0] = 0 on an MPI
parallel file system, using the netCDF-4 parallel I/O inherited from HDF5,
and it ran fine. The test I ran just inserted the following code in a loop
after line 136 in nc_test4/tst_parallel.c:

    /* See if count dimension == 0 returns error */
    count_save = count[0];
    count[0] = 0;
    if (nc_put_vara_int(ncid, v1id, start, count, slab_data)) ERR;
    count[0] = count_save;

Discussion with CISL consultants suggests the problem may be
platform-specific.

--Russ

Russ Rew                                    UCAR Unidata Program
address@hidden                              http://www.unidata.ucar.edu

Ticket Details
===================
Ticket ID: PDL-125161
Department: Support netCDF
Priority: Normal
Status: Closed
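For reference, here is a minimal, self-contained sketch of the zero-count
pattern discussed in this thread: every rank participates in a collective
nc_put_vara_int, and ranks holding no part of the array pass counts of 0.
This is not the attached nc4partest program (which is not reproduced in the
archive); the file name, the 24x24 dimensions, and the choice of rank 2 as
the sole writer are illustrative assumptions modeled on the output above.

    #include <mpi.h>
    #include <netcdf.h>
    #include <netcdf_par.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define DIMLEN 24
    #define NDIMS  2

    /* Abort the whole job on any netCDF error. */
    #define CHECK(call) do { \
        int stat = (call); \
        if (stat != NC_NOERR) { \
            fprintf(stderr, "netCDF error: %s\n", nc_strerror(stat)); \
            MPI_Abort(MPI_COMM_WORLD, 1); \
        } \
    } while (0)

    int main(int argc, char **argv)
    {
        int rank, nprocs, ncid, varid, dimids[NDIMS];
        size_t start[NDIMS] = {0, 0};
        size_t count[NDIMS] = {0, 0};   /* zero counts on non-writers */
        int *data = NULL;
        int dummy = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* All ranks create the file together. NC_MPIIO matches the
         * report above; newer netCDF releases deprecate this flag. */
        CHECK(nc_create_par("nc4partest.nc", NC_NETCDF4 | NC_MPIIO,
                            MPI_COMM_WORLD, MPI_INFO_NULL, &ncid));
        CHECK(nc_def_dim(ncid, "x", DIMLEN, &dimids[0]));
        CHECK(nc_def_dim(ncid, "y", DIMLEN, &dimids[1]));
        CHECK(nc_def_var(ncid, "A", NC_INT, NDIMS, dimids, &varid));
        CHECK(nc_enddef(ncid));

        /* Collective access: every rank must make the matching call,
         * including ranks with nothing to write. */
        CHECK(nc_var_par_access(ncid, varid, NC_COLLECTIVE));

        if (rank == 2) {   /* the one rank that owns all of array A */
            size_t i, n = (size_t)DIMLEN * DIMLEN;
            count[0] = DIMLEN;
            count[1] = DIMLEN;
            if (!(data = malloc(n * sizeof(int))))
                MPI_Abort(MPI_COMM_WORLD, 1);
            for (i = 0; i < n; i++)
                data[i] = (int)i;
        }

        /* Non-writers keep start = count = {0, 0}; with zero counts no
         * data is transferred, but the collective call still matches. */
        CHECK(nc_put_vara_int(ncid, varid, start, count,
                              data ? data : &dummy));

        CHECK(nc_close(ncid));
        free(data);
        MPI_Finalize();
        return 0;
    }

With NC_COLLECTIVE access, every rank must reach the put and the close;
under NC_INDEPENDENT access, the zero-count ranks could instead skip the
put entirely, which sidesteps the question of zero-count semantics.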