This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
> Howdy!
>
> Thanks for looking into this.
>
> I have tried to use parallel I/O with limited success. The main
> limitation is that it doesn't allow for compression (an HDF5 issue, I
> know). The computing cluster where I make my big runs has a parallel
> file system (Panasas or something like that), but my job just hangs
> when I try to use netCDF-4 parallel I/O. The system is a seemingly
> standard Linux cluster with x86 processors, InfiniBand, and an LSF
> batch system (details at
> <http://www.oscer.ou.edu/hardsoft_dell_cluster_harpertown_sooner.php>).
> Parallel I/O works fine on my desktop machine (Mac OS X), however, so
> I think my code is OK. On the Mac I usually use 8 or fewer threads,
> in which case a round-robin (token-ring) write works better because I
> get the compression.

As you point out, the compression issue is not something we can do
anything about. There is no way for a process to predict where to
write its data: the compressed output of the processes that write
earlier in the file is of unknown length, so a process cannot tell
where its own data should go.

Getting code to run on supercomputers is always a challenge. Parallel
I/O is well tested and does work. Was the netCDF on the supercomputer
built for parallel I/O? (HDF5 must be built with --enable-parallel,
and mpicc must be used to compile.) There are parallel tests in the
netCDF distribution (like nc_test4/tst_parallel3.c) which can be run
on your target platform. If they hang, there is something wrong with
the platform and you can demonstrate it to the sysadmins. If they
work, you can look at what they are doing that your code is not.
(Minimal sketches of both a parallel write and the token-ring
workaround follow the ticket details below.)

> I tried a while back to set up an example with fake data to try to
> reproduce the memory growth problem I was seeing, but without real
> success. I thought then that maybe there was a bug in my program, but
> switching the reading from netCDF-4 to pure HDF5 seemed to solve the
> problem. So I think it really is something in netCDF-4's routines.
>
> I'm getting an ftp directory set up so you can get an example file.
> I'll let you know when that happens.

Just do an ncdump -h on one of your data files and send it to me, and
I will take it from there. I am getting ready to release 4.1 very
soon, so if we want to get any fixes in, it would be best if you could
send me the ncdump right away...

Thanks,

Ed

Ticket Details
===================
Ticket ID: PEB-847323
Department: Support netCDF
Priority: Critical
Status: Closed
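Here is a minimal sketch of the kind of parallel write that tests such
as nc_test4/tst_parallel3.c exercise, assuming a netCDF-4/HDF5 stack
built with --enable-parallel and compiled with mpicc. The file name,
dimension sizes, and data are illustrative, and error checking is
omitted for brevity.

    /* Each rank writes one row of a shared variable in parallel. */
    #include <mpi.h>
    #include <netcdf.h>
    #include <netcdf_par.h>

    #define COLS 16

    int main(int argc, char **argv)
    {
        int rank, nprocs, ncid, dimids[2], varid;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* All ranks create the same file for parallel access. */
        nc_create_par("sketch.nc", NC_NETCDF4 | NC_MPIIO,
                      MPI_COMM_WORLD, MPI_INFO_NULL, &ncid);
        nc_def_dim(ncid, "row", nprocs, &dimids[0]);
        nc_def_dim(ncid, "col", COLS, &dimids[1]);
        nc_def_var(ncid, "data", NC_INT, 2, dimids, &varid);
        nc_enddef(ncid);

        /* Collective access; NC_INDEPENDENT is the alternative. */
        nc_var_par_access(ncid, varid, NC_COLLECTIVE);

        /* Each rank writes the row matching its rank number. */
        size_t start[2] = {(size_t)rank, 0}, count[2] = {1, COLS};
        int data[COLS];
        for (int i = 0; i < COLS; i++)
            data[i] = rank;
        nc_put_vara_int(ncid, varid, start, count, data);

        nc_close(ncid);
        MPI_Finalize();
        return 0;
    }

If a program this small also hangs on the cluster, the problem is in
the MPI/HDF5/file-system stack rather than in your code.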
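The round-robin (token-ring) write described above trades parallelism
for compression: only one rank touches the file at a time, so the
variable can use zlib deflation, which parallel HDF5 writes do not
allow. A sketch under the same illustrative assumptions:

    /* Ranks write one at a time, passing a token around the ring. */
    #include <mpi.h>
    #include <netcdf.h>

    #define COLS 16

    int main(int argc, char **argv)
    {
        int rank, nprocs, ncid, varid, token = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Wait for the token from the previous rank. */
        if (rank > 0)
            MPI_Recv(&token, 1, MPI_INT, rank - 1, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        if (rank == 0) {
            /* First rank creates the file with a compressed variable. */
            int dimids[2];
            nc_create("ring.nc", NC_NETCDF4, &ncid);
            nc_def_dim(ncid, "row", nprocs, &dimids[0]);
            nc_def_dim(ncid, "col", COLS, &dimids[1]);
            nc_def_var(ncid, "data", NC_INT, 2, dimids, &varid);
            nc_def_var_deflate(ncid, varid, 0, 1, 1); /* zlib level 1 */
            nc_enddef(ncid);
        } else {
            /* Later ranks just open it and add their slab. */
            nc_open("ring.nc", NC_WRITE, &ncid);
            nc_inq_varid(ncid, "data", &varid);
        }

        size_t start[2] = {(size_t)rank, 0}, count[2] = {1, COLS};
        int data[COLS];
        for (int i = 0; i < COLS; i++)
            data[i] = rank;
        nc_put_vara_int(ncid, varid, start, count, data);
        nc_close(ncid);

        /* Pass the token to the next rank. */
        if (rank < nprocs - 1)
            MPI_Send(&token, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);

        MPI_Finalize();
        return 0;
    }

With 8 or fewer ranks, as on the Mac, the serialization cost is small
and the compression can more than pay for it.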