This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
> I did take a look at seeing if "global I/O" would work with 3.3.1
> or 3.4 on the NERSC T3E today.
>
> I used the "ncdftest.F" parallel test that I extracted from an LLNL
> physics code a couple years ago, and I think I may have previously forwarded
> to Unidata in one of my updates. On multiple PEs, it either deadlocks in:
>
> Beginning of Traceback (PE 0):
> Interrupt at address 0x8001282fc in routine '_shmem_swap'.
> Called from line 772 (address 0x8000c13b4) in routine '_glio_set_lock'.
> Called from line 1261 (address 0x8000beee8) in routine '_par_get_pe_dirty_page'.
> Called from line 870 (address 0x8000bd264) in routine '_glob_flush'.
> Called from line 43 (address 0x8000b4d88) in routine 'ffflush'.
> Called from line 298 (address 0x80000d194) in routine 'ncio_ffio_sync'.
> Called from line 363 (address 0x80000a8f0) in routine 'write_NC'.
> Called from line 790 (address 0x80000b310) in routine 'NC_endef'.
> Called from line 959 (address 0x80000bb24) in routine 'nc_enddef'.
> Called from line 214 (address 0x800039d54) in routine 'ncendef'.
> Called from line 388 (address 0x8000462d8) in routine 'c_ncendf'.
> Called from line 394 (address 0x80004639c) in routine 'NCENDF'.
> Called from line 88 (address 0x80000183c) in routine 'NCDFTEST'.
> Called from line 475 (address 0x800000c98) in routine '$START$'.
> End of Traceback.
>
> (with setenv NETCDF_FFIOSPEC global.privpos)
>
> or core dumps like:
>
> SIGNAL: Operand range error ( [0] memory management fault)
>
> Beginning of Traceback (PE 1):
> Interrupt at address 0x80008c270 in routine 'memcpy'.
> Called from line 2002 (address 0x80001ba84) in routine 'ncx_putn_schar_schar'.
> Called from line 1201 (address 0x80005dd84) in routine 'ncx_put_NC'.
> Called from line 877 (address 0x800011ac0) in routine 'nc__create'.
> Called from line 900 (address 0x800011c88) in routine 'nc_create'.
> Called from line 153 (address 0x80005e8fc) in routine 'nccreate'.
> Called from line 265 (address 0x800076844) in routine 'c_nccre'.
> Called from line 281 (address 0x800076bc0) in routine 'NCCRE'.
> Called from line 66 (address 0x800001588) in routine 'NCDFTEST'.
> Called from line 475 (address 0x800000c98) in routine '$START$'.
> End of Traceback.
> Operand range error(coredump)
>
> with other NETCDF_FFIOSPEC "global" settings.
>
> The above sorts of glitches can take quite a bit of time to sort
> out, particularly if there is strange race condition that netCDF
> has helped expose (possibly the first example) or something
> is getting overwritten (possibly the second).
>
> Note, both "3.3.1" and "3.4" show the same symptoms with "Global I/O".

I can't tell from the information presented whether this NCENDF is the 'initial' end definition after a create, or an end definition after a redefinition. The second case is more complex.

This raises an issue which I failed to mention in my previous discussion. While in 'define mode' (between nc_create() or nc_redef() and nc_enddef()), the entire in-memory netcdf structure (struct NC) becomes read-write. At other times (most of the time) it is read-only, except for the 'numrecs' field which I discussed before. So there should be exclusive (read-write) locks on the structure for the whole definition sequence (nc_create() or nc_redef() through nc_enddef()). Finer-grained locking would probably be possible, but more trouble than it is worth.

I don't know exactly why this would have worked in netcdf-2 but fails now. I would say that it was probably 'luck' that it worked in netcdf-2, since the general situation described in the previous paragraph was true there as well.

The I/O which occurs in netcdf redefinition (nc_enddef() after nc_redef()) is quite different in netcdf-3 than in netcdf-2.

*Note* Redefinition should be avoided wherever possible. It almost always forces a copy of the entire file. In netcdf-2, a redef call opened a new file to copy into, and 'unlinked' the old.
In netcdf-3, the copy is in place, like a file-based 'memmove()'.

Hope this helps.

-glenn
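
To illustrate what an in-place, memmove()-like file copy involves, here is a minimal C sketch. It is not taken from the netCDF sources; the function name shift_data_up, the CHUNK size, and the use of stdio are all assumptions made for the example. The idea it shows: when the data has to move to a higher offset (for instance because the header grew), copy chunks starting from the end of the region and work backwards, so that no byte is overwritten before it has been read.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CHUNK 4096  /* hypothetical buffer size, chosen for illustration */

/*
 * Hypothetical helper (not a netCDF function): move the nbytes that start at
 * old_off so that they start at new_off, within the same open file.
 * Only the "move toward higher offsets" case is handled (new_off >= old_off).
 * The stream must be open for update, e.g. "r+b" or "w+b".
 * Returns 0 on success, -1 on error.
 */
int shift_data_up(FILE *fp, long old_off, long new_off, long nbytes)
{
    char buf[CHUNK];
    long remaining = nbytes;

    assert(fp != NULL);
    if (new_off < old_off || nbytes < 0)
        return -1;

    /* Copy the last chunk first, then the one before it, and so on.
       The destination of each chunk never overlaps source bytes that
       are still waiting to be read. */
    while (remaining > 0) {
        long chunk = remaining < CHUNK ? remaining : CHUNK;
        long src = old_off + remaining - chunk;
        long dst = new_off + remaining - chunk;

        if (fseek(fp, src, SEEK_SET) != 0)
            return -1;
        if (fread(buf, 1, (size_t)chunk, fp) != (size_t)chunk)
            return -1;
        /* A seek is required between reading and writing the same stream. */
        if (fseek(fp, dst, SEEK_SET) != 0)
            return -1;
        if (fwrite(buf, 1, (size_t)chunk, fp) != (size_t)chunk)
            return -1;
        remaining -= chunk;
    }
    return fflush(fp) == 0 ? 0 : -1;
}
```

For example, with a file that holds a 4-byte header followed by 10 data bytes, shift_data_up(fp, 4, 7, 10) would leave the same 10 data bytes starting at offset 7, freeing 3 more bytes for the header. Every data byte is read and rewritten, so the cost grows with the size of the file, which fits the advice above to avoid redefinition where possible.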