This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Dear Professor Constantinescu,

We might not be able to help you very much because the problem appears to be due to the behavior of NFS on your Linux cluster rather than with the netCDF library itself. As you observed:

-- Error only occurs while writing files to a directory of an NFS filesystem (desired).
-- Error does not occur (works fine!) when writing to local /tmp. (each process writes to its local /tmp). (not desired, since result files are scattered across the cluster).

The primary person responsible for the netCDF package is attending a conference at this time. He did, however, have the following to say:

... the problem described will be difficult to debug because it appears to be dependent on an NFS problem with a Linux cluster that we probably can't reproduce here. If he could supply a small complete example that failed, we could try to duplicate the problem, but if it depends on the details of the NFS implementation and running on a cluster, that may be difficult.

Version 3.6.0-beta6 of netCDF is also now available, although I don't recognize any bugs we fixed from 3.5.1 that would be relevant to this problem.

Professor Constantinescu may not know about the parallel netCDF package available from http://www-unix.mcs.anl.gov/parallel-netcdf/ that may be a better solution to his problem. It would require changes to his code, since the netCDF interface is a little different, but it is based on MPI and has been successfully used in several similar modeling projects. The pnetcdf developers may also be more familiar with the symptoms he describes, since they have debugged many problems with parallel netCDF I/O, MPI, and clusters. There is a mailing list address@hidden for discussion of their parallel netCDF software that might be able to help.

--Russ

Can you reduce the scope of the problem to a small example? Is the parallel netCDF package a possible solution for you?
Regards,
Steve Emmerson

--------Begin Original Message
From: Serban G Constantinescu <address@hidden>
To: address@hidden
Subject: e-mail about netcdf problems on a 32 bit PC cluster

I am contacting you about a SUPPORT REQUEST FORM that I filled out yesterday about the problems which we have when we try to write large amounts of data in netCDF using a massively parallel Fortran 90 code. The email was submitted from the following website: http://my.unidata.ucar.edu/content/support/email_support.php

Could you please confirm you received it? Do you know about how much time it takes to get an answer?

Thank you for your help.

Best regards
George Constantinescu
Assistant Professor
Dept. Civil and Environmental Engineering
The University of Iowa

Package: netCDF Fortran (77 + 90)
Package version: 3.5.1
Operating System: Redhat Linux 2.4.9-e.49smp #1 SMP
Hardware Information: 64-node, 128-CPU, Linux-based computing cluster running MPICH -1.2.5..12 from Myrinet, Sun Grid Engine 5.3, and Sun Control Station 2.0. Compute nodes (64) are x86-based Sun Fire V60x servers (see: http://www.sun.com/servers/entry/v60x/). Head nodes (2) are x86-based Sun Fire V65x servers (see: http://www.sun.com/servers/entry/v65x/). Compute nodes have two 36 GB disk drives. Apple Storage Array for shared storage. SMC network for transmitting data from the nodes to the Apple storage array (three SMC TigerSwitch 10/100/1000 8624T 24-port switches). Myrinet switch for internode communications.

Subject: nf_enddef() Input/output error
Description: Hello, We have a CFD Fortran MPI/netCDF parallel code which exhibits "Input/output error" (Error 5) upon calling nf_enddef(). The code runs with 24 MPI processes. At the end of computation, the resulting data is written to disk via netCDF. Each MPI process writes to its own file; there is no simultaneous access to any single file. Each file's size is approximately 31 to 32 Megabytes when no error occurs.
When the error occurs, typically only the file's header is written, which is 409,600 bytes; occasionally a few megabytes of data are written. We don't have a parallel file system, only NFS. MPI is MPICH -1.2.5..12/Myrinet.

Observations:
-- Error only occurs while writing files to a directory of an NFS filesystem (desired).
-- Error does not occur (works fine!) when writing to local /tmp. (each process writes to its local /tmp). (not desired, since result files are scattered across the cluster).
-- We have 2 NFS filesystems we've tried: On one, about 23 out of 24 processes report the error (one error per process); on the other, about 15 out of 24 processes report the error.

Could you advise us as to the cause of the error and how we might fix it?

The compiler and library versions are:

bash-2.05$ ifc -V
Intel(R) Fortran Compiler for 32-bit applications, Version 7.1 Build 20031225Z
Copyright (C) 1985-2003 Intel Corporation. All rights reserved.
FOR NON-COMMERCIAL USE ONLY
GNU ld version 2.11.90.0.8 (with BFD 2.11.90.0.8)
Supported emulations: elf_i386 i386linux elf_i386_glibc21

netcdf is version 3.5.1
mpich is version 1.2.5..12
--------End Original Message
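
One possible way to reduce the problem to a small example, as the reply above asks, is to check whether plain file writes to the same NFS directory fail with no netCDF involved. The following is only a sketch under stated assumptions: it is written in Python rather than the Fortran of the original code, it is not part of netCDF, and the function name write_test_file, the 1 MiB chunk size, and the probe file name are hypothetical choices made for illustration. It writes about 32 MiB of zeros (roughly the per-process file size reported in the message) and returns the errno value of the first failing system call. On Linux, errno 5 is EIO, whose message is "Input/output error", the same text reported from nf_enddef().

```python
import errno
import os
import sys
import tempfile

CHUNK = 1 << 20  # write in 1 MiB pieces (arbitrary, hypothetical choice)


def write_test_file(path, total_bytes):
    """Write total_bytes of zeros to path.

    Returns 0 on success, or the errno value of the first failing call.
    """
    buf = bytes(CHUNK)
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    except OSError as exc:
        return exc.errno

    err = 0
    try:
        left = total_bytes
        while left > 0:
            written = os.write(fd, buf[:min(left, CHUNK)])
            left -= written
        # Force the data out to the server: on NFS, write errors are
        # often reported only when data is flushed or the file is closed.
        os.fsync(fd)
    except OSError as exc:
        err = exc.errno
    finally:
        try:
            os.close(fd)
        except OSError as exc:
            err = err or exc.errno
    return err


if __name__ == "__main__":
    # Directory to probe: first argument, or the local temp directory.
    target_dir = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    path = os.path.join(target_dir, "io_probe_%d.bin" % os.getpid())
    code = write_test_file(path, 32 * CHUNK)
    if code == 0:
        print(path, "ok")
    else:
        print(path, "failed: errno %d (%s)" % (code, os.strerror(code)))
```

Running one copy per compute node against the NFS directory, and again against local /tmp, may show whether the failure can be reproduced without netCDF. A clean run would not rule out a problem that depends on the access pattern the library uses, so this is a first narrowing step rather than a diagnosis.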