[netCDF #NXL-777870]: NetCDF Fortran - "Abort trap signal" error on nf03_test
- Subject: [netCDF #NXL-777870]: NetCDF Fortran - "Abort trap signal" error on nf03_test
- Date: Mon, 30 Apr 2018 16:35:44 -0600
Hello,
This appears to be related to a known issue with Open MPI, in which tests attempt
to spawn more processes than there are processors available (as you correctly
identified). I believe it is safe to ignore this error because it does not
reflect a problem in netCDF itself, but rather in how the test is being
invoked.
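
For illustration only, here is a minimal sketch of how one might invoke such a
test by hand on a node with fewer cores than the test requests. The helper name
mpirun_cmd is hypothetical (it is not part of netCDF or its test suite), and the
sketch assumes Open MPI's --oversubscribe option to mpirun. It only prints the
command line it would use; it does not launch anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of netCDF): print the mpirun command line for
# a test, adding Open MPI's --oversubscribe flag only when the requested rank
# count exceeds the number of slots available on the node.
mpirun_cmd() {
    ranks=$1   # number of MPI ranks the test asks for
    slots=$2   # slots (cores) available on this node
    prog=$3    # test executable
    if [ "$ranks" -gt "$slots" ]; then
        echo "mpirun --oversubscribe -np $ranks $prog"
    else
        echo "mpirun -np $ranks $prog"
    fi
}

# The two cases from the quoted test log, assuming a 12-core login node:
mpirun_cmd 4 12 ./tst_parallel
mpirun_cmd 16 12 ./tst_parallel3
```

Oversubscribing only lets the test start; it may run slowly, so running the
parallel tests on a compute node with enough cores is probably the cleaner
option where that is possible.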
As for the CESM/netCDF-Fortran failure, I will move this discussion to the
GitHub issue you opened. I am setting up an environment to try to reproduce
the issue and will report there shortly.
Thank you!
-Ward
> Dear Unidata’s netCDF support,
>
> I am attempting to build the netCDF libraries on my department cluster
> using OpenMPI compiled with the Intel compilers. The libraries are intended
> to be used with the CESM model I am trying to run.
>
> However, I am getting an error in one NetCDF Fortran test, which does not
> prevent building the libraries but which seems to be important anyway: it
> is the same error message that appears in the CESM logs after an unsuccessful
> simulation - the model runs but does not produce any output.
>
> My software versions are:
> Intel C and Fortran compilers 17
> OpenMPI 3.0.0
> HDF5 1.10.2
> netCDF C 4.6.1
> netCDF Fortran 4.4.4
>
> First of all, this is how I load the Intel compiler and Open MPI modules:
>
> module load intel/17.0.1
> module load openmpi/3.0.0/intel/17.0.1
>
> And here is some (hopefully) useful information on the modules:
>
> module show intel/17.0.1
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> /sw/arcts/centos7/modulefiles/intel/17.0.1.lua:
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> help([[
> The Intel module enables the Intel family of compilers (C/C++
> and Fortran) and updates the $PATH, $LD_LIBRARY_PATH,
> $INCLUDE, and $MANPATH environment variables to access the
> compiler binaries, libraries, include files, and available man
> pages, respectively.
>
> The following additional environment variables are also defined:
>
> $ICC_BIN (path to icc/icpc compilers )
> $ICC_LIB (path to C/C++ libraries )
> $IFC_BIN (path to ifort compiler )
> $IFC_LIB (path to Fortran libraries )
>
> See the man pages for icc, icpc, and ifort for detailed information
> on available compiler options and command-line syntax.
>
> ]])
> whatis("Name: Intel")
> whatis("Description: Intel compiler suite.")
> whatis("License information: None provided")
> whatis("Category: Library, Development, Core")
> whatis("Package documentation: None provided")
> whatis("Version: 17.0.1")
> setenv("ICC_BIN","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/bin/intel64")
> setenv("IFC_BIN","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/bin/intel64")
> setenv("ICC_LIB","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/compiler/lib/intel64")
> setenv("IFC_LIB","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/compiler/lib/intel64")
> prepend_path("INTEL_LICENSE_FILE","/sw/arcts/centos7/intel/licenses/network.lic")
> prepend_path("PATH","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/bin/intel64")
> prepend_path("LD_LIBRARY_PATH","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/tbb/lib/intel64_lin/gcc4.4")
> prepend_path("LD_LIBRARY_PATH","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/daal/lib/intel64_lin")
> prepend_path("LD_LIBRARY_PATH","/sw/arcts/centos7/intel/17.0.1-1/debugger_2017/libipt/intel64/lib")
> prepend_path("LD_LIBRARY_PATH","/sw/arcts/centos7/intel/17.0.1-1/debugger_2017/iga/lib")
> prepend_path("LD_LIBRARY_PATH","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/tbb/lib/intel64/gcc4.7")
> prepend_path("LD_LIBRARY_PATH","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64_lin")
> prepend_path("LD_LIBRARY_PATH","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/ipp/lib/intel64")
> prepend_path("LD_LIBRARY_PATH","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/compiler/lib/intel64_lin")
> prepend_path("LD_LIBRARY_PATH","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/compiler/lib/intel64")
> prepend_path("LIBRARY_PATH","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/tbb/lib/intel64_lin/gcc4.4")
> prepend_path("LIBRARY_PATH","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/daal/lib/intel64_lin")
> prepend_path("LIBRARY_PATH","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/tbb/lib/intel64/gcc4.7")
> prepend_path("LIBRARY_PATH","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64_lin")
> prepend_path("LIBRARY_PATH","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/compiler/lib/intel64_lin")
> prepend_path("LIBRARY_PATH","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/ipp/lib/intel64")
> prepend_path("MANPATH","/sw/arcts/centos7/intel/17.0.1-1/man/common")
> prepend_path("NLSPATH","/sw/arcts/centos7/intel/17.0.1-1/debugger_2017/gdb/intel64/share/locale/%l_%t/%N")
> prepend_path("NLSPATH","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64_lin/locale/%l_%t/%N")
> prepend_path("NLSPATH","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/compiler/lib/intel64/locale/%l_%t/%N")
> prepend_path("MKLROOT","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/mkl")
> prepend_path("CPATH","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/daal/include")
> prepend_path("CPATH","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/tbb/include")
> prepend_path("CPATH","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/mkl/include")
> prepend_path("CPATH","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/ipp/include")
> setenv("IPPROOT","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/ipp")
> setenv("TBBROOT","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/tbb")
> setenv("DAALROOT","/sw/arcts/centos7/intel/17.0.1-1/compilers_and_libraries_2017.1.132/linux/daal")
>
> module show openmpi/3.0.0/intel/17.0.1
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> /sw/arcts/centos7/modulefiles/openmpi/3.0.0/intel/17.0.1.lua:
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> help([[
> OpenMPI consists of a set of compiler 'wrappers' that include the appropriate
> settings for compiling MPI programs on the cluster. The most commonly used
> of these are
>
> mpicc
> mpic++
> mpif90
>
> Those are used in the same way as the regular compiler program, for example,
>
> $ mpicc -o hello hello.c
>
> will produce an executable program file, hello, from C source code in hello.c.
>
> In addition to adding the OpenMPI executables to your path, the following
> environment variables set by the openmpi module.
>
> $MPI_HOME
>
> ]])
> whatis("Name: openmpi")
> whatis("Description: OpenMPI implementation of the MPI protocol")
> whatis("License information: https://www.open-mpi.org/community/license.php")
> whatis("Category: Utility, Development, Core")
> whatis("Package documentation: https://www.open-mpi.org/doc/")
> whatis("ARC examples: /scratch/data/examples/openmpi")
> whatis("Version: 3.0.0")
> prereq("intel/17.0.1")
> prepend_path("PATH","/sw/arcts/centos7/openmpi/3.0.0-intel-17.0.1-1/bin")
> prepend_path("MANPATH","/sw/arcts/centos7/openmpi/3.0.0-intel-17.0.1-1/share/man")
> prepend_path("LD_LIBRARY_PATH","/sw/arcts/centos7/openmpi/3.0.0-intel-17.0.1-1/lib")
> setenv("MPI_HOME","/sw/arcts/centos7/openmpi/3.0.0-intel-17.0.1-1")
> setenv("OMPI_MCA_btl_openib_warn_no_device_params_found","0")
>
> Then, this is the command I used to build HDF5:
>
> FC=mpif90 CC=mpicc CXX=mpicxx ./configure --with-zlib=${ZDIR} \
>     --with-szlib=${SZDIR} --prefix=${H5DIR} --enable-parallel
> make
> make install
> make check-install
>
> and this is how I am building the NetCDF C and Fortran libraries
> respectively:
>
> CPPFLAGS=-I${H5DIR}/include LDFLAGS=-L${H5DIR}/lib CC=mpicc \
>     ./configure --prefix=${NCDIR} --enable-shared --disable-dap \
>     --enable-parallel-tests
> make
> make install
> make check
>
> and
>
> CPPFLAGS=-I${NCDIR}/include LDFLAGS=-L${NCDIR}/lib CC=mpicc F77=mpif77 \
>     FC=mpif90 ./configure --prefix=${NCDIR}
> make
> make install
> make check
>
> where all the paths ${ZDIR}, ${SZDIR}, ${H5DIR} and ${NCDIR} are exported
> to my LD_LIBRARY_PATH environment variable.
>
> All the building processes do finish successfully, but I do get errors
> during make check:
>
> a) on NetCDF C, saying:
>
> ===========================================
> netCDF 4.6.1: nc_test4/test-suite.log
> ===========================================
>
> # TOTAL: 68
> # PASS: 67
> # SKIP: 0
> # XFAIL: 0
> # FAIL: 1
> # XPASS: 0
> # ERROR: 0
>
> .. contents:: :depth: 2
>
> FAIL: run_par_test
> ==================
>
> Testing MPI parallel I/O with various other mode flags...
>
> *** Testing illegal mode combinations
> *** Testing create + MPIO + fletcher32
> *** Testing create + MPIO + deflation
> ok.
> *** Tests successful!
>
> Testing MPI parallel I/O without netCDF...
>
> *** Testing basic MPI file I/O.
> *** testing file create with parallel I/O with MPI...ok.
> *** Tests successful!
>
> Testing very simple parallel I/O with 4 processors...
>
> *** tst_parallel testing very basic parallel access.
> *** tst_parallel testing whether we can create file for parallel
> access and write to it...ok.
> *** Tests successful!
>
> Testing simple parallel I/O with 16 processors...
> --------------------------------------------------------------------------
> There are not enough slots available in the system to satisfy the 16 slots
> that were requested by the application:
> ./tst_parallel3
>
> Either request fewer slots for your application, or make more slots available
> for use.
> --------------------------------------------------------------------------
> FAIL run_par_test.sh (exit status: 1)
>
> which is expected since the login nodes on my cluster have only 12 cores.
> As you can see, the test with only four cores finished successfully.
>
> b) on NetCDF Fortran, saying:
>
> *** testing nf_copy_att ...
> bad var id: NetCDF: Attribute not found
> nf_copy_att: NetCDF: Attribute not found
> bad var id: NetCDF: Attribute not found
> nf_copy_att: NetCDF: Attribute not found
> forrtl: error (76): Abort trap signal
> Image              PC                Routine            Line     Source
> nf03_test          00000000004F54A1  tbk_trace_stack_i  Unknown  Unknown
> nf03_test          00000000004F35DB  tbk_string_stack_  Unknown  Unknown
> nf03_test          00000000004AB2F4  Unknown            Unknown  Unknown
> nf03_test          00000000004AB106  tbk_stack_trace    Unknown  Unknown
> nf03_test          00000000004742F9  for__issue_diagno  Unknown  Unknown
> nf03_test          0000000000477B04  for__signal_handl  Unknown  Unknown
> libpthread-2.17.s  00002AF3675F45E0  Unknown            Unknown  Unknown
> libc-2.17.so       00002AF3678361F7  gsignal            Unknown  Unknown
> libc-2.17.so       00002AF3678378E8  abort              Unknown  Unknown
> nf03_test          000000000040B362  Unknown            Unknown  Unknown
> nf03_test          0000000000463BA6  Unknown            Unknown  Unknown
> nf03_test          0000000000466E75  Unknown            Unknown  Unknown
> nf03_test          000000000045F4BB  Unknown            Unknown  Unknown
> nf03_test          0000000000455BCE  Unknown            Unknown  Unknown
> nf03_test          0000000000456C55  Unknown            Unknown  Unknown
> nf03_test          000000000040B31E  Unknown            Unknown  Unknown
> libc-2.17.so       00002AF367822C05  __libc_start_main  Unknown  Unknown
> nf03_test          000000000040B229  Unknown            Unknown  Unknown
>
> Error b) above is the one that really concerns me. As I mentioned before,
> in the CESM logs I see several *Attribute not found* errors, and no output
> from the model is written to disk (although the job is not killed). I
> suspect there is some kind of connection between the errors.
>
> Please see both the config.log and test-suite.log logs for the NetCDF
> Fortran error attached to this message.
>
> Just for the record, I have also filed an issue on GitHub at
> https://github.com/Unidata/netcdf-fortran/issues/81.
>
> Do you guys have any ideas on what can be causing this error? I really
> appreciate any help you can provide on how to fix it!
>
> Regards,
> --
> Thiago V. dos Santos
> Postdoctoral research fellow
> Department of Climate and Space Sciences and Engineering
> University of Michigan
>
>
Ticket Details
===================
Ticket ID: NXL-777870
Department: Support netCDF
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata
inquiry tracking system and then made publicly available through the web. If
you do not want to have your interactions made available in this way, you must
let us know in each email you send to us.