This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
This email has been forwarded to the netCDF support email archive for archiving.

------- Forwarded Message

Return-Path: address@hidden
Delivery-Date: Thu Feb 14 16:49:43 2002
Received: from arsc.edu (mcgrew.arsc.edu [199.165.84.136])
        by unidata.ucar.edu (UCAR/Unidata) with ESMTP id g1ENngx21288;
        Thu, 14 Feb 2002 16:49:43 -0700 (MST)
Organization: Arctic Region Supercomputing Center
Keywords: 200202122006.g1CK6Lx24308
Received: from tanana.arsc.edu (tanana.arsc.edu [199.165.84.149])
        by arsc.edu (2000-04-24.ARSC) with ESMTP id OAA18619;
        Thu, 14 Feb 2002 14:49:41 -0900 (AST)
Received: from localhost (jlm@localhost)
        by tanana.arsc.edu (2000-04-25.ARSC) with ESMTP id OAA13249;
        Thu, 14 Feb 2002 14:49:41 -0900 (AST)
X-Authentication-Warning: tanana.arsc.edu: jlm owned process doing -bs
Date: Thu, 14 Feb 2002 14:49:41 -0900
From: John Metzner <address@hidden>
To: Steve Emmerson <address@hidden>
cc: address@hidden
Subject: Re: 20020214: netcdf 3.5.0 ncvarput failure - Cray SV1
In-Reply-To: <address@hidden>
Message-ID: <address@hidden>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Steve,

The macros.make diff does not show anything significant, just the $SRCDIR and $prefix differences I would have expected. I see a number of *.o file size differences between the "good" and locally built files in src/nctest.
The "good":

chilkoot$ ls -l *.o
-rw-------   1 jlm   cray       6848 Feb 13 19:18 add.o
-rw-------   1 jlm   cray      79448 Feb 13 19:18 atttests.o
-rw-------   1 jlm   cray      35640 Feb 13 19:18 cdftests.o
-rw-------   1 jlm   cray      21544 Feb 13 19:18 dimtests.o
-rw-------   1 jlm   cray       3904 Feb 13 19:18 driver.o
-rw-------   1 jlm   cray       1768 Feb 13 19:18 emalloc.o
-rw-------   1 jlm   cray       1536 Feb 13 19:18 error.o
-rw-------   1 jlm   cray       2608 Feb 13 19:18 misctest.o
-rw-------   1 jlm   cray      27480 Feb 13 19:18 rec.o
-rw-------   1 jlm   cray      13552 Feb 13 19:18 slabs.o
-rw-------   1 jlm   cray       7912 Feb 13 19:18 val.o
-rw-------   1 jlm   cray      13368 Feb 13 19:18 vardef.o
-rw-------   1 jlm   cray       5840 Feb 13 19:18 varget.o
-rw-------   1 jlm   cray       6200 Feb 13 19:18 vargetg.o
-rw-------   1 jlm   cray       6040 Feb 13 19:18 varput.o
-rw-------   1 jlm   cray       6312 Feb 13 19:18 varputg.o
-rw-------   1 jlm   cray      30280 Feb 13 19:18 vartests.o
-rw-------   1 jlm   cray       5104 Feb 13 19:18 vputget.o
-rw-------   1 jlm   cray       7376 Feb 13 19:18 vputgetg.o

The locally built "bad" (differences flagged w/ !!!):

chilkoot$ ls -l *.o
-rw-------   1 jlm   software   7000 Feb 13 20:29 add.o       !!!
-rw-------   1 jlm   software  79448 Feb 13 20:29 atttests.o
-rw-------   1 jlm   software  35640 Feb 13 20:29 cdftests.o
-rw-------   1 jlm   software  21544 Feb 13 20:29 dimtests.o
-rw-------   1 jlm   software   3904 Feb 13 20:29 driver.o
-rw-------   1 jlm   software   1768 Feb 13 20:29 emalloc.o
-rw-------   1 jlm   software   1536 Feb 13 20:29 error.o
-rw-------   1 jlm   software   2608 Feb 13 20:29 misctest.o
-rw-------   1 jlm   software  27480 Feb 13 20:29 rec.o
-rw-------   1 jlm   software  13576 Feb 13 20:29 slabs.o     !!!
-rw-------   1 jlm   software   7560 Feb 13 20:29 val.o       !!!
-rw-------   1 jlm   software  13368 Feb 13 20:29 vardef.o
-rw-------   1 jlm   software   5840 Feb 13 20:29 varget.o
-rw-------   1 jlm   software   6200 Feb 13 20:29 vargetg.o
-rw-------   1 jlm   software   6040 Feb 13 20:29 varput.o
-rw-------   1 jlm   software   6312 Feb 13 20:29 varputg.o
-rw-------   1 jlm   software  30280 Feb 13 20:29 vartests.o
-rw-------   1 jlm   software   5160 Feb 13 20:29 vputget.o   !!!
-rw-------   1 jlm   software   7520 Feb 13 20:29 vputgetg.o  !!!

The src/nctest/nctest binaries are different sizes, of course.

The "good":
-rwx------   1 jlm   cray      1966216 Feb 14 12:28 nctest

The local "bad":
-rwx------   1 jlm   software  1965960 Feb 13 20:29 nctest

Also, I found a core file in the bad directory from the 'make test' run. Thought it might mean something to you.

chilkoot$ debugview core
CrayTools DebugView 3.0.0.35 (Cray version) Mar 12 2001 14:24:46
-------------------------------------------------------------------
No symbols are available for debugging because the executable has
been stripped or is not accessible. Source-level debugging is not
available, and in some cases, TotalView may fail when allocating
memory for the assembly-code listing. If you are debugging a core
file, running totalview specifying only the core file may help.
-------------------------------------------------------------------
***** START OF SYMBOLIC DUMP *****

LIST OF PROCESS STATES
PIDs 8610: Signal SIGFPE <Floating point exception>

DISPLAYING PIDs 8610: Signal SIGFPE <Floating point exception>

Signal SIGFPE in routine ncx_putn_float_float at address 0p113671d
ncx_putn_float_float was called by putNCv_float at line 1913 (address 0p134147d)
putNCv_float was called by nc_put_vara_float at line 5675 (address 0p177240d)
nc_put_vara_float was called by nc_put_varm at line 11048 (address 0p251461a)
nc_put_varm was called by ncvarputg at line 624 (address 0p275263c)
ncvarputg was called by test_varputgetg at line 119 (address 0p12067b)
test_varputgetg was called by $STKOFEN at line 52 (address 0p545453b)
$STKOFEN was called by test_ncvarputg at line 52 (address 0p3273b)
test_ncvarputg was called by main at line 66 (address 0p12644d)
main was called by $START$ at line 350 (address 0p1121c)

***** END OF SYMBOLIC DUMP *****

Any thoughts on where to go next to get a good 'make test' run? I'm thinking of building a 'chroot' environment where I can guarantee I've eliminated any /usr/local/lib libraries without affecting the real users on the system. I can make any changes I want within it to isolate the cause of the failed 'make test'.

Thanks for all your time and quick responses. It is much appreciated.

Regards,
John Metzner - Cray, Inc                 address@hidden
Arctic Region Supercomputing Center      address@hidden
910 Yukon Drive Rm. 106E                 Phone: (907)474-5431
Fairbanks, AK 99775-6020                 FAX: (907)474-1820

On Thu, 14 Feb 2002, Steve Emmerson wrote:

> Date: Thu, 14 Feb 2002 15:07:03 -0700
> From: Steve Emmerson <address@hidden>
> To: John Metzner <address@hidden>
> Cc: address@hidden
> Subject: 20020214: netcdf 3.5.0 ncvarput failure - Cray SV1
>
> John,
>
> >Date: Thu, 14 Feb 2002 12:48:03 -0900
> >From: John Metzner <address@hidden>
> >Organization: Arctic Region Supercomputing Center
> >To: Steve Emmerson <address@hidden>
> >Subject: Re: 20020212: netcdf 3.5.0 ncvarput failure - Cray SV1
> >Keywords: 200202122006.g1CK6Lx24308
>
> The above message contained the following:
>
> > I'm still working on trying to get netCDF 3.5.0 built and tested on
> > our Cray SV1ex. I tried turning down the optimization level as you
> > suggested to no avail, same error during 'make test'. This was done
> > after a 'make distclean', making sure there was no config.cache and
> > resetting the environment variables. There is one (that I know of)
> > local change to the default library search path which causes
> > /usr/local/lib to be prepended to the library search path (even
> > preempting -L on the command line) which I pulled out. I ran through
> > the full build & test sequence again and got the same error as below.
> >
> > I did pull the netCDF-3.5.0 package inside Cray Corporate, built and
> > tested the package there on a SV1ex. It worked, so the problem is
> > some local system change which is getting in the way.
> >
> > I pulled the package from Cray Corporate back out to the site with
> > the "good" libraries and build products. I reran the 'make test' on
> > it, again without error.
> >
> > Next I copied the locally built libsrc/libnetcdf.a and
> > cxx/libnetcdf_c++.a into the proper location for the "good" package
> > from Cray Corporate. A 'make test' ran again without error. I was
> > trying to determine if the problem was in the test code or the
> > libraries built locally. Is that a valid test?
>
> If your locally-built libnetcdf.a library, when copied into the Cray
> Corporate package, results in that package correctly executing a "make
> test", then it would seem that the problem lies in the building and/or
> execution of the netCDF-2 test program rather than with the netCDF
> library functions.
>
> A good way to look at the differences in the build environments is to
> use the "diff" utility on the file "macros.make", which is located in
> the top-level source directory. Does it show anything significant?
>
> Another thing to check is whether or not the files in the netCDF-2 test
> directory, "nctest", are the same.
>
> Regards,
> Steve Emmerson <http://www.unidata.ucar.edu>
>

>From address@hidden Fri Feb 15 12:09:40 2002
>Subject: Re: 20020214: netcdf 3.5.0 ncvarput failure - Cray SV1

Steve,

I did a bit more testing with the nctest code, comparing builds between here at ARSC and inside Cray Corporate. I was able to get the nctest code to build and run successfully here when I changed the CFLAGS entry in the macros.make file from "-O3" to "-h inline3,scalar3,task1,vector0". Also "-O0" would work, but not "-O1" (-O1 is equivalent to -h inline1,scalar1,task1,vector1).

I found that the versions of the C/C++ compilers were slightly different between here and the Cray Corporate machine. We are running version 3.5.0.1 and the corporate system was 3.5.0.3. When I changed to the same 3.5.0.1 compiler on the corporate machine, I got the same failure. The problem was still there when I switched to 3.5.0.2 on the corporate system. So, Cray made some change to the compiler at 3.5.0.3 which allows nctest to not error out on a floating point exception.

You might want to enter this into your problem/fix database in case some other poor Cray soul gets bit by it.

Thanks for all your help and quick responses. It's great to get this kind of support on an open source package, pretty rare too.
Thanks again,
John Metzner - Cray, Inc                 address@hidden
Arctic Region Supercomputing Center      address@hidden
910 Yukon Drive Rm. 106E                 Phone: (907)474-5431
Fairbanks, AK 99775-6020                 FAX: (907)474-1820
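
The two comparisons discussed in this thread (running "diff" on macros.make and checking the nctest object files for size differences between a working and a failing build tree) could be automated along the following lines. This is only a sketch, not part of the netCDF distribution: the function names object_size_differences and macros_diff are hypothetical, and the layout (macros.make and an nctest/ subdirectory directly under each source root) is an assumption based on the paths mentioned in the messages.

```python
import difflib
import sys
from pathlib import Path


def object_size_differences(good_dir, bad_dir, pattern="*.o"):
    """Return {file name: (good size, bad size)} for files whose sizes differ.

    A file that exists in only one directory is reported with None on the
    side where it is missing.
    """
    good = {p.name: p.stat().st_size for p in Path(good_dir).glob(pattern)}
    bad = {p.name: p.stat().st_size for p in Path(bad_dir).glob(pattern)}
    differences = {}
    for name in sorted(good.keys() | bad.keys()):
        sizes = (good.get(name), bad.get(name))
        if sizes[0] != sizes[1]:
            differences[name] = sizes
    return differences


def macros_diff(good_file, bad_file):
    """Return the unified-diff lines between two macros.make files."""
    good_lines = Path(good_file).read_text().splitlines()
    bad_lines = Path(bad_file).read_text().splitlines()
    return list(
        difflib.unified_diff(
            good_lines,
            bad_lines,
            fromfile=str(good_file),
            tofile=str(bad_file),
            lineterm="",
        )
    )


def main(good_root, bad_root):
    # Assumed layout: <root>/macros.make and <root>/nctest/*.o
    good_root, bad_root = Path(good_root), Path(bad_root)
    for line in macros_diff(good_root / "macros.make", bad_root / "macros.make"):
        print(line)
    sizes = object_size_differences(good_root / "nctest", bad_root / "nctest")
    for name, (good_size, bad_size) in sizes.items():
        print(f"{name}: good={good_size} bad={bad_size}")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Run it with the two source roots as arguments, working tree first. Equal sizes do not prove two object files are identical, and differing sizes do not say why they differ, so the output is only a pointer to which files deserve a closer look.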