This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
>To: address@hidden,
>From: Gottfried Necker <gottfried.necker@xxxxxxxxxxx>
>Subject: NetCDF performance problems.
>Organization: .
>Keywords: 200306180911.h5I9BvLd025090 netCDF 3.5.1-beta10 Fujitsu VPP

Gottfried,

I tried profiling netcdf-beta5 and netcdf-beta10 this morning on the
nc_test test program provided with the distribution, on a Solaris 8
platform, and you were right: a platform-independent performance
problem has been introduced that we need to fix before release.

Here's a comparison of the number of times px_pgin and px_pgout are
called using the netcdf-3.5.1 beta5 versus beta10 releases, using
gprof to profile and just capturing the number of calls with grep:

  test/gf/beta5-xpg/src/nc_test$ gprof nc_test | grep px_pg
                  0.00    0.07    2788/2788        px_pgin [14]
                  0.00    0.00      39/2129        px_pgout [9]
  ...

  test/gf/beta10-xpg/src/nc_test$ gprof nc_test | grep px_pg
                  0.02    1.43  112164/112164      px_pgin [5]
                  0.00    0.00      43/30624       px_pgout [9]
  ...

That is roughly 40 times as many calls to px_pgin in beta10 as in
beta5 (112164 versus 2788) for the same test program.

Since you've given us enough information to reproduce the problem here
on a Solaris platform, we should be able to fix the problem here and
put it into the next release. I'll let you know when we have figured
out what the problem is and have a patch to test.

Thanks!

--Russ

On Fri, 20 Jun 2003 11:57:56 +020, you wrote:

> Hi Gottfried,
>
> > > Another possibility would be providing you with some versions between
> > > beta3 and beta10 that would help isolate which changes caused the
> > > problem.
>
> > I tried with netcdf-3.5.1-beta5 and there's no problem. I went back to
> > beta10 and got the problem again. I diffed the libsrc directory and
> > the only substantial difference between these versions is in
> > posixio.c, where the call to ftruncate is replaced by calls to seek. I
> > will try to put the code with ftruncate into beta10 to see what
> > happens. But I don't have the time to do it now. I will try this on
> > Friday.
>
> Thanks, just this information is a big help. I'm also anxious to hear
> what you find out when substituting ftruncate for the call to lseek.
> The revision notice we have on that change was:
>
>   ... eliminated unnecessary use of ftruncate(), because it fails on
>   FAT32 file systems under Linux.
>
> If this causes a performance problem on other systems, maybe we can
> find a better fix for the Linux problem.
>
> --Russ

At first I thought there was a single problem, but now I think there
are two. The beta10 uses too much system time on NFS and waits for
I/O on the local file systems. If I put posixio.c (rev. 1.69) into
the beta10 source and recompile the library, the waiting-for-I/O
problem is gone. But now I can see the system time problem on the
local file system as well.

I did a PC sampling on my program with beta10 and compared it with
beta5, and found that some routines (px_pgin and px_pgout) are called
many more times with beta10 than with beta5. If these routines really
are called so often, this would explain the huge difference in system
time usage. But I have no idea why this happens.

To illustrate the problem, here's the output of timex for beta10 (with
posixio 1.69):

  real     11:19.21
  user      7:41.07
  sys       2:17.28
  vu-user   6:03.32
  vu-sys       0.00

For comparison, the output for beta5:

  real     10:39.11
  user      9:46.32
  sys          4.26
  vu-user   7:56.27
  vu-sys       0.00

Actually the problem is slightly worse than shown here, because the
beta10 calculation is stopped earlier.

I don't know what could cause such a problem, but I suspect it is also
present on other platforms. But maybe on those platforms you don't pay
such a high price for calling px_pg* too often.

Gottfried
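
For readers unfamiliar with the two approaches discussed in this thread, here is a minimal sketch of setting a file's length with ftruncate() versus growing it by seeking and writing. It is NOT the code from posixio.c. The function names (set_length_ftruncate, grow_length_seek_write, file_size) are hypothetical, and the seek-then-write-one-byte idiom is an assumption about what "replaced by calls to seek" might look like in general, using only standard POSIX calls.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the current size of the file, or -1 on error. */
static off_t file_size(int fd)
{
    struct stat sb;
    return fstat(fd, &sb) < 0 ? (off_t)-1 : sb.st_size;
}

/* Approach 1: set the file to exactly `len` bytes with ftruncate().
 * Can both grow and shrink the file. The revision notice quoted in the
 * thread says this call failed on FAT32 file systems under Linux. */
static int set_length_ftruncate(int fd, off_t len)
{
    return ftruncate(fd, len);
}

/* Approach 2 (assumed idiom, not the netCDF implementation): grow the
 * file to at least `len` bytes by seeking to the last byte and writing
 * a single zero byte. This can only extend a file, never shrink it. */
static int grow_length_seek_write(int fd, off_t len)
{
    off_t cur = file_size(fd);
    if (cur == (off_t)-1)
        return -1;
    if (len <= cur)
        return 0;                       /* already long enough */
    if (lseek(fd, len - 1, SEEK_SET) == (off_t)-1)
        return -1;
    if (write(fd, "", 1) != 1)          /* writes one '\0' byte */
        return -1;
    return 0;
}
```

Both helpers return 0 on success and -1 on failure. Note that the second variant moves the file offset as a side effect, so a caller that does page-oriented I/O afterwards has to seek again before its next read or write.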