Re: 20030730: Netcdf performance problem on NEC SX6
- Subject: Re: 20030730: Netcdf performance problem on NEC SX6
- Date: Wed, 30 Jul 2003 16:15:10 -0600
>To: address@hidden
>From: Mathis Rosenhauer <address@hidden>
>Subject: Netcdf performance problem on NEC SX6
>Organization: Deutsches Klimarechenzentrum
>Keywords: 200307301224.h6UCO8Ld019417 netCDF NEC SX6 performance
Hi Mathis,
I hope you don't mind that I'm also CC:ing Gottfried Necker on this
reply, since he has recently run into a similar problem.
> I got reports from our users who see a performance drop between the
> old netcdf-3.4.0 version and the latest 3.5.1-beta11 on our NEC SX6
> machines or with any other 3.5.x release for that matter. I have
> narrowed this down to ncio_px_get() in posixio.c:
>
>     if (*vpp == NULL)
>     {
>         ncio_px_sync(nciop);
>         pxp->bf_offset = OFF_NONE;
>         pxp->bf_cnt = 0;
>     }
>
> That statement seems to be new in 3.5.x. Tracing back a little bit I
> found in putget.m4 the function putNCvx_$1_$2() which uses a local
> pointer
>
> void *xp;
>
> and calls
>
>     int lstatus = ncp->nciop->get(ncp->nciop, offset, extent, RGN_WRITE, &xp);
>
> xp is undefined in the first round and happens to be NULL in a lot of
> cases on our system which causes the slowdowns in ncio_px_get().
>
> This is probably related to another report I found in your mail
> archives but using 3.5.1-beta11 doesn't help much in our case.
>
> http://www.unidata.ucar.edu/cgi-bin/msgout?/glimpse/netcdf/5141
>
> Would it be safe to disable the "if (*vpp == NULL)" statement in
> posixio.c or make xp static?
Thanks for digging into this problem and reporting what you found.
It looks like the ncio_px_get() change was made to fix a
synchronization bug whose symptom was that nc_sync() called by a
reader did not make visible the changes made by a concurrent writer.
It appears that the "fix" may have done more than was intended.
I definitely would not advise making xp static. It looks to me as if
xp is supposed to be an output-only argument of the
ncp->nciop->get() call, so ncio_px_get() should never test the
incoming value of *vpp against NULL.
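To illustrate why the incoming value of xp matters, here is a minimal
sketch of the pattern in the check quoted above. It is not netCDF
code: toy_io, toy_get() and count_syncs() are hypothetical names, the
counter merely stands in for the cost of ncio_px_sync(), and xp is
initialized explicitly so that both cases can be shown without
reading an uninitialized variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the I/O layer; not the real ncio struct. */
struct toy_io {
    int sync_calls;   /* how often the expensive sync path was taken */
    char buf[16];     /* pretend I/O buffer */
};

/* Models the quoted check: the caller's incoming *vpp decides
 * whether the sync path runs, although *vpp is meant as an output. */
int toy_get(struct toy_io *io, void **vpp)
{
    assert(io != NULL && vpp != NULL);
    if (*vpp == NULL)        /* reads whatever the caller left in xp */
        io->sync_calls++;    /* stands in for ncio_px_sync() */
    *vpp = io->buf;          /* the output the caller actually wants */
    return 0;
}

/* Calls toy_get() n times, each time with a fresh local pointer that
 * starts out as `initial`. In putget.m4 the local is uninitialized,
 * so its starting value depends on the platform. */
int count_syncs(void *initial, int n)
{
    struct toy_io io = {0, {0}};
    int i;

    for (i = 0; i < n; i++) {
        void *xp = initial;
        toy_get(&io, &xp);
    }
    return io.sync_calls;
}
```

In this sketch count_syncs(NULL, 1000) returns 1000, while passing
any non-NULL starting value returns 0, so the cost depends entirely
on what happens to be in the caller's local variable.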
If you aren't using concurrent writers and readers, you may be able to
safely discard the "if (*vpp == NULL)" statement, but we aren't sure
of the right fix in the case of concurrent access yet.
> Thanks in advance for your help
Thanks again for the help in debugging this problem.
> Mathis
>
> --
> Mathis Rosenhauer
> Wissenschaftliches Rechnen
> Deutsches Klimarechenzentrum http://www.dkrz.de
--Russ
_____________________________________________________________________
Russ Rew UCAR Unidata Program
address@hidden http://my.unidata.ucar.edu