[petsc-dev] What is this? "Optimize VecNorm_MPI. Use BLASdot_ instead of BLASnrm2_"
Barry Smith
bsmith at mcs.anl.gov
Tue Jan 3 17:48:43 CST 2012
On Jan 3, 2012, at 4:47 PM, Jed Brown wrote:
> On Tue, Jan 3, 2012 at 16:44, Jack Poulson <jack.poulson at gmail.com> wrote:
> It is possible, though unlikely that the BLAS dot could be faster than the BLAS nrm2, though I am skeptical. The reason is that the result of dnrm2 on a vector u is more stable than the square root of the inner product of u with itself via ddot, as it scales the temporary products of the norm to make the computation more accurate:
> http://www.netlib.org/blas/dnrm2.f
>
> Ah, thanks for pointing this out.
>
>
>
> Thus, if you don't care about accuracy, then it is _possible_ that ddot would be faster, but I doubt it, and it is likely a bad idea to give up some stability.
>
> Agreed.
Yes, the BLAS norm is often a good bit (much) slower than the BLAS dot, for the reason Jack points out. This is a very real, measurable result when using a BLAS built from the Fortran reference that has not been optimized (by taking out the stability crap); some of the Linux-bundled BLASes are like this. The BLAS norm can give less than half the flop rate of the BLAS dot on real machines on real codes. In those same situations, just writing a loop to do the norm is faster than calling the BLAS.
Now ideally configure would run both, get the timings, and then only use the norm version if it is not significantly slower than the dot version. But since Matt is the only person who can wrangle this stuff out of BuildSystem ......
I used to have a PETSC_BLAS_NORM_SLOW or something that allowed switching off the BLAS norm, but that got lost over the years.
Given that this is a real problem (despite your skepticism), how do you suggest handling it? Just live with the crappy performance, have a bunch of #if defined() to switch based on configure flags, or have Matt wrangle BuildSystem?
Barry