[petsc-dev] What is this? "Optimize VecNorm_MPI. Use BLASdot_ instead of BLASnrm2_"
Barry Smith
bsmith at mcs.anl.gov
Tue Jan 3 18:09:26 CST 2012
On Jan 3, 2012, at 6:00 PM, Jed Brown wrote:
> On Tue, Jan 3, 2012 at 17:48, Barry Smith <bsmith at mcs.anl.gov> wrote:
> Yes, the BLAS norm is often a good bit (much) slower than the BLAS dot for the reason Jack points out. This is a very real, measurable result when using a BLAS built from the Fortran reference that has not been optimized (by taking out the stability crap).
>
> It seems silly to optimize for the reference BLAS.
It is not just the reference BLAS. It is the -lblas that comes on many Linux systems by default (these are often not much more than compiled versions of the reference BLAS).
Now you can say that you don't care about that situation and that those BLAS are stupid, but it is a common situation, and saying it is stupid doesn't help all those users who spend way too much time on norm.
> If the concern is just this routine and just on x86-64, I would be inclined to write a simple vectorized implementation (probably using SSE intrinsics) that still includes the stability stuff.
>
I don't think the stability stuff is needed for how norm() is used in PETSc (if it is important, how come it is not important for the dot products also?). It is just there for pathological matrices the LINPACK guys knew about; I consider it just a fetish that got the LINPACK guys excited.
> Whatever the case, I'm not a fan of replacing nrm2() with dot().
Why not? If the dot is highly optimized, it may be faster than your own hand-coded BLAS thing.
So you are saying we need Matt to write another BuildSystem test?
Barry