[petsc-dev] What is this? "Optimize VecNorm_MPI. Use BLASdot_ instead of BLASnrm2_"
Jed Brown
jedbrown at mcs.anl.gov
Tue Jan 3 18:00:55 CST 2012
On Tue, Jan 3, 2012 at 17:48, Barry Smith <bsmith at mcs.anl.gov> wrote:
> Yes, the BLAS norm is often a good bit (much) slower than the BLAS dot for
> the reason Jack points out. This is a very real, measurable result using a
> BLAS built from the Fortran reference that has not been optimized (by
> taking out the stability crap).
It seems silly to optimize for the reference BLAS. If the concern is just
this routine and just on x86-64, I would be inclined to write a simple
vectorized implementation (probably using SSE intrinsics) that still
includes the stability stuff.
Whatever the case, I'm not a fan of replacing nrm2() with dot().