It is possible, though unlikely, that the BLAS dot is faster than the BLAS nrm2. The reason is that dnrm2 does extra work: instead of taking the square root of the inner product of a vector u with itself, as one would with ddot, it scales the intermediate squares so that they neither overflow nor underflow, which makes the result more stable and more accurate:<br>
<a href="http://www.netlib.org/blas/dnrm2.f">http://www.netlib.org/blas/dnrm2.f</a><br><br>Thus, if you don't care about accuracy, it is _possible_ that ddot would be faster, but I doubt it, and giving up stability is likely a bad idea.<br>
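<br>To make the scaling point concrete, here is a minimal Python sketch (not the BLAS code itself) that contrasts the square root of a plain inner product with a scaled sum of squares in the style of the classic reference dnrm2. The function names naive_norm and scaled_norm are hypothetical, chosen only for this illustration, and a tuned BLAS may organize the computation differently.<br>

```python
import math

def naive_norm(x):
    # Hypothetical helper: sqrt of the inner product x.x,
    # the analogue of sqrt(ddot(x, x)). The squares can
    # overflow to inf or underflow to 0 before the sqrt.
    return math.sqrt(sum(v * v for v in x))

def scaled_norm(x):
    # Hypothetical helper: scaled sum of squares, assumed here to
    # mirror the classic reference dnrm2 loop. The invariant is
    # scale**2 * ssq == sum of squares seen so far, with scale
    # equal to the largest magnitude seen so far.
    scale = 0.0
    ssq = 1.0
    for v in x:
        if v != 0.0:
            a = abs(v)
            if scale < a:
                # New largest entry: rescale the running sum.
                ssq = 1.0 + ssq * (scale / a) ** 2
                scale = a
            else:
                ssq += (a / scale) ** 2
    return scale * math.sqrt(ssq)
```

For a vector such as [1e200, 1e200], the naive version overflows to inf while the scaled version returns about 1.414e200. The price is an extra division and branch per entry, which is why one might expect ddot to be at least as fast in principle.<br>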
<br>Jack<br><br><div class="gmail_quote">On Tue, Jan 3, 2012 at 4:33 PM, Jed Brown <span dir="ltr">&lt;<a href="mailto:jedbrown@mcs.anl.gov">jedbrown@mcs.anl.gov</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<a href="http://petsc.cs.iit.edu/petsc/petsc-dev/rev/a8a483b98169" target="_blank">http://petsc.cs.iit.edu/petsc/petsc-dev/rev/a8a483b98169</a><div><br></div><div>This baffles me. I can think of no good reason for this, which gives me the impression that we are optimizing for an implementation quirk. If you have evidence that the performance of BLAS dot() is better than nrm2() across platforms and implementations, then we are witnessing a major implementation failure and people need to be shamed.</div>
<div><br></div><div>Aliasing is also *explicitly disallowed* by Fortran, so the result of</div><div><br></div><div>BLASdot_(&bn,xx,&one,xx,&one);</div><div><br></div><div>is not defined.</div>
</blockquote></div><br>