[petsc-dev] What is this? "Optimize VecNorm_MPI. Use BLASdot_ instead of BLASnrm2_"
Jed Brown
jedbrown at mcs.anl.gov
Wed Jan 4 13:40:58 CST 2012
On Tue, Jan 3, 2012 at 21:37, Barry Smith <bsmith at mcs.anl.gov> wrote:
> Come on, 95% of all Fortran users wouldn't even understand the above
> sentence.
Has anyone tried just unrolling the loop four times in C or Fortran, with a
separate "counter" for each stripe? The reference implementation will force
this to be totally sequential. All we have to do is hit the memory
bandwidth limit, which should be pretty easy. Did you have a stand-alone
benchmark or were you just measuring with -log_summary?