[petsc-dev] What is this? "Optimize VecNorm_MPI. Use BLASdot_ instead of BLASnrm2_"
    Jed Brown 
    jedbrown at mcs.anl.gov
       
    Wed Jan  4 13:40:58 CST 2012
    
    
  
On Tue, Jan 3, 2012 at 21:37, Barry Smith <bsmith at mcs.anl.gov> wrote:
> Come on, 95% of all Fortran users wouldn't even understand the above
> sentence.
Has anyone tried just unrolling the loop four times in C or Fortran, with a
separate "counter" (accumulator) for each stripe? The reference
implementation forces the reduction to be totally sequential. All we have
to do is hit the memory bandwidth limit, which should be pretty easy. Did
you have a stand-alone benchmark or were you just measuring with
-log_summary?
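
A minimal sketch of the unrolling I have in mind, in C. The function name
sum_squares_unrolled4 is made up for illustration and is not PETSc or BLAS
API. It assumes plain double data with unit stride, and unlike the reference
nrm2 it does no scaling to guard against overflow or underflow, so it is only
a starting point for a benchmark.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of PETSc or BLAS): sum of squares of x[0..n-1],
   unrolled four ways with one accumulator per stripe so that the four
   multiply-adds in the loop body do not depend on each other. */
double sum_squares_unrolled4(const double *x, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    assert(n == 0 || x != NULL);

    /* Main loop: each stripe updates its own accumulator. */
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * x[i];
        s1 += x[i + 1] * x[i + 1];
        s2 += x[i + 2] * x[i + 2];
        s3 += x[i + 3] * x[i + 3];
    }

    /* Remainder loop: at most three leftover entries. */
    for (; i < n; i++) {
        s0 += x[i] * x[i];
    }

    /* Combine the partial sums pairwise. */
    return (s0 + s1) + (s2 + s3);
}
```

The 2-norm is the square root of the returned value. The summation order
differs from a single sequential loop, so the result can differ in the last
few bits.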