[petsc-dev] KNL MatMult performance and unrolling.
Jeff Hammond
jeff.science at gmail.com
Wed Sep 28 21:43:15 CDT 2016
If there is a minimal, performance-oriented test of this function, I can ask
the compiler team to study it w.r.t. unrolling heuristics.
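
As a starting point, here is a rough sketch of the kind of kernel pair such a test might compare. It is not the PETSc source: the function names, the argument layout, and the unroll factor of 2 are all assumptions made for illustration. It only mimics the inode-size-2 case (two rows sharing one set of column indices), once as a plain loop and once manually unrolled. A timing loop would still need to be wrapped around each kernel.

```c
#include <assert.h>

/* Hypothetical stand-in for an inode-size-2 row product: two rows share the
   column indices idx[0..n-1]. Plain loop, unrolling left to the compiler. */
static void mult2_simple(int n, const int *idx, const double *v1,
                         const double *v2, const double *x,
                         double *y1, double *y2)
{
  double s1 = 0.0, s2 = 0.0;
  assert(n >= 0);
  for (int j = 0; j < n; j++) {
    s1 += v1[j] * x[idx[j]];
    s2 += v2[j] * x[idx[j]];
  }
  *y1 = s1;
  *y2 = s2;
}

/* Same computation, manually unrolled by 2 (an assumed unroll factor).
   An odd leading entry is peeled off so the main loop sees an even count. */
static void mult2_unrolled(int n, const int *idx, const double *v1,
                           const double *v2, const double *x,
                           double *y1, double *y2)
{
  double s1 = 0.0, s2 = 0.0;
  int j = 0;
  assert(n >= 0);
  if (n & 1) {
    s1 += v1[0] * x[idx[0]];
    s2 += v2[0] * x[idx[0]];
    j = 1;
  }
  for (; j < n; j += 2) {
    const double xa = x[idx[j]], xb = x[idx[j + 1]];
    s1 += v1[j] * xa + v1[j + 1] * xb;
    s2 += v2[j] * xa + v2[j + 1] * xb;
  }
  *y1 = s1;
  *y2 = s2;
}

/* Returns the number of row lengths in 0..MAXN for which the two kernels
   disagree. The data are small integers, so every sum is exact in double
   and the results can be compared with ==. */
int count_mismatches(void)
{
  enum { MAXN = 17, NX = 32 };
  int idx[MAXN];
  double v1[MAXN], v2[MAXN], x[NX];
  int bad = 0;

  for (int i = 0; i < NX; i++) x[i] = (double)(i % 7 - 3);
  for (int j = 0; j < MAXN; j++) {
    idx[j] = (5 * j + 3) % NX;
    v1[j]  = (double)(j + 1);
    v2[j]  = (double)(2 * j - 5);
  }
  for (int n = 0; n <= MAXN; n++) {
    double a1, a2, b1, b2;
    mult2_simple(n, idx, v1, v2, x, &a1, &a2);
    mult2_unrolled(n, idx, v1, v2, x, &b1, &b2);
    if (a1 != b1 || a2 != b2) bad++;
  }
  return bad;
}
```

With real matrix data the two variants can differ in the last bits because the unrolled form reassociates the sums, so a real test should compare with a tolerance rather than ==.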
Jeff
On Wednesday, September 28, 2016, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>
> Mr Hong Zhang has found that removing the manual unrolling from
> MatMult_SeqAIJ_Inode() (at least with inode size 2) results in a good bump
> in performance on KNL and pointed me to the Intel gospel
> https://software.intel.com/en-us/articles/avoid-manual-loop-unrolling
> which we've always ignored in the past. It would be good to try the unrolled
> and non-unrolled versions on Xeon as well.
>
> We've never done a good job of managing our unrolling: where, how, and
> when we do it, and the macros we use for it, such as PetscSparseDensePlusDot.
> Intel would say just throw it all away.
>
> Barry
>
--
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/