[petsc-dev] KNL MatMult performance and unrolling.
Barry Smith
bsmith at mcs.anl.gov
Wed Sep 28 22:09:10 CDT 2016
Jeff,
This may be more a bug report with respect to PETSc than with respect to Intel compilers. If we see this in a variety of routines, I'll send you some details.
Barry
> On Sep 28, 2016, at 9:43 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>
> If there is a minimal performance oriented test of this function, I can ask the compiler team to study it w.r.t. unrolling heuristics.
>
> Jeff
>
> On Wednesday, September 28, 2016, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>
> Mr Hong Zhang has found that removing the manual unrolling from MatMult_SeqAIJ_Inode() (at least with inode size 2) results in a good bump in performance on KNL, and he pointed me to the Intel gospel https://software.intel.com/en-us/articles/avoid-manual-loop-unrolling, which we've always ignored in the past. It would be good to try the unrolled and non-unrolled versions on Xeon as well.
>
> We've never done a good job of managing our unrolling (where, how, and when we do it) or our macros for unrolling such as PetscSparseDensePlusDot. Intel would say just throw it all away.
>
> Barry
>
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/