[petsc-dev] KNL MatMult performance and unrolling.
bsmith at mcs.anl.gov
Wed Sep 28 15:40:44 CDT 2016
Mr Hong Zhang has found that removing the manual unrolling from MatMult_SeqAIJ_Inode() (at least with inode size 2) results in a good bump in performance on KNL, and pointed me to the Intel gospel https://software.intel.com/en-us/articles/avoid-manual-loop-unrolling, which we've always ignored in the past. It would be good to try the unrolled and non-unrolled versions on Xeon as well.
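For anyone who wants to try the comparison outside PETSc, here is a minimal sketch of the two styles for a 2-row inode (two consecutive rows that share the same column indices). This is not the actual MatMult_SeqAIJ_Inode() source; the function names and argument layout are made up for illustration, and plain int and double stand in for PetscInt and PetscScalar.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel, not PETSc code: two rows share the column
   indices idx[0..n-1]; v0 and v1 hold the values of each row.
   Plain loop, leaving any unrolling to the compiler. */
void inode2_mult_plain(size_t n, const int *idx,
                       const double *v0, const double *v1,
                       const double *x, double *y0, double *y1)
{
  double sum0 = 0.0, sum1 = 0.0;
  assert(y0 && y1);
  for (size_t j = 0; j < n; j++) {
    const double xj = x[idx[j]]; /* gather once, use for both rows */
    sum0 += v0[j] * xj;
    sum1 += v1[j] * xj;
  }
  *y0 = sum0;
  *y1 = sum1;
}

/* Same computation, manually unrolled by two over the columns,
   with a scalar remainder step when n is odd. */
void inode2_mult_unrolled(size_t n, const int *idx,
                          const double *v0, const double *v1,
                          const double *x, double *y0, double *y1)
{
  double sum0 = 0.0, sum1 = 0.0;
  size_t j;
  assert(y0 && y1);
  for (j = 0; j + 1 < n; j += 2) {
    const double xa = x[idx[j]];
    const double xb = x[idx[j + 1]];
    sum0 += v0[j] * xa + v0[j + 1] * xb;
    sum1 += v1[j] * xa + v1[j + 1] * xb;
  }
  if (j < n) { /* odd number of columns: one entry left over */
    const double xj = x[idx[j]];
    sum0 += v0[j] * xj;
    sum1 += v1[j] * xj;
  }
  *y0 = sum0;
  *y1 = sum1;
}
```

Both functions compute the same two dot products, so they can be swapped freely in a timing harness. The two versions add the terms in a different order, so with general floating-point data the results may differ in the last bits.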
We've never done a good job of managing our unrolling: where, how, and when we do it, and the macros we use for it, such as PetscSparseDensePlusDot. Intel would say just throw it all away.
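One way to manage this would be to keep every unrolled variant behind a single compile-time switch, so the plain loop becomes the default and the two are easy to compare per architecture. A rough sketch of that idea follows. The macro name, the SKETCH_USE_UNROLL_2 flag, and the wrapper function are all hypothetical; this is not the definition of PetscSparseDensePlusDot, only the same kind of sparse-row-times-dense-vector accumulation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro: sum += sum_k xv[k] * r[xi[k]] for k in [0, nnz).
   Define SKETCH_USE_UNROLL_2 (an assumed flag) to get the manually
   unrolled variant; otherwise the compiler sees a plain loop. */
#if defined(SKETCH_USE_UNROLL_2)
#define SKETCH_SPARSE_DENSE_PLUS_DOT(sum, r, xv, xi, nnz) \
  do { \
    size_t k_ = 0; \
    for (; k_ + 1 < (size_t)(nnz); k_ += 2) \
      (sum) += (xv)[k_] * (r)[(xi)[k_]] + (xv)[k_ + 1] * (r)[(xi)[k_ + 1]]; \
    if (k_ < (size_t)(nnz)) (sum) += (xv)[k_] * (r)[(xi)[k_]]; \
  } while (0)
#else
#define SKETCH_SPARSE_DENSE_PLUS_DOT(sum, r, xv, xi, nnz) \
  do { \
    size_t k_; \
    for (k_ = 0; k_ < (size_t)(nnz); k_++) \
      (sum) += (xv)[k_] * (r)[(xi)[k_]]; \
  } while (0)
#endif

/* Hypothetical wrapper: dot product of one sparse row (values xv,
   column indices xi, nnz entries) with the dense vector r. */
double sketch_row_dot(const double *r, const double *xv,
                      const int *xi, size_t nnz)
{
  double sum = 0.0;
  assert(nnz == 0 || (r && xv && xi));
  SKETCH_SPARSE_DENSE_PLUS_DOT(sum, r, xv, xi, nnz);
  return sum;
}
```

Building the same file with and without -DSKETCH_USE_UNROLL_2 gives the two variants to time against each other, and throwing the unrolling away later means deleting one branch of the #if.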