[petsc-users] questions about vectorization

Richard Tran Mills rtmills at anl.gov
Tue Nov 14 15:56:33 CST 2017


Xiangdong,

If you are running on an Intel-based system with support for recent
instruction sets like AVX2 or AVX-512, and you have access to the Intel
compilers, then telling the compiler to target these instruction sets
(e.g., "-xCORE-AVX2" or "-xMIC-AVX512") will probably give you some
noticeable gain in performance. It will be much less than you would expect
from something very CPU-bound like xGEMM code, but, in my experience, it
will be noticeable (remember, even if you have a memory-bound code, your
code's performance won't be bound by the memory subsystem 100% of the
time). I don't know how well the non-Intel compilers are able to
auto-vectorize, so your mileage may vary for those. As Hong has pointed
out, there are some places in the PETSc source in which we have introduced
code using AVX/AVX512 intrinsics. For those kernels, you should see a benefit
with any compiler that supports these intrinsics, since they do not rely on
the auto-vectorizer.
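
To make the distinction concrete, here is a minimal sketch of a CSR
matrix-vector product. It is not PETSc's actual kernel; the function name
csr_matvec and its argument layout are hypothetical, chosen only for
illustration. The scalar loop at the bottom is the part that depends on the
compiler's auto-vectorizer, while the block guarded by __AVX2__ uses
intrinsics directly and so should behave the same with any compiler that
supports them.

```c
#include <stddef.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* y = A*x for a matrix A stored in CSR form.
 * Hypothetical helper for illustration, not PETSc's implementation.
 *   rowptr[i]..rowptr[i+1]-1 index the nonzeros of row i,
 *   colidx[j] is the column of nonzero j, val[j] its value. */
static void csr_matvec(int nrows, const int *rowptr, const int *colidx,
                       const double *val, const double *x, double *y)
{
  for (int i = 0; i < nrows; i++) {
    double    sum = 0.0;
    int       j   = rowptr[i];
    const int end = rowptr[i + 1];
#if defined(__AVX2__)
    /* Explicit intrinsics: process 4 nonzeros per iteration. */
    __m256d acc = _mm256_setzero_pd();
    for (; j + 4 <= end; j += 4) {
      /* Load 4 column indices, then gather the matching entries of x. */
      __m128i idx = _mm_loadu_si128((const __m128i *)(colidx + j));
      __m256d xv  = _mm256_i32gather_pd(x, idx, 8);
      __m256d av  = _mm256_loadu_pd(val + j);
      acc = _mm256_add_pd(acc, _mm256_mul_pd(av, xv));
    }
    /* Reduce the 4 partial sums to a scalar. */
    double lanes[4];
    _mm256_storeu_pd(lanes, acc);
    sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#endif
    /* Scalar remainder; without __AVX2__ this is the whole row, and any
     * vectorization here is up to the compiler. */
    for (; j < end; j++) sum += val[j] * x[colidx[j]];
    y[i] = sum;
  }
}
```

Built with, e.g., "gcc -O3 -mavx2" or "icc -O3 -xCORE-AVX2", the __AVX2__
branch is compiled in; without such a flag only the scalar loop remains.
Note that the two paths add the products in a different order, so results
can differ in the last few bits. The indirect access x[colidx[j]] is also a
good illustration of why this kind of kernel tends to be limited by memory
traffic rather than by arithmetic.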

Best regards,
Richard

On Mon, Nov 13, 2017 at 8:32 AM, Zhang, Hong <hongzhang at anl.gov> wrote:

> Most operations in PETSc do not benefit much from vectorization because
> they are memory-bound. But this should not discourage you from compiling
> PETSc with AVX2/AVX512. We have added a new matrix format (currently named
> ELL, but it will be renamed SELL shortly) that can make MatMult ~2X faster
> than the AIJ format. Its MatMult kernel is hand-optimized with AVX
> intrinsics and works on any Intel processor that supports AVX, AVX2, or
> AVX512, e.g. Haswell, Broadwell, Xeon Phi, Skylake. We have also been
> optimizing the AIJ MatMult kernel for these architectures. Note that one
> has to use AVX compiler flags in order to take advantage of the optimized
> kernels and the new matrix format.
>
> Hong (Mr.)
>
> > On Nov 12, 2017, at 10:35 PM, Xiangdong <epscodes at gmail.com> wrote:
> >
> > Hello everyone,
> >
> > Can someone comment on the vectorization of PETSc? For example, for the
> > MatMult function, will it perform better or run faster if it is compiled
> > with avx2 or avx512?
> >
> > Thank you.
> >
> > Best,
> > Xiangdong
>
>