[petsc-users] questions about vectorization

Tue Nov 14 16:59:50 CST 2017

Yes, that's worth a try. Xiangdong, if you want to employ the MKL
implementations for BAIJ MatMult() and friends, you can do so by
configuring petsc-master with a recent version of MKL and then using the
option "-mat_type baijmkl" (on the command line or set in your
PETSC_OPTIONS environment variable).

Note that the above requires a version of MKL that is recent enough to have
the sparse inspector-executor routines. MKL is now free, so I recommend
installing the latest version.

(You can also try using the sparse MKL routines with AIJ format matrices by
using either "-mat_type aijmkl" or "-mat_seqaij_type seqaijmkl". This will
use MKL for MatMult()-type operations and some sparse matrix-matrix
products.)

Best regards,
Richard

On Tue, Nov 14, 2017 at 2:42 PM, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:

>
>   Use MKL versions of block formats?
>
> > On Nov 14, 2017, at 4:40 PM, Richard Tran Mills <rtmills at anl.gov> wrote:
> >
> > On Tue, Nov 14, 2017 at 12:13 PM, Zhang, Hong <hongzhang at anl.gov> wrote:
> >
> >
> >> On Nov 13, 2017, at 10:49 PM, Xiangdong <epscodes at gmail.com> wrote:
> >>
> >> 1) How about the vectorization of BAIJ format?
> >
> > BAIJ kernels are optimized with manual unrolling, but not with AVX
> > intrinsics, so vectorization is left to the compiler. The kernels may
> > or may not get vectorized, depending on the compiler's optimization
> > decisions. But vectorization is not essential for the performance of
> > most BAIJ kernels.
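A back-of-the-envelope sketch (not from the thread) of why sparse kernels such as BAIJ MatMult tend to be limited by memory bandwidth rather than by arithmetic. The helper names are hypothetical, and the model assumes 8-byte double values, 32-bit column indices, one index per bs-by-bs block (bs = 1 is the AIJ case), and ignores traffic for x, y and the row pointers.

```c
#include <assert.h>

/* Hypothetical helper: approximate bytes streamed from memory per stored
 * nonzero. Assumptions: 8-byte double values, 4-byte (32-bit) column
 * indices, one index shared by every entry of a bs-by-bs block, and no
 * traffic counted for x, y or the row pointers. */
double bytes_per_nonzero(int bs)
{
    assert(bs > 0);
    const double value_bytes = 8.0;
    const double index_bytes = 4.0;
    /* The block's single column index is amortized over bs*bs entries. */
    return value_bytes + index_bytes / (double)(bs * bs);
}

/* Each stored nonzero costs one multiply and one add: 2 flops. */
double flops_per_byte(int bs)
{
    return 2.0 / bytes_per_nonzero(bs);
}
```

Under these assumptions, AIJ (bs = 1) streams 12 bytes per 2 flops, about 0.17 flop/byte, and bs = 4 only raises that to about 0.24 flop/byte. Both are well below the ratio at which a typical modern CPU core becomes compute bound, so wider SIMD units mostly wait on memory.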
> >
> > I know that this has come up in previous discussions, but I'm guessing
> > that the manual unrolling actually impedes the ability of many modern
> > compilers to optimize the BAIJ calculations. I suppose we ought to have
> > a switch to enable or disable the use of the unrolled versions? (And,
> > further down the road, some sort of performance model to tell us what
> > the setting for the switch should be...)
> >
> > --Richard
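To make the unrolling question concrete, here is a minimal sketch contrasting a plain loop form of a dense block-times-vector product with a hand-unrolled bs = 2 version. This is not PETSc code: the function names are hypothetical, and the blocks are assumed to be stored row-major, which may differ from PETSc's internal BAIJ layout.

```c
#include <assert.h>

/* y += A*x for one bs-by-bs block stored row-major (an assumed layout for
 * this sketch). The plain loop form leaves unrolling and vectorization
 * decisions to the compiler. */
void block_multadd_loop(int bs, const double *A, const double *x, double *y)
{
    assert(bs > 0);
    for (int i = 0; i < bs; ++i) {
        double sum = 0.0;
        for (int j = 0; j < bs; ++j)
            sum += A[i * bs + j] * x[j];
        y[i] += sum;
    }
}

/* The same operation hand-unrolled for bs = 2, in the spirit of the
 * manually unrolled BAIJ kernels discussed in this thread. */
void block_multadd_bs2_unrolled(const double *A, const double *x, double *y)
{
    const double x0 = x[0], x1 = x[1];
    y[0] += A[0] * x0 + A[1] * x1;
    y[1] += A[2] * x0 + A[3] * x1;
}
```

Whether a compiler vectorizes either form can be checked from its optimization reports (for example GCC's -fopt-info-vec option) or by reading the generated assembly.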
> >
> >
> >> If the block size is 2 or 4, would it be ideal for AVX? Do I need to
> >> do anything special (beyond the AVX flag) for the compiler to vectorize it?
> >
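> > In double precision, 4 would be good for AVX/AVX2 (a 256-bit register
> > holds 4 doubles), and 8 would be ideal for AVX512 (a 512-bit register
> > holds 8 doubles). Other block sizes make vectorization less profitable
> > because of the remainder elements that do not fill a whole register.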
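A small sketch of the remainder arithmetic behind this answer, assuming the kernel vectorizes along one dimension of each block so that a block row of bs doubles is covered by ceil(bs / lanes) vector operations. The function names are hypothetical; lanes is 4 for 256-bit AVX/AVX2 and 8 for 512-bit AVX-512 in double precision.

```c
#include <assert.h>

/* Hypothetical helper: vector operations needed to cover one block row of
 * bs doubles with registers holding `lanes` doubles. When bs is not a
 * multiple of lanes, the last operation is only partially filled (the
 * "remainder"). */
int vector_ops_per_row(int bs, int lanes)
{
    assert(bs > 0 && lanes > 0);
    return (bs + lanes - 1) / lanes; /* ceiling division */
}

/* Percentage of vector lanes doing useful work (integer division). */
int lane_utilization_percent(int bs, int lanes)
{
    return (100 * bs) / (vector_ops_per_row(bs, lanes) * lanes);
}
```

For example, bs = 5 with AVX2 needs two 4-wide operations per row and keeps only about 62% of the lanes busy, and bs = 2 fills only half of a 4-wide register, while bs = 4 with AVX/AVX2 or bs = 8 with AVX-512 fills every lane.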
> >
> >> 2) Could you please update the linear solver table to label the
> >> preconditioners/solvers compatible with the ELL format?
> >> http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html
> >
> > This is still a work in progress. The easiest thing to do would be to
> > use ELL for the Jacobian matrix and another format (e.g. AIJ) for the
> > matrix from which the preconditioner is built. Then you would not need
> > to worry about which preconditioners are compatible. An example can be
> > found at ts/examples/tutorials/advection-diffusion-reaction/ex5adj.c.
> > For preconditioners such as block Jacobi and mg (with bjacobi or with
> > sor), you can use ELL for both the preconditioner and the Jacobian, and
> > expect a considerable gain since MatMult is the dominant operation.
> >
> > The makefile for ex5adj includes a few use cases that demonstrate how
> > ELL works with various preconditioners.
> >
> > Hong (Mr.)
> >
> >> Thank you.
> >>
> >> Xiangdong
> >>
> >> On Mon, Nov 13, 2017 at 11:32 AM, Zhang, Hong <hongzhang at anl.gov> wrote:
> >> Most operations in PETSc would not benefit much from vectorization
> >> since they are memory-bandwidth bound. But this should not discourage
> >> you from compiling PETSc with AVX2/AVX512. We have added a new matrix
> >> format (currently named ELL, but it will be renamed SELL shortly) that
> >> can make MatMult ~2X faster than the AIJ format. The MatMult kernel is
> >> hand-optimized with AVX intrinsics. It works on any Intel processor
> >> that supports AVX, AVX2 or AVX512, e.g. Haswell, Broadwell, Xeon Phi,
> >> Skylake. We have also been optimizing the AIJ MatMult kernel for these
> >> architectures. Note that one has to use AVX compiler flags in order to
> >> take advantage of the optimized kernels and the new matrix format.
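To illustrate what the ELL (later SELL, sliced ELLPACK) format changes relative to AIJ, here is a plain-C sketch of sliced ELLPACK next to a CSR reference. It is not PETSc's implementation: the SellMatrix struct, the function names and the slice heights mentioned below are hypothetical, PETSc's kernel is hand-written with AVX intrinsics, and padding here uses column 0 with value 0.0 (so padded entries read x[0] and assume it is finite).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sliced-ELLPACK container (illustrative names, not PETSc's). */
typedef struct {
    int nrows;     /* number of matrix rows */
    int C;         /* slice height: rows per slice */
    int nslices;   /* ceil(nrows / C) */
    int *sliceptr; /* nslices + 1 offsets into colidx/val */
    int *colidx;   /* column indices, column-major within each slice */
    double *val;   /* values, padded with 0.0 */
} SellMatrix;

/* Reference CSR (AIJ-like) product y = A*x. */
void csr_spmv(int n, const int *rowptr, const int *colidx,
              const double *val, const double *x, double *y)
{
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; ++k)
            sum += val[k] * x[colidx[k]];
        y[i] = sum;
    }
}

/* Copy a CSR matrix into sliced ELLPACK with slice height C. Each slice is
 * padded to the length of its longest row; padding uses column 0 and 0.0. */
SellMatrix csr_to_sell(int n, const int *rowptr, const int *colidx,
                       const double *val, int C)
{
    SellMatrix S;
    assert(n > 0 && C > 0);
    S.nrows = n;
    S.C = C;
    S.nslices = (n + C - 1) / C;
    S.sliceptr = malloc((size_t)(S.nslices + 1) * sizeof(int));
    assert(S.sliceptr != NULL);
    S.sliceptr[0] = 0;
    for (int s = 0; s < S.nslices; ++s) {
        int width = 0; /* longest row in this slice */
        for (int row = s * C; row < n && row < (s + 1) * C; ++row)
            if (rowptr[row + 1] - rowptr[row] > width)
                width = rowptr[row + 1] - rowptr[row];
        S.sliceptr[s + 1] = S.sliceptr[s] + width * C;
    }
    size_t total = (size_t)S.sliceptr[S.nslices];
    S.colidx = calloc(total > 0 ? total : 1, sizeof(int));   /* zero-filled */
    S.val = calloc(total > 0 ? total : 1, sizeof(double));   /* zero-filled */
    assert(S.colidx != NULL && S.val != NULL);
    for (int row = 0; row < n; ++row) {
        int s = row / C, r = row % C;
        for (int k = rowptr[row]; k < rowptr[row + 1]; ++k) {
            /* k-th entry of the row goes to column k of the slice. */
            int pos = S.sliceptr[s] + (k - rowptr[row]) * C + r;
            S.colidx[pos] = colidx[k];
            S.val[pos] = val[k];
        }
    }
    return S;
}

/* y = A*x in sliced ELLPACK. The inner loop walks the rows of one slice,
 * whose entries are contiguous in memory: this is the loop that can be
 * mapped onto SIMD lanes (x still needs an indexed gather). */
void sell_spmv(const SellMatrix *S, const double *x, double *y)
{
    for (int s = 0; s < S->nslices; ++s) {
        int first = s * S->C;
        int rows = S->nrows - first < S->C ? S->nrows - first : S->C;
        int width = (S->sliceptr[s + 1] - S->sliceptr[s]) / S->C;
        for (int r = 0; r < rows; ++r)
            y[first + r] = 0.0;
        for (int k = 0; k < width; ++k) {
            const int *ci = S->colidx + S->sliceptr[s] + k * S->C;
            const double *v = S->val + S->sliceptr[s] + k * S->C;
            for (int r = 0; r < rows; ++r)
                y[first + r] += v[r] * x[ci[r]];
        }
    }
}

void sell_free(SellMatrix *S)
{
    free(S->sliceptr);
    free(S->colidx);
    free(S->val);
}
```

For instance, csr_to_sell(n, rowptr, colidx, val, 8) would build slices of 8 rows, matching the 8 doubles of an AVX-512 register; taller slices give longer contiguous inner loops but can add padding when row lengths vary within a slice.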
> >>
> >> Hong (Mr.)
> >>
> >> > On Nov 12, 2017, at 10:35 PM, Xiangdong <epscodes at gmail.com> wrote:
> >> >
> >> > Hello everyone,
> >> >
> >> > Can someone comment on the vectorization of PETSc? For example, will
> >> > the MatMult function perform better or run faster if it is compiled
> >> > with avx2 or avx512?
> >> >
> >> > Thank you.
> >> >
> >> > Best,
> >> > Xiangdong
> >>
> >>
> >
> >
>
>