[petsc-users] questions about vectorization

Tue Nov 14 14:13:03 CST 2017

On Nov 13, 2017, at 10:49 PM, Xiangdong <epscodes at gmail.com> wrote:

1) How about the vectorization of BAIJ format?

BAIJ kernels are optimized with manual unrolling but do not use AVX intrinsics, so vectorization depends on the compiler.
A given kernel may or may not be vectorized, depending on the compiler's optimization decisions. However, vectorization is not essential to the performance of most BAIJ kernels.
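
To illustrate what "manual unrolling without intrinsics" can look like, here is a minimal sketch in plain C of a fully unrolled multiply-add for one dense 4x4 block. This is not PETSc's actual kernel: the function name block4_multadd is hypothetical, and column-major storage of the block is an assumption made for the example. Whether the compiler turns this into AVX instructions depends on the compiler and the flags used.

```c
#include <assert.h>
#include <stddef.h>

/* y += A * x for one dense 4x4 block.
 * Assumption: the block is stored column-major, i.e. entry (row, col)
 * lives at a[col * 4 + row]. Hypothetical helper, not a PETSc routine. */
void block4_multadd(const double *a, const double *x, double *y)
{
    assert(a != NULL && x != NULL && y != NULL);

    /* Load x once so each value is reused for all four rows. */
    const double x0 = x[0], x1 = x[1], x2 = x[2], x3 = x[3];

    /* Fully unrolled: no loop, no intrinsics. The compiler is free to
     * vectorize these four independent updates, but is not required to. */
    y[0] += a[0] * x0 + a[4] * x1 + a[8]  * x2 + a[12] * x3;
    y[1] += a[1] * x0 + a[5] * x1 + a[9]  * x2 + a[13] * x3;
    y[2] += a[2] * x0 + a[6] * x1 + a[10] * x2 + a[14] * x3;
    y[3] += a[3] * x0 + a[7] * x1 + a[11] * x2 + a[15] * x3;
}
```

A block sparse MatMult would call such a helper once per nonzero block, with x pointing at the block column's slice of the input vector and y at the block row's slice of the output.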

If the block size is 2 or 4, would it be ideal for AVX? Do I need to do anything special (beyond the AVX flag) for the compiler to vectorize it?

In double precision, a block size of 4 would be good for AVX/AVX2 (a 256-bit register holds four doubles), and 8 would be ideal for AVX512 (eight doubles per register). Other block sizes make vectorization less profitable because of the remainders.
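
The remainder argument can be made concrete with a little arithmetic. The sketch below is only an illustration: the helper names lanes_for_doubles and lane_utilization are hypothetical, and it assumes the simplest model in which each block row or column is loaded into whole SIMD registers with unused lanes wasted.

```c
#include <assert.h>

/* Number of 64-bit doubles that fit in one SIMD register.
 * 256 bits (AVX/AVX2) gives 4 lanes, 512 bits (AVX512) gives 8. */
int lanes_for_doubles(int register_bits)
{
    assert(register_bits > 0 && register_bits % 64 == 0);
    return register_bits / 64;
}

/* Fraction of SIMD lanes doing useful work when block_size doubles are
 * packed into the smallest number of whole registers (simplified model). */
double lane_utilization(int block_size, int lanes)
{
    assert(block_size > 0 && lanes > 0);
    const int registers = (block_size + lanes - 1) / lanes; /* round up */
    return (double)block_size / (double)(registers * lanes);
}
```

Under this model, a block size of 4 fills an AVX register completely, a block size of 2 uses half of it, and a block size of 5 needs two registers while using only five of the eight lanes.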

2) Could you please update the linear solver table to label the preconditioners/solvers compatible with ELL format?
http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html

This is still a work in progress. The easiest approach is to use ELL for the Jacobian matrix and another format (e.g. AIJ) for the preconditioner matrix.
Then you do not need to worry about which preconditioners are compatible. An example can be found at ts/examples/tutorials/advection-diffusion-reaction/ex5adj.c.
For preconditioners such as block Jacobi and multigrid (with bjacobi or with sor), you can use ELL for both the preconditioner and the Jacobian,
and expect a considerable gain, since MatMult is the dominant operation.

The makefile for ex5adj includes a few use cases that demonstrate how ELL works with various preconditioners.

Hong (Mr.)

Thank you.

Xiangdong

On Mon, Nov 13, 2017 at 11:32 AM, Zhang, Hong <hongzhang at anl.gov> wrote:
Most operations in PETSc would not benefit much from vectorization since they are memory-bound. But this should not discourage you from compiling PETSc with AVX2/AVX512. We have added a new matrix format (currently named ELL, but it will be renamed SELL shortly) that can make MatMult ~2X faster than the AIJ format. Its MatMult kernel is hand-optimized with AVX intrinsics and works on any Intel processor that supports AVX, AVX2, or AVX512, e.g. Haswell, Broadwell, Xeon Phi, Skylake. We have also been optimizing the AIJ MatMult kernel for these architectures. Note that you have to use AVX compiler flags in order to take advantage of the optimized kernels and the new matrix format.
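
For readers unfamiliar with this kind of layout, here is a minimal plain-C sketch of a matrix-vector product over a sliced ELL style storage. It assumes the general sliced-ELLPACK idea (rows grouped into fixed-height slices, each slice padded to its widest row and stored column by column); it is not PETSc's implementation and uses no intrinsics. The SellMatrix struct, its field names, and sell_matmult are all hypothetical, and the row count is assumed to be a multiple of the slice height to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sliced-ELL container (not a PETSc type). */
typedef struct {
    int nrows;             /* assumed to be a multiple of slice_height */
    int slice_height;      /* rows per slice, e.g. the SIMD width */
    const int *slice_ptr;  /* start of each slice in val/colidx, length nslices + 1 */
    const int *colidx;     /* column index per stored entry (padding: any valid column) */
    const double *val;     /* value per stored entry (padding: 0.0) */
} SellMatrix;

/* y = A * x. Within a slice, entry j of all rows is stored contiguously,
 * so the innermost loop walks unit-stride memory across rows. */
void sell_matmult(const SellMatrix *A, const double *x, double *y)
{
    const int h = A->slice_height;
    assert(h > 0 && A->nrows % h == 0);
    const int nslices = A->nrows / h;

    for (int s = 0; s < nslices; s++) {
        const int begin = A->slice_ptr[s];
        const int width = (A->slice_ptr[s + 1] - begin) / h; /* padded row length */

        for (int r = 0; r < h; r++) y[s * h + r] = 0.0;

        for (int j = 0; j < width; j++) {
            const int base = begin + j * h;
            /* One "column" of the slice: h independent row updates. */
            for (int r = 0; r < h; r++)
                y[s * h + r] += A->val[base + r] * x[A->colidx[base + r]];
        }
    }
}
```

The point of the layout is that the innermost loop updates slice_height rows at once from contiguous values, which maps naturally onto SIMD lanes; the cost is the zero padding in slices whose rows have unequal lengths.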

Hong (Mr.)

> On Nov 12, 2017, at 10:35 PM, Xiangdong <epscodes at gmail.com> wrote:
>
> Hello everyone,
>
> Can someone comment on the vectorization of PETSc? For example, will the MatMult function run faster if PETSc is compiled with AVX2 or AVX512?
>
> Thank you.
>
> Best,
> Xiangdong
