[petsc-users] questions about vectorization

Tue Nov 14 16:40:55 CST 2017

On Tue, Nov 14, 2017 at 12:13 PM, Zhang, Hong <hongzhang at anl.gov> wrote:

>
>
> On Nov 13, 2017, at 10:49 PM, Xiangdong <epscodes at gmail.com> wrote:
>
> 1) How about the vectorization of BAIJ format?
>
>
> BAIJ kernels are optimized with manual unrolling, but not with AVX
> intrinsics. So the vectorization relies on the compiler's ability.
> It may or may not get vectorized depending on the compiler's optimization
> decisions. But vectorization is not essential for the performance of most
> BAIJ kernels.
>

I know that this has come up in previous discussions, but I'm guessing that
the manual unrolling actually impedes the ability of many modern compilers
to optimize the BAIJ calculations. I suppose we ought to have a switch to
enable or disable the use of the unrolled versions? (And, further down the
road, some sort of performance model to tell us what the setting for the
switch should be...)

--Richard
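
To make the idea of such a switch concrete, here is a minimal sketch in plain C. It is not PETSc code: the macro BLOCK_KERNEL_UNROLLED and all function names are hypothetical, and the column-major block layout is an assumption of the sketch. It keeps a generic loop that the compiler may vectorize on its own, next to a hand-unrolled bs = 2 variant, and selects between them at compile time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch (not a PETSc option): compile with
   -DBLOCK_KERNEL_UNROLLED=0 to fall back to the generic loops. */
#ifndef BLOCK_KERNEL_UNROLLED
#define BLOCK_KERNEL_UNROLLED 1
#endif

/* y += A*x for one dense bs-by-bs block stored column-major.
   Plain loops that the compiler is free to vectorize. */
static void block_multadd_generic(size_t bs, const double *a,
                                  const double *x, double *y)
{
  for (size_t j = 0; j < bs; j++)
    for (size_t i = 0; i < bs; i++)
      y[i] += a[j * bs + i] * x[j];
}

/* Same operation, manually unrolled for bs = 2. */
static void block_multadd_unrolled2(const double *a, const double *x,
                                    double *y)
{
  const double x0 = x[0], x1 = x[1];
  y[0] += a[0] * x0 + a[2] * x1;
  y[1] += a[1] * x0 + a[3] * x1;
}

/* Dispatch according to the compile-time switch. */
static void block_multadd(size_t bs, const double *a, const double *x,
                          double *y)
{
  assert(bs > 0);
#if BLOCK_KERNEL_UNROLLED
  if (bs == 2) {
    block_multadd_unrolled2(a, x, y);
    return;
  }
#endif
  block_multadd_generic(bs, a, x, y);
}
```

Building the same benchmark twice, once with each setting, would be one way to check whether the unrolled variant helps or hurts with a given compiler.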

> If the block size is 2 or 4, would it be ideal for AVX? Do I need to do
> anything special (beyond the AVX flag) for the compiler to vectorize it?
>
>
> In double precision, 4 would be good for AVX/AVX2, and 8 would be ideal
> for AVX512. But other block sizes would make vectorization less profitable
> because of the remainders.
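
The arithmetic behind the remainder argument can be sketched as follows. This is a simplified illustration with hypothetical helper names: it only counts how many double-precision entries of one block row or column are left over after filling whole SIMD registers (256 bits for AVX/AVX2, 512 bits for AVX512), and it ignores alignment and memory-bandwidth effects.

```c
#include <assert.h>

/* Number of double-precision lanes in a SIMD register of the given
   width: 256 bits for AVX/AVX2, 512 bits for AVX512. */
static int simd_lanes_double(int register_bits)
{
  assert(register_bits > 0 && register_bits % 64 == 0);
  return register_bits / 64;
}

/* Entries left over after filling as many full registers as possible.
   A nonzero result means a masked or scalar tail is needed. */
static int remainder_entries(int block_size, int register_bits)
{
  assert(block_size > 0);
  return block_size % simd_lanes_double(register_bits);
}
```

With these definitions, a block size of 4 leaves no remainder at 256 bits and a block size of 8 leaves none at 512 bits, while a block size of 5 at 256 bits leaves one entry to be handled separately.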
>
> 2) Could you please update the linear solver table to label the
> preconditioners/solvers compatible with ELL format?
> http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html
>
>
> This is still a work in progress. The easiest thing to do would be to
> use ELL for the Jacobian matrix and other formats (e.g. AIJ) for the
> preconditioners.
> Then you would not need to worry about which preconditioners are
> compatible. An example can be found at
> ts/examples/tutorials/advection-diffusion-reaction/ex5adj.c.
> For preconditioners such as block jacobi and mg (with bjacobi or with
> sor), you can use ELL for both the preconditioner and the Jacobian,
> and expect a considerable gain since MatMult is the dominant operation.
>
> The makefile for ex5adj includes a few use cases that demonstrate how ELL
> plays with various preconditioners.
>
> Hong (Mr.)
>
> Thank you.
>
> Xiangdong
>
> On Mon, Nov 13, 2017 at 11:32 AM, Zhang, Hong <hongzhang at anl.gov> wrote:
>
>> Most operations in PETSc would not benefit much from vectorization since
>> they are memory-bound. That should not discourage you from compiling
>> PETSc with AVX2/AVX512, though. We have added a new matrix format
>> (currently named ELL, but to be renamed SELL shortly) that can make
>> MatMult ~2X faster than the AIJ format. Its MatMult kernel is
>> hand-optimized with AVX intrinsics and works on any Intel processor that
>> supports AVX, AVX2, or AVX512, e.g. Haswell, Broadwell, Xeon Phi, Skylake.
>> We have also been optimizing the AIJ MatMult kernel for these
>> architectures. Note that one has to use AVX compiler flags in order to
>> take advantage of the optimized kernels and the new matrix format.
>>
>> Hong (Mr.)
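
For readers unfamiliar with ELL-type storage, the following is a minimal sketch of a matrix-vector product in the textbook ELLPACK layout, written in plain C. It is not PETSc's kernel or data structure: the function name, argument list, and padding convention are assumptions made for illustration, and no AVX intrinsics are used. The point is only that storing slot k of every row contiguously gives a unit-stride inner loop over rows, which is the property that makes this family of formats friendly to SIMD.

```c
#include <assert.h>
#include <stddef.h>

/* y = A*x for a matrix in plain ELLPACK layout (illustrative only).
   Every row is padded to `width` entries, stored column-major:
   slot k of row i lives at index k*nrows + i.
   Padding slots hold the value 0.0 and any valid column index. */
static void ell_matmult(size_t nrows, size_t width, const double *val,
                        const size_t *col, const double *x, double *y)
{
  assert(val && col && x && y);
  for (size_t i = 0; i < nrows; i++) y[i] = 0.0;
  for (size_t k = 0; k < width; k++) {
    /* Consecutive rows read consecutive val/col entries, so this
       loop has unit stride in everything except the gather from x. */
    for (size_t i = 0; i < nrows; i++)
      y[i] += val[k * nrows + i] * x[col[k * nrows + i]];
  }
}
```

A sliced variant would apply the same padding to small groups of rows rather than to the whole matrix, which reduces wasted storage when row lengths vary.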
>>
>> > On Nov 12, 2017, at 10:35 PM, Xiangdong <epscodes at gmail.com> wrote:
>> >
>> > Hello everyone,
>> >
>> > Can someone comment on the vectorization of PETSc? For example, for the
>> MatMult function, will it perform better or run faster if it is compiled
>> with avx2 or avx512?
>> >
>> > Thank you.
>> >
>> > Best,
>> > Xiangdong
>>
>>
>
>