<div dir="ltr"><div><div>Yes, that's worth a try. Xiangdong, if you want to employ the MKL implementations for BAIJ MatMult() and friends, you can do so by configuring petsc-master with a recent version of MKL and then using the option "-mat_type baijmkl" (on the command line or set in your PETSC_OPTIONS environment variable).<br><br></div>Note that the above requires a version of MKL that is recent enough to have the sparse inspector-executor routines. MKL is now free, so I recommend installing the latest version.</div><div><br></div><div>(You can also try using the sparse MKL routines with AIJ format matrices by using either "-mat_type aijmkl" or "-mat_seqaij_type seqaijmkl". This will use MKL for MatMult()-type operations and some sparse matrix-matrix products.)<br></div><div><br></div><div>Best regards,</div><div>Richard<br></div><br><div><div><div><br><br></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Nov 14, 2017 at 2:42 PM, Smith, Barry F. <span dir="ltr"><<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
Use MKL versions of block formats?<br>
<div class="HOEnZb"><div class="h5"><br>
> On Nov 14, 2017, at 4:40 PM, Richard Tran Mills <<a href="mailto:rtmills@anl.gov">rtmills@anl.gov</a>> wrote:<br>
><br>
> On Tue, Nov 14, 2017 at 12:13 PM, Zhang, Hong <<a href="mailto:hongzhang@anl.gov">hongzhang@anl.gov</a>> wrote:<br>
><br>
><br>
>> On Nov 13, 2017, at 10:49 PM, Xiangdong <<a href="mailto:epscodes@gmail.com">epscodes@gmail.com</a>> wrote:<br>
>><br>
>> 1) How about the vectorization of BAIJ format?<br>
><br>
> BAIJ kernels are optimized with manual unrolling, but not with AVX intrinsics. So the vectorization relies on the compiler's ability.<br>
> It may or may not get vectorized depending on the compiler's optimization decisions. But vectorization is not essential for the performance of most BAIJ kernels.<br>
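To illustrate what "manual unrolling" means here, the following is a minimal sketch in plain C. It is not PETSc's source: the function names are hypothetical, and the column-major layout of the dense block is an assumption made for the example. It applies one 2x2 block to a vector, once with generic loops and once unrolled by hand.<br>

```c
#include <assert.h>

/* Generic y += A*x for one dense bs-by-bs block.
   Assumption for this sketch: the block is stored column-major. */
void block_mult_add_loop(int bs, const double *a, const double *x, double *y)
{
    assert(bs > 0);
    for (int j = 0; j < bs; j++) {
        for (int i = 0; i < bs; i++) {
            y[i] += a[j * bs + i] * x[j];
        }
    }
}

/* The same operation for bs = 2 with the loops unrolled by hand.
   No AVX intrinsics are used; whether this becomes SIMD code is
   left entirely to the compiler. */
void block_mult_add_2(const double *a, const double *x, double *y)
{
    const double x0 = x[0], x1 = x[1];
    y[0] += a[0] * x0 + a[2] * x1;
    y[1] += a[1] * x0 + a[3] * x1;
}
```

Both functions compute the same block product. Only the shape of the code the compiler has to analyze differs.<br>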
><br>
> I know that this has come up in previous discussions, but I'm guessing that the manual unrolling actually impedes the ability of many modern compilers to optimize the BAIJ calculations. I suppose we ought to have a switch to enable or disable the use of the unrolled versions? (And, further down the road, some sort of performance model to tell us what the setting for the switch should be...)<br>
><br>
> --Richard<br>
><br>
><br>
>> If the block size s is 2 or 4, would it be ideal for AVXs? Do I need to do anything special (more than AVX flag) for the compiler to vectorize it?<br>
><br>
> In double precision, 4 would be good for AVX/AVX2, and 8 would be ideal for AVX512. But other block sizes would make vectorization less profitable because of the remainders.<br>
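The remainder argument can be made concrete with a little arithmetic. The sketch below uses hypothetical helper names and assumes double precision, where a 256-bit AVX/AVX2 register holds 4 values and a 512-bit AVX-512 register holds 8. It counts how many full registers a block row of size bs fills and how many entries are left over.<br>

```c
#include <assert.h>

/* Doubles per SIMD register: 256 bits / 64 bits = 4, 512 bits / 64 bits = 8. */
enum { AVX_LANES = 4, AVX512_LANES = 8 };

/* Number of completely filled registers for a block row of bs doubles. */
int full_vectors(int bs, int lanes)
{
    assert(bs >= 0 && lanes > 0);
    return bs / lanes;
}

/* Entries left over that need masked, scalar, or padded handling. */
int remainder_entries(int bs, int lanes)
{
    assert(bs >= 0 && lanes > 0);
    return bs % lanes;
}
```

For example, remainder_entries(4, AVX_LANES) and remainder_entries(8, AVX512_LANES) are both 0, while a block size of 5 leaves one entry over with AVX and a block size of 2 does not fill even one AVX register.<br>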

><br>
>> 2) Could you please update the linear solver table to label the preconditioners/solvers compatible with ELL format?<br>
>> <a href="http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html" rel="noreferrer" target="_blank">http://www.mcs.anl.gov/petsc/<wbr>documentation/<wbr>linearsolvertable.html</a><br>
><br>
> This is still a work in progress. The easiest thing to do would be to use ELL for the Jacobian matrix and other formats (e.g. AIJ) for the preconditioners.<br>
> Then you would not need to worry about which preconditioners are compatible. An example can be found at ts/examples/tutorials/<wbr>advection-diffusion-reaction/<wbr>ex5adj.c.<br>
> For preconditioners such as block jacobi and mg (with bjacobi or with sor), you can use ELL for both the preconditioner and the Jacobian,<br>
> and expect a considerable gain since MatMult is the dominating operation.<br>
><br>
> The makefile for ex5adj includes a few use cases that demonstrate how ELL plays with various preconditioners.<br>
><br>
> Hong (Mr.)<br>
><br>
>> Thank you.<br>
>><br>
>> Xiangdong<br>
>><br>
>> On Mon, Nov 13, 2017 at 11:32 AM, Zhang, Hong <<a href="mailto:hongzhang@anl.gov">hongzhang@anl.gov</a>> wrote:<br>
>> Most operations in PETSc would not benefit much from vectorization since they are memory-bound. But this should not discourage you from compiling PETSc with AVX2/AVX512. We have added a new matrix format (currently named ELL, but it will be renamed to SELL shortly) that can make MatMult ~2X faster than the AIJ format. The MatMult kernel is hand-optimized with AVX intrinsics. It works on any Intel processor that supports AVX, AVX2, or AVX512, e.g. Haswell, Broadwell, Xeon Phi, Skylake. On the other hand, we have been optimizing the AIJ MatMult kernel for these architectures as well. One has to use AVX compiler flags in order to take advantage of the optimized kernels and the new matrix format.<br>
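As a rough picture of why an ELL-style layout suits SIMD hardware, here is a minimal sketch of a plain ELLPACK matrix-vector product in C. It is not PETSc's kernel, which is described above as hand-written with AVX intrinsics. The function name, the argument list, and the padding convention (zero value paired with a valid column index) are assumptions made for this example.<br>

```c
#include <assert.h>

/* y = A*x for an n-row matrix in plain ELLPACK layout.
   Every row is padded to `width` entries, and the arrays are stored
   column-major: entry k of row i lives at index k*n + i.
   Padding entries hold the value 0.0 and any valid column index. */
void ell_mult(int n, int width, const double *val, const int *col,
              const double *x, double *y)
{
    assert(n >= 0 && width >= 0);
    for (int i = 0; i < n; i++) {
        y[i] = 0.0;
    }
    for (int k = 0; k < width; k++) {
        /* The inner loop walks val, col, and y with unit stride,
           which is the access pattern that SIMD units favor. */
        for (int i = 0; i < n; i++) {
            y[i] += val[k * n + i] * x[col[k * n + i]];
        }
    }
}
```

The cost of this layout is the padding: rows with few nonzeros still occupy `width` slots, which a sliced variant reduces by padding each group of rows only to that group's longest row.<br>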
>><br>
>> Hong (Mr.)<br>
>><br>
>> > On Nov 12, 2017, at 10:35 PM, Xiangdong <<a href="mailto:epscodes@gmail.com">epscodes@gmail.com</a>> wrote:<br>
>> ><br>
>> > Hello everyone,<br>
>> ><br>
>> > Can someone comment on the vectorization of PETSc? For example, for the MatMult function, will it perform better or run faster if it is compiled with avx2 or avx512?<br>
>> ><br>
>> > Thank you.<br>
>> ><br>
>> > Best,<br>
>> > Xiangdong<br>
>><br>
>><br>
><br>
><br>
<br>
</div></div></blockquote></div><br></div></div>