<div dir="ltr"><div><div>Xiangdong,<br><br></div>If you are running on an Intel-based system with support for recent instruction sets like AVX2 or AVX-512, and you have access to the Intel compilers, then telling the compiler to target these instruction sets (e.g., "-xCORE-AVX2" or "-xMIC-AVX512") will probably give you a noticeable gain in performance. The gain will be much smaller than you would expect from something very CPU-bound like xGEMM code, but, in my experience, it will be noticeable (remember, even if you have a memory-bound code, its performance won't be bound by the memory subsystem 100% of the time). I don't know how well the non-Intel compilers are able to auto-vectorize, so your mileage may vary for those. As Hong has pointed out, there are some places in the PETSc source in which we have introduced code using AVX/AVX512 intrinsics. For those codes, you should see a benefit with any compiler that supports these intrinsics, since one is then not relying on the auto-vectorizer.<br></div><div><br></div><div>Best regards,<br></div>Richard<br><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Nov 13, 2017 at 8:32 AM, Zhang, Hong <span dir="ltr"><<a href="mailto:hongzhang@anl.gov" target="_blank">hongzhang@anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Most operations in PETSc would not benefit much from vectorization since they are memory-bound. But this should not discourage you from compiling PETSc with AVX2/AVX512. We have added a new matrix format (currently named ELL, but will be changed to SELL shortly) that can make MatMult ~2X faster than the AIJ format. The MatMult kernel is hand-optimized with AVX intrinsics. It works on any Intel processor that supports AVX, AVX2, or AVX512, e.g. Haswell, Broadwell, Xeon Phi, Skylake. On the other hand, we have been optimizing the AIJ MatMult kernel for these architectures as well. 
And one has to use AVX compiler flags in order to take advantage of the optimized kernels and the new matrix format.<br>

<br>

Hong (Mr.)<br>

<span class=""><br>

> On Nov 12, 2017, at 10:35 PM, Xiangdong <<a href="mailto:epscodes@gmail.com">epscodes@gmail.com</a>> wrote:<br>

><br>

> Hello everyone,<br>

><br>

> Can someone comment on the vectorization of PETSc? For example, for the MatMult function, will it perform better or run faster if it is compiled with avx2 or avx512?<br>

><br>

</span>> Thank you.<br>

><br>

> Best,<br>

> Xiangdong<br>

<br>

</blockquote></div><br></div></div></div></div>