<div dir="ltr">Yes, that should work. I coded it myself to work around the issue, but you were faster than me. Thanks!</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 24 Oct 2019 at 22:00, Zhang, Hong &lt;<a href="mailto:hongzhang@anl.gov">hongzhang@anl.gov</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">



<div style="overflow-wrap: break-word;">
<div>Hi Lisandro,</div>
<div><br>
</div>
<div>Can you please check if the following patch fixes the problem? I will create a MR.</div>
<div><br>
</div>
<div>diff --git a/src/mat/impls/aij/seq/aijperm/aijperm.c b/src/mat/impls/aij/seq/aijperm/aijperm.c</div>
<div>index 577dfc6713..568535117a 100644</div>
<div>--- a/src/mat/impls/aij/seq/aijperm/aijperm.c</div>
<div>+++ b/src/mat/impls/aij/seq/aijperm/aijperm.c</div>
<div>@@ -12,7 +12,7 @@</div>
<div><br>
</div>
<div> #include &lt;../src/mat/impls/aij/seq/aij.h&gt;</div>
<div><br>
</div>
<div>-#if defined(PETSC_HAVE_IMMINTRIN_H) && defined(__AVX512F__) && defined(PETSC_USE_REAL_DOUBLE) && !defined(PETSC_USE_COMPLEX) && !defined(PETSC_USE_64BIT_INDICES)</div>
<div>+#if defined(PETSC_USE_AVX512_KERNELS) && defined(PETSC_HAVE_IMMINTRIN_H) && defined(__AVX512F__) && defined(PETSC_USE_REAL_DOUBLE) && !defined(PETSC_USE_COMPLEX) && !defined(PETSC_USE_64BIT_INDICES) && !defined(PETSC_SKIP_IMMINTRIN_H_CUDAWORKAROUND)</div>
<div> #include &lt;immintrin.h&gt;</div>
<div><br>
</div>
<div> #if !defined(_MM_SCALE_8)</div>
<div>@@ -301,7 +301,7 @@ PetscErrorCode MatMult_SeqAIJPERM(Mat A,Vec xx,Vec yy)</div>
<div> #if !(defined(PETSC_USE_FORTRAN_KERNEL_MULTAIJPERM) && defined(notworking))</div>
<div>   PetscInt          i,j;</div>
<div> #endif</div>
<div>-#if defined(PETSC_HAVE_IMMINTRIN_H) && defined(__AVX512F__) && defined(PETSC_USE_REAL_DOUBLE) && !defined(PETSC_USE_COMPLEX) && !defined(PETSC_USE_64BIT_INDICES)</div>
<div>+#if defined(PETSC_USE_AVX512_KERNELS) && defined(PETSC_HAVE_IMMINTRIN_H) && defined(__AVX512F__) && defined(PETSC_USE_REAL_DOUBLE) && !defined(PETSC_USE_COMPLEX) && !defined(PETSC_USE_64BIT_INDICES) && !defined(PETSC_SKIP_IMMINTRIN_H_CUDAWORKAROUND)</div>
<div>   __m512d           vec_x,vec_y,vec_vals;</div>
<div>   __m256i           vec_idx,vec_ipos,vec_j;</div>
<div>   __mmask8           mask;</div>
<div>
<div>@@ -401,7 +401,7 @@ PetscErrorCode MatMult_SeqAIJPERM(Mat A,Vec xx,Vec yy)</div>
<div> #pragma _CRI prefervector</div>
<div> #endif</div>
<div><br>
</div>
<div>-#if defined(PETSC_HAVE_IMMINTRIN_H) && defined(__AVX512F__) && defined(PETSC_USE_REAL_DOUBLE) && !defined(PETSC_USE_COMPLEX) && !defined(PETSC_USE_64BIT_INDICES)</div>
<div>+#if defined(PETSC_USE_AVX512_KERNELS) && defined(PETSC_HAVE_IMMINTRIN_H) && defined(__AVX512F__) && defined(PETSC_USE_REAL_DOUBLE) && !defined(PETSC_USE_COMPLEX) && !defined(PETSC_USE_64BIT_INDICES) && !defined(PETSC_SKIP_IMMINTRIN_H_CUDAWORKAROUND)</div>
<div>             vec_y = _mm512_setzero_pd();</div>
<div>             ipos = ip[i];</div>
<div>             for (j=0; j<(nz>>3); j++) {</div>
<div>@@ -436,7 +436,7 @@ PetscErrorCode MatMult_SeqAIJPERM(Mat A,Vec xx,Vec yy)</div>
<div>            * worthwhile to vectorize across the rows, that is, to do the</div>
<div>            * matvec by operating with "columns" of the chunk. */</div>
<div>           for (j=0; j&lt;nz; j++) {</div>
<div>-#if defined(PETSC_HAVE_IMMINTRIN_H) && defined(__AVX512F__) && defined(PETSC_USE_REAL_DOUBLE) && !defined(PETSC_USE_COMPLEX) && !defined(PETSC_USE_64BIT_INDICES)</div>
<div>+#if defined(PETSC_USE_AVX512_KERNELS) && defined(PETSC_HAVE_IMMINTRIN_H) && defined(__AVX512F__) && defined(PETSC_USE_REAL_DOUBLE) && !defined(PETSC_USE_COMPLEX) && !defined(PETSC_USE_64BIT_INDICES) && !defined(PETSC_SKIP_IMMINTRIN_H_CUDAWORKAROUND)</div>
<div>             vec_j = _mm256_set1_epi32(j);</div>
<div>             for (i=0; i<((isize>>3)<<3); i+=8) {</div>
<div>               vec_y    = _mm512_loadu_pd(&yp[i]);</div>
</div>
<div><br>
</div>
<div><br>
</div>
<div>Thanks,</div>
<div>Hong</div>
<div><br>
<blockquote type="cite">
<div>On Oct 24, 2019, at 2:47 PM, Lisandro Dalcin via petsc-dev <<a href="mailto:petsc-dev@mcs.anl.gov" target="_blank">petsc-dev@mcs.anl.gov</a>> wrote:</div>
<br>
<div>
<div dir="ltr">
<div dir="ltr">
<div>
<div>This is with master, but I bet the issue is also in maint.</div>
<div></div>
</div>
<div><br>
</div>
<div>* Running on Ubuntu 16</div>
<div><br>
</div>
$ uname -a<br>
Linux flamingo 4.4.0-104-generic #127-Ubuntu SMP Mon Dec 11 12:16:42 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux<br>
<br>
* With system gcc 5.4 
<div><br>
<div>$ mpicc -show<br>
/usr/bin/gcc-5 -I/sw/workstations/apps/linux-ubuntu16.04-x86_64/mpich/3.3.1/gcc-5.4.0/nvejoe25snmak6a7fnjghabxjukjkuiu/include -L/sw/workstations/apps/linux-ubuntu16.04-x86_64/mpich/3.3.1/gcc-5.4.0/nvejoe25snmak6a7fnjghabxjukjkuiu/lib -Wl,-rpath -Wl,/sw/workstations/apps/linux-ubuntu16.04-x86_64/mpich/3.3.1/gcc-5.4.0/nvejoe25snmak6a7fnjghabxjukjkuiu/lib
 -lmpi<br>
<br>
</div>
<div>$ mpicc --version<br>
gcc-5 (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609<br>
Copyright (C) 2015 Free Software Foundation, Inc.<br>
This is free software; see the source for copying conditions.  There is NO<br>
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.<br>
<div><br>
</div>
<div>* PETSc configured to NOT USE AVX512 kernels</div>
<div><br>
$ grep avx arch-gnu-opt/lib/petsc/conf/reconfigure-arch-gnu-opt.py <br>
    '--with-avx512-kernels=0',<br>
<br>
</div>
<div>* Bang!</div>
<div><br>
</div>
<div>$ touch src/mat/impls/aij/seq/aijperm/aijperm.c<br>
$ make -f gmakefile<br>
</div>
<div>Use "/usr/bin/make V=1" to see verbose compile lines, "/usr/bin/make V=0" to suppress.<br>
          CC arch-gnu-opt/obj/mat/impls/aij/seq/aijperm/aijperm.o<br>
/home/dalcin/Devel/petsc/src/mat/impls/aij/seq/aijperm/aijperm.c: In function ‘MatMult_SeqAIJPERM’:<br>
/home/dalcin/Devel/petsc/src/mat/impls/aij/seq/aijperm/aijperm.c:426:22: warning: implicit declaration of function ‘_mm512_reduce_add_pd’ [-Wimplicit-function-declaration]<br>
             yp[i] += _mm512_reduce_add_pd(vec_y);<br>
<div><br>
</div>
<div><br>
</div>
-- <br>
<div dir="ltr">
<div dir="ltr">
<div>Lisandro Dalcin<br>
============<br>
Research Scientist<br>
Extreme Computing Research Center (ECRC)<br>
King Abdullah University of Science and Technology (KAUST)<br>
<a href="http://ecrc.kaust.edu.sa/" target="_blank">http://ecrc.kaust.edu.sa/</a><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div>Lisandro Dalcin<br>============<br>Research Scientist<br>Extreme Computing Research Center (ECRC)<br>King Abdullah University of Science and Technology (KAUST)<br><a href="http://ecrc.kaust.edu.sa/" target="_blank">http://ecrc.kaust.edu.sa/</a><br></div></div></div>