<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<div class="">Hi Lisandro,</div>
<div class=""><br class="">
</div>
<div class="">Can you please check whether the following patch fixes the problem? I will create an MR.</div>
<div class=""><br class="">
</div>
<div class="">diff --git a/src/mat/impls/aij/seq/aijperm/aijperm.c b/src/mat/impls/aij/seq/aijperm/aijperm.c</div>
<div class="">index 577dfc6713..568535117a 100644</div>
<div class="">--- a/src/mat/impls/aij/seq/aijperm/aijperm.c</div>
<div class="">+++ b/src/mat/impls/aij/seq/aijperm/aijperm.c</div>
<div class="">@@ -12,7 +12,7 @@</div>
<div class=""><br class="">
</div>
<div class=""> #include &lt;../src/mat/impls/aij/seq/aij.h&gt;</div>
<div class=""><br class="">
</div>
<div class="">-#if defined(PETSC_HAVE_IMMINTRIN_H) && defined(__AVX512F__) && defined(PETSC_USE_REAL_DOUBLE) && !defined(PETSC_USE_COMPLEX) && !defined(PETSC_USE_64BIT_INDICES)</div>
<div class="">+#if defined(PETSC_USE_AVX512_KERNELS) && defined(PETSC_HAVE_IMMINTRIN_H) && defined(__AVX512F__) && defined(PETSC_USE_REAL_DOUBLE) && !defined(PETSC_USE_COMPLEX) && !defined(PETSC_USE_64BIT_INDICES) && !defined(PETSC_SKIP_IMMINTRIN_H_CUDAWORKAROUND)</div>
<div class=""> #include &lt;immintrin.h&gt;</div>
<div class=""><br class="">
</div>
<div class=""> #if !defined(_MM_SCALE_8)</div>
<div class="">@@ -301,7 +301,7 @@ PetscErrorCode MatMult_SeqAIJPERM(Mat A,Vec xx,Vec yy)</div>
<div class=""> #if !(defined(PETSC_USE_FORTRAN_KERNEL_MULTAIJPERM) && defined(notworking))</div>
<div class="">   PetscInt          i,j;</div>
<div class=""> #endif</div>
<div class="">-#if defined(PETSC_HAVE_IMMINTRIN_H) && defined(__AVX512F__) && defined(PETSC_USE_REAL_DOUBLE) && !defined(PETSC_USE_COMPLEX) && !defined(PETSC_USE_64BIT_INDICES)</div>
<div class="">+#if defined(PETSC_USE_AVX512_KERNELS) && defined(PETSC_HAVE_IMMINTRIN_H) && defined(__AVX512F__) && defined(PETSC_USE_REAL_DOUBLE) && !defined(PETSC_USE_COMPLEX) && !defined(PETSC_USE_64BIT_INDICES) && !defined(PETSC_SKIP_IMMINTRIN_H_CUDAWORKAROUND)</div>
<div class="">   __m512d           vec_x,vec_y,vec_vals;</div>
<div class="">   __m256i           vec_idx,vec_ipos,vec_j;</div>
<div class="">   __mmask8           mask;</div>
<div class="">
<div class="">@@ -401,7 +401,7 @@ PetscErrorCode MatMult_SeqAIJPERM(Mat A,Vec xx,Vec yy)</div>
<div class=""> #pragma _CRI prefervector</div>
<div class=""> #endif</div>
<div class=""><br class="">
</div>
<div class="">-#if defined(PETSC_HAVE_IMMINTRIN_H) && defined(__AVX512F__) && defined(PETSC_USE_REAL_DOUBLE) && !defined(PETSC_USE_COMPLEX) && !defined(PETSC_USE_64BIT_INDICES)</div>
<div class="">+#if defined(PETSC_USE_AVX512_KERNELS) && defined(PETSC_HAVE_IMMINTRIN_H) && defined(__AVX512F__) && defined(PETSC_USE_REAL_DOUBLE) && !defined(PETSC_USE_COMPLEX) && !defined(PETSC_USE_64BIT_INDICES) && !defined(PETSC_SKIP_IMMINTRIN_H_CUDAWORKAROUND)</div>
<div class="">             vec_y = _mm512_setzero_pd();</div>
<div class="">             ipos = ip[i];</div>
<div class="">             for (j=0; j&lt;(nz&gt;&gt;3); j++) {</div>
<div class="">@@ -436,7 +436,7 @@ PetscErrorCode MatMult_SeqAIJPERM(Mat A,Vec xx,Vec yy)</div>
<div class="">            * worthwhile to vectorize across the rows, that is, to do the</div>
<div class="">            * matvec by operating with "columns" of the chunk. */</div>
<div class="">           for (j=0; j&lt;nz; j++) {</div>
<div class="">-#if defined(PETSC_HAVE_IMMINTRIN_H) && defined(__AVX512F__) && defined(PETSC_USE_REAL_DOUBLE) && !defined(PETSC_USE_COMPLEX) && !defined(PETSC_USE_64BIT_INDICES)</div>
<div class="">+#if defined(PETSC_USE_AVX512_KERNELS) && defined(PETSC_HAVE_IMMINTRIN_H) && defined(__AVX512F__) && defined(PETSC_USE_REAL_DOUBLE) && !defined(PETSC_USE_COMPLEX) && !defined(PETSC_USE_64BIT_INDICES) && !defined(PETSC_SKIP_IMMINTRIN_H_CUDAWORKAROUND)</div>
<div class="">             vec_j = _mm256_set1_epi32(j);</div>
<div class="">             for (i=0; i&lt;((isize&gt;&gt;3)&lt;&lt;3); i+=8) {</div>
<div class="">               vec_y    = _mm512_loadu_pd(&amp;yp[i]);</div>
</div>
<div class=""><br class="">
</div>
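<div class="">For context, here is a minimal standalone sketch of the guard pattern the patch applies: the intrinsics path is compiled only when the configure-level switch PETSC_USE_AVX512_KERNELS is defined in addition to the compiler and precision checks, and a scalar loop is used otherwise. The names SKETCH_USE_AVX512 and sketch_row_dot are hypothetical and are not part of PETSc; the vector branch also assumes the compiler's immintrin.h provides _mm512_reduce_add_pd, which is what the quoted gcc 5.4 warning says it lacks.</div>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper macro (not in PETSc): collapses the patched condition
 * into one flag so every guarded region tests the same thing. */
#if defined(PETSC_USE_AVX512_KERNELS) && defined(PETSC_HAVE_IMMINTRIN_H) && \
    defined(__AVX512F__) && defined(PETSC_USE_REAL_DOUBLE) && \
    !defined(PETSC_USE_COMPLEX) && !defined(PETSC_USE_64BIT_INDICES) && \
    !defined(PETSC_SKIP_IMMINTRIN_H_CUDAWORKAROUND)
#include <immintrin.h>
#define SKETCH_USE_AVX512 1
#else
#define SKETCH_USE_AVX512 0
#endif

/* Hypothetical kernel: returns the sum of vals[k]*x[k] for k in [0,n).
 * It only illustrates the guard; it is not the AIJPERM matvec. */
double sketch_row_dot(const double *vals, const double *x, size_t n)
{
  double sum = 0.0;
  size_t k   = 0;

  assert(vals && x);
#if SKETCH_USE_AVX512
  {
    /* Vector path: 8 doubles per iteration. Assumes the compiler ships
     * _mm512_fmadd_pd and _mm512_reduce_add_pd. */
    __m512d acc = _mm512_setzero_pd();
    for (; k + 8 <= n; k += 8) {
      acc = _mm512_fmadd_pd(_mm512_loadu_pd(vals + k), _mm512_loadu_pd(x + k), acc);
    }
    sum = _mm512_reduce_add_pd(acc);
  }
#endif
  /* Scalar path: handles everything when the guard is off, and the
   * remainder (fewer than 8 entries) when it is on. */
  for (; k < n; k++) sum += vals[k] * x[k];
  return sum;
}
```

<div class="">With --with-avx512-kernels=0, PETSC_USE_AVX512_KERNELS should be undefined, so only the scalar loop is compiled and no AVX512 intrinsic is referenced.</div>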
<div class=""><br class="">
</div>
<div class="">Thanks,</div>
<div class="">Hong</div>
<div><br class="">
<blockquote type="cite" class="">
<div class="">On Oct 24, 2019, at 2:47 PM, Lisandro Dalcin via petsc-dev <<a href="mailto:petsc-dev@mcs.anl.gov" class="">petsc-dev@mcs.anl.gov</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div dir="ltr" class="">
<div dir="ltr" class="">
<div class="">
<div class="">This is with master, but I bet the issue is also in maint.</div>
<div class=""></div>
</div>
<div class=""><br class="">
</div>
<div class="">* Running on Ubuntu 16</div>
<div class=""><br class="">
</div>
$ uname -a<br class="">
Linux flamingo 4.4.0-104-generic #127-Ubuntu SMP Mon Dec 11 12:16:42 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux<br class="">
<br class="">
* With system gcc 5.4 
<div class=""><br class="">
<div class="">$ mpicc -show<br class="">
/usr/bin/gcc-5 -I/sw/workstations/apps/linux-ubuntu16.04-x86_64/mpich/3.3.1/gcc-5.4.0/nvejoe25snmak6a7fnjghabxjukjkuiu/include -L/sw/workstations/apps/linux-ubuntu16.04-x86_64/mpich/3.3.1/gcc-5.4.0/nvejoe25snmak6a7fnjghabxjukjkuiu/lib -Wl,-rpath -Wl,/sw/workstations/apps/linux-ubuntu16.04-x86_64/mpich/3.3.1/gcc-5.4.0/nvejoe25snmak6a7fnjghabxjukjkuiu/lib
 -lmpi<br class="">
<br class="">
</div>
<div class="">$ mpicc --version<br class="">
gcc-5 (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609<br class="">
Copyright (C) 2015 Free Software Foundation, Inc.<br class="">
This is free software; see the source for copying conditions.  There is NO<br class="">
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.<br class="">
<div class=""><br class="">
</div>
<div class="">* PETSc configured to NOT USE AVX512 kernels</div>
<div class=""><br class="">
$ grep avx arch-gnu-opt/lib/petsc/conf/reconfigure-arch-gnu-opt.py <br class="">
    '--with-avx512-kernels=0',<br class="">
<br class="">
</div>
<div class="">* Bang!</div>
<div class=""><br class="">
</div>
<div class="">$ touch src/mat/impls/aij/seq/aijperm/aijperm.c<br class="">
$ make -f gmakefile<br class="">
</div>
<div class="">Use "/usr/bin/make V=1" to see verbose compile lines, "/usr/bin/make V=0" to suppress.<br class="">
          CC arch-gnu-opt/obj/mat/impls/aij/seq/aijperm/aijperm.o<br class="">
/home/dalcin/Devel/petsc/src/mat/impls/aij/seq/aijperm/aijperm.c: In function ‘MatMult_SeqAIJPERM’:<br class="">
/home/dalcin/Devel/petsc/src/mat/impls/aij/seq/aijperm/aijperm.c:426:22: warning: implicit declaration of function ‘_mm512_reduce_add_pd’ [-Wimplicit-function-declaration]<br class="">
             yp[i] += _mm512_reduce_add_pd(vec_y);<br class="">
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
-- <br class="">
<div dir="ltr" class="gmail_signature">
<div dir="ltr" class="">
<div class="">Lisandro Dalcin<br class="">
============<br class="">
Research Scientist<br class="">
Extreme Computing Research Center (ECRC)<br class="">
King Abdullah University of Science and Technology (KAUST)<br class="">
<a href="http://ecrc.kaust.edu.sa/" target="_blank" class="">http://ecrc.kaust.edu.sa/</a><br class="">
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</body>
</html>