<div dir="ltr">I settled on this:<div><br></div><div>. /vol0004/apps/oss/spack/share/spack/setup-env.sh<br>spack load gcc@10.2.0%gcc@8.3.1 arch=linux-rhel8-a64fx<br>spack load fujitsu-mpi@4.5.0%gcc@8.3.1 arch=linux-rhel8-a64fx<br></div><div><br></div><div> 'CC=mpicc',<br> 'CXX=mpiCC',<br> 'FC=mpif90',<br> 'COPTFLAGS=-Ofast -march=armv8.2-a+sve -msve-vector-bits=512',<br> 'CXXOPTFLAGS=-Ofast -march=armv8.2-a+sve -msve-vector-bits=512',<br></div><div><br></div><div>Kokkos suggested this and I noticed that it helped:</div><div><br></div><div> export OMP_PROC_BIND=spread<br> export OMP_PLACES=threads<br></div><div><br></div><div>I am getting great thread scaling but I seem to get no vectorization. </div><div>I discussed this with Kokkos (Trott) today and he is not surprised. Auto vectorization is fragile.</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Apr 19, 2021 at 8:26 PM Sreepathi, Sarat <<a href="mailto:sarat@ornl.gov">sarat@ornl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div lang="EN-US">
<div class="gmail-m_-5302868371812421590WordSection1">
<p class="MsoNormal">My turn: did you folks figure out tips for performant hybrid MPI+OMP core binding? I tried some from the documentation but that didn’t seem to help.<u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">-Sarat.<u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>

From: Sreepathi, Sarat
Sent: Friday, April 16, 2021 3:02 PM
To: Mark Adams <mfadams@lbl.gov>; petsc-dev <petsc-dev@mcs.anl.gov>
Cc: Satish Balay <balay@mcs.anl.gov>
Subject: RE: [petsc-dev] [EXTERNAL] Re: building on Fugaku

<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">It’s 48 cores but there are 4 NUMA domains (CMGs). So, you may want to experiment in hybrid mode (4x12 etc.) if possible.<u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">-Sarat.<u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<div style="border-right:none;border-bottom:none;border-left:none;border-top:1pt solid rgb(225,225,225);padding:3pt 0in 0in">
<p class="MsoNormal"><b>From:</b> Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>>
<br>
<b>Sent:</b> Friday, April 16, 2021 2:10 PM<br>
<b>To:</b> petsc-dev <<a href="mailto:petsc-dev@mcs.anl.gov" target="_blank">petsc-dev@mcs.anl.gov</a>><br>
<b>Cc:</b> Satish Balay <<a href="mailto:balay@mcs.anl.gov" target="_blank">balay@mcs.anl.gov</a>>; Sreepathi, Sarat <<a href="mailto:sarat@ornl.gov" target="_blank">sarat@ornl.gov</a>><br>
<b>Subject:</b> Re: [petsc-dev] [EXTERNAL] Re: building on Fugaku<u></u><u></u></p>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
Sarat, is there anything special that you do for Kokkos with OpenMP?

Just set OMP_NUM_THREADS=48?

Also, I am confused about the number of cores here: is it 48 or 64 per node/socket?
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<div>
<p class="MsoNormal">On Fri, Apr 16, 2021 at 2:03 PM Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:<u></u><u></u></p>
</div>
<blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0in 0in 0in 6pt;margin:5pt 0in 5pt 4.8pt">
<div>
<p class="MsoNormal">Cool, I have it running too. Need to add Sarat's flags and test ex2.<u></u><u></u></p>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<div>
<p class="MsoNormal">On Fri, Apr 16, 2021 at 1:57 PM Satish Balay via petsc-dev <<a href="mailto:petsc-dev@mcs.anl.gov" target="_blank">petsc-dev@mcs.anl.gov</a>> wrote:<u></u><u></u></p>
</div>
<blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0in 0in 0in 6pt;margin:5pt 0in 5pt 4.8pt">
<p class="MsoNormal">Mark,<br>
<br>
The following build works for me:<br>
<br>
Satish<br>
<br>
----<br>
<br>
pjsub --interact -L "node=1" -L "rscunit=rscunit_ft01" -L "elapse=1:00:00" --sparam "wait-time=1200"<br>
<br>
. /vol0004/apps/oss/spack/share/spack/setup-env.sh<br>
spack load fujitsu-mpi%gcc<br>
spack load <a href="mailto:gcc@10.2.0" target="_blank">gcc@10.2.0</a> arch=linux-rhel8-a64fx<br>
./configure COPTFLAGS='-Ofast -march=armv8.2-a+sve -msve-vector-bits=512' CXXOPTFLAGS='-Ofast -march=armv8.2-a+sve -msve-vector-bits=512' FOPTFLAGS='-Ofast -march=armv8.2-a+sve -msve-vector-bits=512' --with-openmp=1 --download-p4est --download-zlib --download-kokkos
--download-kokkos-kernels --download-kokkos-commit=origin/develop --download-kokkos-kernels-commit=origin/develop '--download-kokkos-cmake-arguments=-DBUILD_TESTING=OFF -DKokkos_ENABLE_LIBDL=OFF -DKokkos_ENABLE_AGGRESSIVE_VECTORIZATION=ON' --download-cmake=<a href="https://github.com/Kitware/CMake/releases/download/v3.20.1/cmake-3.20.1.tar.gz" target="_blank">https://github.com/Kitware/CMake/releases/download/v3.20.1/cmake-3.20.1.tar.gz</a>
--download-fblaslapack=1<br>
make PETSC_DIR=/vol0004/ra010009/a04201/petsc.z PETSC_ARCH=arch-linux-c-debug all<br>
<br>
<br>
To test - redo job allocation using max-proc-per-node:<br>
<br>
login6$ pjsub --interact -L "node=1" -L "rscunit=rscunit_ft01" -L "elapse=1:00:00" --sparam "wait-time=1200" --mpi "max-proc-per-node=16"<br>
<br>
[a04201@c31-3201c petsc.z]$ . /vol0004/apps/oss/spack/share/spack/setup-env.sh<br>
[a04201@c31-3201c petsc.z]$ spack load fujitsu-mpi%gcc<br>
[a04201@c31-3201c petsc.z]$ spack load <a href="mailto:gcc@10.2.0" target="_blank">gcc@10.2.0</a> arch=linux-rhel8-a64fx<br>
[a04201@c31-3201c petsc.z]$ make check<br>
Running check examples to verify correct installation<br>
Using PETSC_DIR=/vol0004/ra010009/a04201/petsc.z and PETSC_ARCH=arch-linux-c-debug<br>
C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process<br>
C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes<br>
C/C++ example src/snes/tutorials/ex3k run successfully with kokkos-kernels<br>
Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process<br>
Completed test examples<br>
[a04201@c31-3201c petsc.z]$ <u></u><u></u></p>