[petsc-dev] [EXTERNAL] Re: building on Fugaku

Sreepathi, Sarat sarat at ornl.gov
Tue Apr 20 20:06:30 CDT 2021

Already tried those but it didn't help. I have been trying to experiment with 48x1, 24x2 etc. and performance degraded for the climate workload.

From: Mark Adams <mfadams at lbl.gov>
Sent: Tuesday, April 20, 2021 8:14:12 PM
To: Sreepathi, Sarat <sarat at ornl.gov>
Cc: petsc-dev <petsc-dev at mcs.anl.gov>; Satish Balay <balay at mcs.anl.gov>
Subject: Re: [petsc-dev] [EXTERNAL] Re: building on Fugaku

I settled on this:

. /vol0004/apps/oss/spack/share/spack/setup-env.sh
spack load gcc@10.2.0%gcc@8.3.1 arch=linux-rhel8-a64fx
spack load fujitsu-mpi@4.5.0%gcc@8.3.1 arch=linux-rhel8-a64fx

    'COPTFLAGS=-Ofast -march=armv8.2-a+sve -msve-vector-bits=512',
    'CXXOPTFLAGS=-Ofast -march=armv8.2-a+sve -msve-vector-bits=512',

Kokkos suggested this and I noticed that it helped:

        export OMP_PROC_BIND=spread
        export OMP_PLACES=threads
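
For reference, a minimal sketch of a complete thread environment built around the two settings above. The OMP_NUM_THREADS value is an assumption (a single process using all 48 cores, the value asked about earlier in this thread), and OMP_DISPLAY_ENV is a standard OpenMP variable that makes the runtime print its effective settings at startup. It is not something the thread reports using.

```shell
# Thread placement suggested by Kokkos (from the message above)
export OMP_PROC_BIND=spread
export OMP_PLACES=threads

# Assumption: one process per node using all 48 cores; adjust for hybrid runs
export OMP_NUM_THREADS=48

# Standard OpenMP variable: the runtime prints its effective settings at startup,
# which helps confirm that the binding request was actually picked up
export OMP_DISPLAY_ENV=true

echo "OMP_NUM_THREADS=$OMP_NUM_THREADS OMP_PROC_BIND=$OMP_PROC_BIND OMP_PLACES=$OMP_PLACES"
```

With these exported, the next OpenMP executable launched from the same shell should report the binding policy it is using before it starts computing.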

I am getting great thread scaling but I seem to get no vectorization.
I discussed this with Kokkos (Trott) today and he is not surprised. Auto vectorization is fragile.

On Mon, Apr 19, 2021 at 8:26 PM Sreepathi, Sarat <sarat at ornl.gov> wrote:

My turn: did you folks figure out tips for performant hybrid MPI+OMP core binding? I tried some from the documentation but that didn’t seem to help.


From: Sreepathi, Sarat
Sent: Friday, April 16, 2021 3:02 PM
To: Mark Adams <mfadams at lbl.gov>; petsc-dev <petsc-dev at mcs.anl.gov>
Cc: Satish Balay <balay at mcs.anl.gov>
Subject: RE: [petsc-dev] [EXTERNAL] Re: building on Fugaku

It’s 48 cores but there are 4 NUMA domains (CMGs). So, you may want to experiment in hybrid mode (4x12 etc.) if possible.
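
To make the arithmetic concrete, here is a small sketch that lists the ranks x threads splits of one node that use all 48 cores while letting each rank's threads fit evenly inside a 12-core CMG. The function name hybrid_splits is hypothetical, and the sketch assumes ranks are packed contiguously onto cores. The actual placement depends on how the MPI launcher binds processes.

```shell
# Node layout as described in this thread
CORES_PER_NODE=48
CMGS_PER_NODE=4

# Hypothetical helper: print every "ranks x threads" split that uses all
# cores and whose thread count divides the cores of one CMG evenly.
hybrid_splits() {
    cores_per_cmg=$((CORES_PER_NODE / CMGS_PER_NODE))
    ranks=$CMGS_PER_NODE
    while [ "$ranks" -le "$CORES_PER_NODE" ]; do
        threads=$((CORES_PER_NODE / ranks))
        if [ $((ranks * threads)) -eq "$CORES_PER_NODE" ] &&
           [ $((cores_per_cmg % threads)) -eq 0 ]; then
            echo "${ranks}x${threads}"
        fi
        ranks=$((ranks + 1))
    done
}

hybrid_splits
```

Each printed pair gives the number of MPI processes per node and the matching OMP_NUM_THREADS value, so the list can serve as the set of configurations to benchmark.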


From: Mark Adams <mfadams at lbl.gov>
Sent: Friday, April 16, 2021 2:10 PM
To: petsc-dev <petsc-dev at mcs.anl.gov>
Cc: Satish Balay <balay at mcs.anl.gov>; Sreepathi, Sarat <sarat at ornl.gov>
Subject: Re: [petsc-dev] [EXTERNAL] Re: building on Fugaku

Sarat, is there anything special that you do for Kokkos - OpenMP?

Just set OMP_NUM_THREADS=48?

Also, I am confused about the number of cores here. Is it 48 or 64 per node/socket?

On Fri, Apr 16, 2021 at 2:03 PM Mark Adams <mfadams at lbl.gov> wrote:

Cool, I have it running too. Need to add Sarat's flags and test ex2.

On Fri, Apr 16, 2021 at 1:57 PM Satish Balay via petsc-dev <petsc-dev at mcs.anl.gov> wrote:


The following build works for me:



pjsub --interact -L "node=1" -L "rscunit=rscunit_ft01" -L "elapse=1:00:00" --sparam "wait-time=1200"

. /vol0004/apps/oss/spack/share/spack/setup-env.sh
spack load fujitsu-mpi%gcc
spack load gcc@10.2.0 arch=linux-rhel8-a64fx
./configure COPTFLAGS='-Ofast -march=armv8.2-a+sve -msve-vector-bits=512' CXXOPTFLAGS='-Ofast -march=armv8.2-a+sve -msve-vector-bits=512' FOPTFLAGS='-Ofast -march=armv8.2-a+sve -msve-vector-bits=512' --with-openmp=1 --download-p4est --download-zlib --download-kokkos --download-kokkos-kernels --download-kokkos-commit=origin/develop --download-kokkos-kernels-commit=origin/develop '--download-kokkos-cmake-arguments=-DBUILD_TESTING=OFF -DKokkos_ENABLE_LIBDL=OFF -DKokkos_ENABLE_AGGRESSIVE_VECTORIZATION=ON' --download-cmake=https://github.com/Kitware/CMake/releases/download/v3.20.1/cmake-3.20.1.tar.gz  --download-fblaslapack=1
make PETSC_DIR=/vol0004/ra010009/a04201/petsc.z PETSC_ARCH=arch-linux-c-debug all

To test - redo job allocation using max-proc-per-node:

login6$ pjsub --interact -L "node=1" -L "rscunit=rscunit_ft01" -L "elapse=1:00:00" --sparam "wait-time=1200" --mpi "max-proc-per-node=16"

[a04201@c31-3201c petsc.z]$ . /vol0004/apps/oss/spack/share/spack/setup-env.sh
[a04201@c31-3201c petsc.z]$ spack load fujitsu-mpi%gcc
[a04201@c31-3201c petsc.z]$ spack load gcc@10.2.0 arch=linux-rhel8-a64fx
[a04201@c31-3201c petsc.z]$ make check
Running check examples to verify correct installation
Using PETSC_DIR=/vol0004/ra010009/a04201/petsc.z and PETSC_ARCH=arch-linux-c-debug
C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
C/C++ example src/snes/tutorials/ex3k run successfully with kokkos-kernels
Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process
Completed test examples
[a04201@c31-3201c petsc.z]$