[petsc-dev] [EXTERNAL] Re: building on Fugaku

Mark Adams mfadams at lbl.gov
Tue Apr 20 19:14:12 CDT 2021


I settled on this:

. /vol0004/apps/oss/spack/share/spack/setup-env.sh
spack load gcc@10.2.0%gcc@8.3.1 arch=linux-rhel8-a64fx
spack load fujitsu-mpi@4.5.0%gcc@8.3.1 arch=linux-rhel8-a64fx

    'CC=mpicc',
    'CXX=mpiCC',
    'FC=mpif90',
    'COPTFLAGS=-Ofast -march=armv8.2-a+sve -msve-vector-bits=512',
    'CXXOPTFLAGS=-Ofast -march=armv8.2-a+sve -msve-vector-bits=512',
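
For reference, a single ./configure command equivalent to these entries might
look roughly like the following; this is a sketch that assumes the spack
environment above and an FOPTFLAGS that simply mirrors the C flags:

    ./configure CC=mpicc CXX=mpiCC FC=mpif90 \
      COPTFLAGS='-Ofast -march=armv8.2-a+sve -msve-vector-bits=512' \
      CXXOPTFLAGS='-Ofast -march=armv8.2-a+sve -msve-vector-bits=512' \
      FOPTFLAGS='-Ofast -march=armv8.2-a+sve -msve-vector-bits=512' \
      --with-openmp=1 --download-kokkos --download-kokkos-kernels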

The Kokkos team suggested this, and I noticed that it helped:

        export OMP_PROC_BIND=spread
        export OMP_PLACES=threads
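
For the 4x12 hybrid layout Sarat suggests below (one MPI rank per CMG, 12
OpenMP threads each), a launch might look roughly like this; a sketch that
reuses the pjsub options from Satish's recipe below, with the rank/thread
split being an assumption:

    pjsub --interact -L "node=1" -L "rscunit=rscunit_ft01" -L "elapse=1:00:00" \
      --sparam "wait-time=1200" --mpi "max-proc-per-node=4"
    export OMP_NUM_THREADS=12
    export OMP_PROC_BIND=spread
    export OMP_PLACES=threads
    mpiexec -n 4 ./ex19   # ex19 is a stand-in for whichever example you are timing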

I am getting great thread scaling, but I seem to get no vectorization. I
discussed this with the Kokkos team (Christian Trott) today and he is not
surprised; auto-vectorization is fragile.
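
One way to check whether GCC is actually vectorizing the hot loops is to add a
vectorization report to the optimization flags (these are standard GCC options,
not something taken from this thread) and look for "loop vectorized" / "not
vectorized" notes in the compile output:

    COPTFLAGS='-Ofast -march=armv8.2-a+sve -msve-vector-bits=512 -fopt-info-vec-optimized -fopt-info-vec-missed'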


On Mon, Apr 19, 2021 at 8:26 PM Sreepathi, Sarat <sarat at ornl.gov> wrote:

> My turn: did you folks figure out tips for performant hybrid MPI+OMP core
> binding? I tried some from the documentation but that didn’t seem to help.
>
>
>
> -Sarat.
>
>
>
> *From:* Sreepathi, Sarat
> *Sent:* Friday, April 16, 2021 3:02 PM
> *To:* Mark Adams <mfadams at lbl.gov>; petsc-dev <petsc-dev at mcs.anl.gov>
> *Cc:* Satish Balay <balay at mcs.anl.gov>
> *Subject:* RE: [petsc-dev] [EXTERNAL] Re: building on Fugaku
>
>
>
> It’s 48 cores, but there are 4 NUMA domains (CMGs, Core Memory Groups). So you
> may want to experiment in hybrid mode (4x12 etc.) if possible.
>
>
>
> -Sarat.
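
(A quick way to confirm the binding actually spreads one rank per CMG is
libgomp's affinity report; OMP_DISPLAY_AFFINITY is a standard OpenMP 5
environment variable in recent GCC, not something verified on Fugaku in this
thread:

    export OMP_DISPLAY_AFFINITY=TRUE
    mpiexec -n 4 ./ex19

With OMP_PROC_BIND=spread and 12 threads per rank, each thread reports the
CPUs it is bound to, and the lists should fall into four disjoint 12-core
groups, one per CMG.)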
>
>
>
> *From:* Mark Adams <mfadams at lbl.gov>
> *Sent:* Friday, April 16, 2021 2:10 PM
> *To:* petsc-dev <petsc-dev at mcs.anl.gov>
> *Cc:* Satish Balay <balay at mcs.anl.gov>; Sreepathi, Sarat <sarat at ornl.gov>
> *Subject:* Re: [petsc-dev] [EXTERNAL] Re: building on Fugaku
>
>
>
> Sarat, is there anything special that you do for Kokkos + OpenMP?
>
>
>
> Just set OMP_NUM_THREADS=48 ?
>
>
>
> Also, I am confused about the number of cores here. Is it 48 or 64 per
> node/socket?
>
>
>
> On Fri, Apr 16, 2021 at 2:03 PM Mark Adams <mfadams at lbl.gov> wrote:
>
> Cool, I have it running too. Need to add Sarat's flags and test ex2.
>
>
>
> On Fri, Apr 16, 2021 at 1:57 PM Satish Balay via petsc-dev <
> petsc-dev at mcs.anl.gov> wrote:
>
> Mark,
>
> The following build works for me:
>
> Satish
>
> ----
>
> pjsub --interact -L "node=1" -L "rscunit=rscunit_ft01" -L "elapse=1:00:00"
> --sparam "wait-time=1200"
>
> . /vol0004/apps/oss/spack/share/spack/setup-env.sh
> spack load fujitsu-mpi%gcc
> spack load gcc@10.2.0 arch=linux-rhel8-a64fx
> ./configure COPTFLAGS='-Ofast -march=armv8.2-a+sve -msve-vector-bits=512'
> CXXOPTFLAGS='-Ofast -march=armv8.2-a+sve -msve-vector-bits=512'
> FOPTFLAGS='-Ofast -march=armv8.2-a+sve -msve-vector-bits=512'
> --with-openmp=1 --download-p4est --download-zlib --download-kokkos
> --download-kokkos-kernels --download-kokkos-commit=origin/develop
> --download-kokkos-kernels-commit=origin/develop
> '--download-kokkos-cmake-arguments=-DBUILD_TESTING=OFF
> -DKokkos_ENABLE_LIBDL=OFF -DKokkos_ENABLE_AGGRESSIVE_VECTORIZATION=ON'
> --download-cmake=https://github.com/Kitware/CMake/releases/download/v3.20.1/cmake-3.20.1.tar.gz
> --download-fblaslapack=1
> make PETSC_DIR=/vol0004/ra010009/a04201/petsc.z
> PETSC_ARCH=arch-linux-c-debug all
>
>
> To test - redo job allocation using max-proc-per-node:
>
> login6$ pjsub --interact -L "node=1" -L "rscunit=rscunit_ft01" -L
> "elapse=1:00:00" --sparam "wait-time=1200" --mpi "max-proc-per-node=16"
>
> [a04201@c31-3201c petsc.z]$ . /vol0004/apps/oss/spack/share/spack/setup-env.sh
> [a04201@c31-3201c petsc.z]$ spack load fujitsu-mpi%gcc
> [a04201@c31-3201c petsc.z]$ spack load gcc@10.2.0 arch=linux-rhel8-a64fx
> [a04201@c31-3201c petsc.z]$ make check
> Running check examples to verify correct installation
> Using PETSC_DIR=/vol0004/ra010009/a04201/petsc.z and
> PETSC_ARCH=arch-linux-c-debug
> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
> C/C++ example src/snes/tutorials/ex3k run successfully with kokkos-kernels
> Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process
> Completed test examples
> [a04201@c31-3201c petsc.z]$
>
>