[petsc-users] Running CG with HYPRE AMG preconditioner in AMD GPUs

Mark Adams mfadams at lbl.gov
Tue Mar 5 13:41:45 CST 2024


You can run with -log_view_gpu_time to get rid of the -nan entries and get
more timing data.

You can run with -ksp_view to get more info on the solver and send that
output.

-options_left is also good to use so we can see what parameters you used.
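
For example, a run line combining those diagnostics could look like the sketch
below (the launcher, process count, executable, and input file are placeholders
for whatever you use on Frontier):

  srun -n 4 ./your_app your_input \
    -log_view -log_view_gpu_time -ksp_view -options_left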

The last 100 in this row (the GPU %F column):

KSPSolve            1197 0.0 2.0291e+02 0.0 2.55e+11 0.0 3.9e+04 8.0e+04
3.1e+04 12 100 100 100 49  12 100 100 100 98  2503    -nan      0 1.80e-05
   0 0.00e+00  100

tells us that all the flops were logged on GPUs.

You do need at least 100K equations per GPU to see speedup, so don't worry
about performance on small problems.
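
For reference, with the 2 GPUs in your test that rule works out to at least
roughly 200K unknowns, i.e. about a 60^3 Poisson grid, before you should
expect to see GPU speedup.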

Mark




On Tue, Mar 5, 2024 at 12:52 PM Vanella, Marcos (Fed) via petsc-users <
petsc-users at mcs.anl.gov> wrote:

> Hi all, I compiled the latest PETSc source in Frontier using gcc+kokkos
> and hip options:
>
> ./configure COPTFLAGS="-O3" CXXOPTFLAGS="-O3" FOPTFLAGS="-O3"
> FCOPTFLAGS="-O3" HIPOPTFLAGS="-O3" --with-debugging=0 --with-cc=cc
> --with-cxx=CC --with-fc=ftn --with-hip --with-hipc=hipcc
> --LIBS="-L${MPICH_DIR}/lib -lmpi ${PE_MPICH_GTL_DIR_amd_gfx90a}
> ${PE_MPICH_GTL_LIBS_amd_gfx90a}" --download-kokkos
> --download-kokkos-kernels --download-suitesparse --download-hypre
> --download-cmake
>
> and have started testing our code solving a Poisson linear system with CG
> + a HYPRE preconditioner. Timings look rather high compared to builds on
> other machines that have NVIDIA cards, and they do not change when using
> more than one GPU for the simple test I am doing.
> Does anyone happen to know whether HYPRE has a HIP GPU implementation of
> BoomerAMG, and whether it is compiled when configuring PETSc?
>
> Thanks!
>
> Marcos
>
>
> PS: This is what I see in the log file (-log_view) when running the case
> with 2 GPUs on the node:
>
>
> ------------------------------------------------------------------ PETSc
> Performance Summary:
> ------------------------------------------------------------------
>
> /ccs/home/vanellam/Firemodels_fork/fds/Build/mpich_gnu_frontier/fds_mpich_gnu_frontier
> on a arch-linux-frontier-opt-gcc named frontier04119 with 4 processors, by
> vanellam Tue Mar  5 12:42:29 2024
> Using Petsc Development GIT revision: v3.20.5-713-gabdf6bc0fcf  GIT Date:
> 2024-03-05 01:04:54 +0000
>
>                          Max       Max/Min     Avg       Total
> Time (sec):           8.368e+02     1.000   8.368e+02
> Objects:              0.000e+00     0.000   0.000e+00
> Flops:                2.546e+11     0.000   1.270e+11  5.079e+11
> Flops/sec:            3.043e+08     0.000   1.518e+08  6.070e+08
> MPI Msg Count:        1.950e+04     0.000   9.748e+03  3.899e+04
> MPI Msg Len (bytes):  1.560e+09     0.000   7.999e+04  3.119e+09
> MPI Reductions:       6.331e+04   2877.545
>
> Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)
>                             e.g., VecAXPY() for real vectors of length N
> --> 2N flops
>                             and VecAXPY() for complex vectors of length N
> --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages
> ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total    Count
> %Total     Avg         %Total    Count   %Total
>  0:      Main Stage: 8.3676e+02 100.0%  5.0792e+11 100.0%  3.899e+04
> 100.0%  7.999e+04      100.0%  3.164e+04  50.0%
>
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on
> interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flop: Max - maximum over all processors
>                   Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    AvgLen: average message length (bytes)
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and
> PetscLogStagePop().
>       %T - percent time in this phase         %F - percent flop in this
> phase
>       %M - percent messages in this phase     %L - percent message lengths
> in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over
> all processors)
>    GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU
> time over all processors)
>    CpuToGpu Count: total number of CPU to GPU copies per processor
>    CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per
> processor)
>    GpuToCpu Count: total number of GPU to CPU copies per processor
>    GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per
> processor)
>    GPU %F: percent flops on GPU in this event
>
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flop
>        --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   -
> GpuToCpu - GPU
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen
>  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size
> Count   Size  %F
>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> BuildTwoSided       1201 0.0   nan nan 0.00e+00 0.0 2.0e+00 4.0e+00
> 6.0e+02  0  0  0  0  1   0  0  0  0  2  -nan    -nan      0 0.00e+00    0
> 0.00e+00  0
> BuildTwoSidedF      1200 0.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00
> 6.0e+02  0  0  0  0  1   0  0  0  0  2  -nan    -nan      0 0.00e+00    0
> 0.00e+00  0
> MatMult            19494 0.0   nan nan 1.35e+11 0.0 3.9e+04 8.0e+04
> 0.0e+00  7 53 100 100  0   7 53 100 100  0  -nan    -nan      0 1.80e-05
>  0 0.00e+00  100
> MatConvert             3 0.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.5e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0
> 0.00e+00  0
> MatAssemblyBegin       2 0.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0
> 0.00e+00  0
> MatAssemblyEnd         2 0.0   nan nan 0.00e+00 0.0 4.0e+00 2.0e+04
> 3.5e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0
> 0.00e+00  0
> VecTDot            41382 0.0   nan nan 4.14e+10 0.0 0.0e+00 0.0e+00
> 2.1e+04  0 16  0  0 33   0 16  0  0 65  -nan    -nan      0 0.00e+00    0
> 0.00e+00  100
> VecNorm            20691 0.0   nan nan 2.07e+10 0.0 0.0e+00 0.0e+00
> 1.0e+04  0  8  0  0 16   0  8  0  0 33  -nan    -nan      0 0.00e+00    0
> 0.00e+00  100
> VecCopy             2394 0.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0
> 0.00e+00  0
> VecSet             21888 0.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0
> 0.00e+00  0
> VecAXPY            38988 0.0   nan nan 3.90e+10 0.0 0.0e+00 0.0e+00
> 0.0e+00  0 15  0  0  0   0 15  0  0  0  -nan    -nan      0 0.00e+00    0
> 0.00e+00  100
> VecAYPX            18297 0.0   nan nan 1.83e+10 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  7  0  0  0   0  7  0  0  0  -nan    -nan      0 0.00e+00    0
> 0.00e+00  100
> VecAssemblyBegin    1197 0.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00
> 6.0e+02  0  0  0  0  1   0  0  0  0  2  -nan    -nan      0 0.00e+00    0
> 0.00e+00  0
> VecAssemblyEnd      1197 0.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0
> 0.00e+00  0
> VecScatterBegin    19494 0.0   nan nan 0.00e+00 0.0 3.9e+04 8.0e+04
> 0.0e+00  0  0 100 100  0   0  0 100 100  0  -nan    -nan      0 1.80e-05
>  0 0.00e+00  0
> VecScatterEnd      19494 0.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0
> 0.00e+00  0
> SFSetGraph             1 0.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0
> 0.00e+00  0
> SFSetUp                1 0.0   nan nan 0.00e+00 0.0 4.0e+00 2.0e+04
> 5.0e-01  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0
> 0.00e+00  0
> SFPack             19494 0.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 1.80e-05    0
> 0.00e+00  0
> SFUnpack           19494 0.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0
> 0.00e+00  0
> KSPSetUp               1 0.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0
> 0.00e+00  0
> KSPSolve            1197 0.0 2.0291e+02 0.0 2.55e+11 0.0 3.9e+04 8.0e+04
> 3.1e+04 12 100 100 100 49  12 100 100 100 98  2503    -nan      0 1.80e-05
>    0 0.00e+00  100
> PCSetUp                1 0.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.5e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0
> 0.00e+00  0
> PCApply            20691 0.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  5  0  0  0  0   5  0  0  0  0  -nan    -nan      0 0.00e+00    0
> 0.00e+00  0
>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Object Type          Creations   Destructions. Reports information only
> for process 0.
>
> --- Event Stage 0: Main Stage
>
>               Matrix     7              3
>               Vector     7              1
>            Index Set     2              2
>    Star Forest Graph     1              0
>        Krylov Solver     1              0
>       Preconditioner     1              0
>
> ========================================================================================================================
> Average time to get PetscTime(): 3.01e-08
> Average time for MPI_Barrier(): 3.8054e-06
> Average time for zero size MPI_Send(): 7.101e-06
> #PETSc Option Table entries:
> -log_view # (source: command line)
> -mat_type mpiaijkokkos # (source: command line)
> -vec_type kokkos # (source: command line)
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure options: COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3
> FCOPTFLAGS=-O3 HIPOPTFLAGS=-O3 --with-debugging=0 --with-cc=cc
> --with-cxx=CC --with-fc=ftn --with-hip --with-hipc=hipcc
> --LIBS="-L/opt/cray/pe/mpich/8.1.23/ofi/gnu/9.1/lib -lmpi
> -L/opt/cray/pe/mpich/8.1.23/gtl/lib -lmpi_gtl_hsa" --download-kokkos
> --download-kokkos-kernels --download-suitesparse --download-hypre
> --download-cmake
> -----------------------------------------
> Libraries compiled on 2024-03-05 17:04:36 on login08
> Machine characteristics:
> Linux-5.14.21-150400.24.46_12.0.83-cray_shasta_c-x86_64-with-glibc2.3.4
> Using PETSc directory: /autofs/nccs-svm1_home1/vanellam/Software/petsc
> Using PETSc arch: arch-linux-frontier-opt-gcc
> -----------------------------------------
>
> Using C compiler: cc  -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas
> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector
> -fvisibility=hidden -O3
> Using Fortran compiler: ftn  -fPIC -Wall -ffree-line-length-none
> -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -O3
> -----------------------------------------
>
> Using include paths:
> -I/autofs/nccs-svm1_home1/vanellam/Software/petsc/include
> -I/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc/include
> -I/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc/include/suitesparse
> -I/opt/rocm-5.4.0/include
> -----------------------------------------
>
> Using C linker: cc
> Using Fortran linker: ftn
> Using libraries:
> -Wl,-rpath,/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc/lib
> -L/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc/lib
> -lpetsc
> -Wl,-rpath,/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc/lib
> -L/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc/lib
> -Wl,-rpath,/opt/rocm-5.4.0/lib -L/opt/rocm-5.4.0/lib
> -Wl,-rpath,/opt/cray/pe/mpich/8.1.23/ofi/gnu/9.1/lib
> -L/opt/cray/pe/mpich/8.1.23/ofi/gnu/9.1/lib
> -Wl,-rpath,/opt/cray/pe/mpich/8.1.23/gtl/lib
> -L/opt/cray/pe/mpich/8.1.23/gtl/lib -Wl,-rpath,/opt/cray/pe/libsci/
> 22.12.1.1/GNU/9.1/x86_64/lib -L/opt/cray/pe/libsci/
> 22.12.1.1/GNU/9.1/x86_64/lib
> -Wl,-rpath,/sw/frontier/spack-envs/base/opt/cray-sles15-zen3/gcc-12.2.0/darshan-runtime-3.4.0-ftq5gccg3qjtyh5xeo2bz4wqkjayjhw3/lib
> -L/sw/frontier/spack-envs/base/opt/cray-sles15-zen3/gcc-12.2.0/darshan-runtime-3.4.0-ftq5gccg3qjtyh5xeo2bz4wqkjayjhw3/lib
> -Wl,-rpath,/opt/cray/pe/dsmml/0.2.2/dsmml/lib
> -L/opt/cray/pe/dsmml/0.2.2/dsmml/lib -Wl,-rpath,/opt/cray/pe/pmi/6.1.8/lib
> -L/opt/cray/pe/pmi/6.1.8/lib
> -Wl,-rpath,/opt/cray/xpmem/2.6.2-2.5_2.22__gd067c3f.shasta/lib64
> -L/opt/cray/xpmem/2.6.2-2.5_2.22__gd067c3f.shasta/lib64
> -Wl,-rpath,/opt/cray/pe/gcc/12.2.0/snos/lib/gcc/x86_64-suse-linux/12.2.0
> -L/opt/cray/pe/gcc/12.2.0/snos/lib/gcc/x86_64-suse-linux/12.2.0
> -Wl,-rpath,/opt/cray/pe/gcc/12.2.0/snos/lib64
> -L/opt/cray/pe/gcc/12.2.0/snos/lib64 -Wl,-rpath,/opt/rocm-5.4.0/llvm/lib
> -L/opt/rocm-5.4.0/llvm/lib -Wl,-rpath,/opt/cray/pe/gcc/12.2.0/snos/lib
> -L/opt/cray/pe/gcc/12.2.0/snos/lib -lHYPRE -lspqr -lumfpack -lklu -lcholmod
> -lamd -lkokkoskernels -lkokkoscontainers -lkokkoscore -lkokkossimd
> -lhipsparse -lhipblas -lhipsolver -lrocsparse -lrocsolver -lrocblas
> -lrocrand -lamdhip64 -lmpi -lmpi_gtl_hsa -ldarshan -lz -ldl -lxpmem
> -lgfortran -lm -lmpifort_gnu_91 -lmpi_gnu_91 -lsci_gnu_82_mpi -lsci_gnu_82
> -ldsmml -lpmi -lpmi2 -lgfortran -lquadmath -lpthread -lm -lgcc_s -lstdc++
> -lquadmath -lmpi -lmpi_gtl_hsa
> -----------------------------------------
>