[petsc-users] [MPI GPU Aware] KSP_DIVERGED

LEDAC Pierre Pierre.LEDAC at cea.fr
Mon Sep 16 10:51:23 CDT 2024


Hi all,


We are using PETSc 3.20 in our code and successfully running several solvers on NVIDIA GPUs with OpenMPI builds that are not GPU-aware (so I need to add the flag -use_gpu_aware_mpi 0).


But now, when using a GPU-aware OpenMPI library (OpenMPI 4.0.5 or 4.1.5 from NVHPC), some parallel calculations fail with KSP_DIVERGED_ITS or KSP_DIVERGED_DTOL

in several configurations. A small test case may run fine with (the matrix is symmetric):


-ksp_type cg -pc_type gamg -pc_gamg_type classical


But with a larger number of devices, for instance more than 4 or 8, it may suddenly fail.


If I switch to another solver (BiCGStab), it may converge:


-ksp_type bcgs -pc_type gamg -pc_gamg_type classical


The most sensitive cases, where it diverges, are the following:

-ksp_type cg -pc_type hypre -pc_hypre_type boomeramg

-ksp_type cg -pc_type gamg  -pc_gamg_type classical


And the bcgs workaround does not work every time...


It seems to work without problems with aggregation (up to at least 128 GPUs in my simulation):

-ksp_type cg -pc_type gamg -pc_gamg_type agg


So I guess something odd is happening in my code during the PETSc solve with GPU-aware MPI, since all of the configurations above work with non-GPU-aware MPI.
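As a first sanity check (a sketch, assuming an OpenMPI build is on the PATH; ./my_app is a placeholder for our executable), one can verify whether the MPI library actually reports CUDA support, and rerun the same failing case with GPU-aware MPI disabled on the PETSc side to compare:

```shell
# Ask the OpenMPI build whether it was compiled with CUDA support
ompi_info --parsable --all | grep mpi_built_with_cuda_support:value

# Rerun the same case with GPU-aware MPI disabled in PETSc, which
# stages MPI buffers through host memory; if this converges and the
# GPU-aware run does not, the difference is in the device-buffer path.
mpirun -np 8 ./my_app -use_gpu_aware_mpi 0 \
  -ksp_monitor_true_residual -ksp_converged_reason
```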


Here is the -ksp_view log from one failure with the first configuration:


KSP Object: () 8 MPI processes
  type: cg
  maximum iterations=10000, nonzero initial guess
  tolerances:  relative=0., absolute=0.0001, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: () 8 MPI processes
  type: hypre
    HYPRE BoomerAMG preconditioning
      Cycle type V
      Maximum number of levels 25
      Maximum number of iterations PER hypre call 1
      Convergence tolerance PER hypre call 0.
      Threshold for strong coupling 0.7
      Interpolation truncation factor 0.
      Interpolation: max elements per row 0
      Number of levels of aggressive coarsening 0
      Number of paths for aggressive coarsening 1
      Maximum row sums 0.9
      Sweeps down         1
      Sweeps up           1
      Sweeps on coarse    1
      Relax down          l1scaled-Jacobi
      Relax up            l1scaled-Jacobi
      Relax on coarse     Gaussian-elimination
      Relax weight  (all)      1.
      Outer relax weight (all) 1.
      Maximum size of coarsest grid 9
      Minimum size of coarsest grid 1
      Not using CF-relaxation
      Not using more complex smoothers.
      Measure type        local
      Coarsen type        PMIS
      Interpolation type  ext+i
      SpGEMM type         cusparse
  linear system matrix = precond matrix:
  Mat Object: () 8 MPI processes
    type: mpiaijcusparse
    rows=64000, cols=64000
    total: nonzeros=311040, allocated nonzeros=311040
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines


For the moment, I have not succeeded in creating a reproducer with the ex*.c examples...


Have you seen this kind of behaviour before?

Should I update my PETSc version?


Thanks for any advice,


Pierre LEDAC
Commissariat à l’énergie atomique et aux énergies alternatives
Centre de SACLAY
DES/ISAS/DM2S/SGLS/LCAN
Bâtiment 451 – point courrier n°43
F-91191 Gif-sur-Yvette
+33 1 69 08 04 03
+33 6 83 42 05 79

