[petsc-users] [MPI GPU Aware] KSP_DIVERGED
LEDAC Pierre
Pierre.LEDAC at cea.fr
Mon Sep 16 10:51:23 CDT 2024
Hi all,
We are using PETSc 3.20 in our code and successfully running several solvers on NVIDIA GPUs with an OpenMPI library that is not GPU-aware (so I need to add the flag -use_gpu_aware_mpi 0).
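For reference, a typical run line on that build looks roughly like this (the binary name and process count are just placeholders):

mpirun -np 8 ./our_app -vec_type cuda -mat_type aijcusparse -use_gpu_aware_mpi 0 \
       -ksp_type cg -pc_type gamg -pc_gamg_type classical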
But now, when using a GPU-aware OpenMPI library (OpenMPI 4.0.5 or 4.1.5 from NVHPC), some parallel calculations fail with KSP_DIVERGED_ITS or KSP_DIVERGED_DTOL
in several configurations. It may run well on a small test case (the matrix is symmetric) with:
-ksp_type cg -pc_type gamg -pc_gamg_type classical
But as soon as the number of devices gets bigger than, say, 4 or 8, it may fail.
If I switch to another solver (BiCGstab), it may converge:
-ksp_type bcgs -pc_type gamg -pc_gamg_type classical
The most sensitive cases, where it diverges, are the following:
-ksp_type cg -pc_type hypre -pc_hypre_type boomeramg
-ksp_type cg -pc_type gamg -pc_gamg_type classical
And the bcgs workaround doesn't work every time...
It seems to work without problems with aggregation (at least up to 128 GPUs in my simulation):
-ksp_type cg -pc_type gamg -pc_gamg_type agg
So I guess something weird is happening in my code during the PETSc solve with GPU-aware MPI, as all the previous configurations work with non-GPU-aware MPI.
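To narrow it down outside of PETSc, I am planning to test plain GPU-aware point-to-point transfers with a small standalone program along these lines (just a sketch, not code from our application; run with at least 2 ranks):

/* Minimal GPU-aware MPI sanity check: rank 0 sends a cudaMalloc'd buffer
   directly to rank 1; with a GPU-aware OpenMPI this should just work. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  const int n = 1024;
  double *dbuf;
  cudaMalloc((void **)&dbuf, n * sizeof(double));

  if (rank == 0) {
    double *h = (double *)malloc(n * sizeof(double));
    for (int i = 0; i < n; i++) h[i] = 1.0 + i;
    cudaMemcpy(dbuf, h, n * sizeof(double), cudaMemcpyHostToDevice);
    MPI_Send(dbuf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD); /* device pointer passed to MPI */
    free(h);
  } else if (rank == 1) {
    MPI_Recv(dbuf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    double h0;
    cudaMemcpy(&h0, dbuf, sizeof(double), cudaMemcpyDeviceToHost);
    printf("rank 1 received first value %g (expected 1)\n", h0);
  }

  cudaFree(dbuf);
  MPI_Finalize();
  return 0;
}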
Here is the -ksp_view log from one failure with the first configuration:
KSP Object: () 8 MPI processes
  type: cg
  maximum iterations=10000, nonzero initial guess
  tolerances: relative=0., absolute=0.0001, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: () 8 MPI processes
  type: hypre
    HYPRE BoomerAMG preconditioning
      Cycle type V
      Maximum number of levels 25
      Maximum number of iterations PER hypre call 1
      Convergence tolerance PER hypre call 0.
      Threshold for strong coupling 0.7
      Interpolation truncation factor 0.
      Interpolation: max elements per row 0
      Number of levels of aggressive coarsening 0
      Number of paths for aggressive coarsening 1
      Maximum row sums 0.9
      Sweeps down 1
      Sweeps up 1
      Sweeps on coarse 1
      Relax down l1scaled-Jacobi
      Relax up l1scaled-Jacobi
      Relax on coarse Gaussian-elimination
      Relax weight (all) 1.
      Outer relax weight (all) 1.
      Maximum size of coarsest grid 9
      Minimum size of coarsest grid 1
      Not using CF-relaxation
      Not using more complex smoothers.
      Measure type local
      Coarsen type PMIS
      Interpolation type ext+i
      SpGEMM type cusparse
  linear system matrix = precond matrix:
  Mat Object: () 8 MPI processes
    type: mpiaijcusparse
    rows=64000, cols=64000
    total: nonzeros=311040, allocated nonzeros=311040
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
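For context, the solve in our code essentially boils down to the following (heavily simplified sketch; the real matrix comes from our application, the toy Laplacian below is only for illustration):

/* Assemble a small symmetric 1D Laplacian, solve it with whatever
   -ksp_type/-pc_type is given on the command line, and report the
   converged reason, which is how we see KSP_DIVERGED_ITS / KSP_DIVERGED_DTOL. */
#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat                A;
  Vec                x, b;
  KSP                ksp;
  KSPConvergedReason reason;
  PetscInt           i, n = 1000, Istart, Iend;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
  PetscCall(MatSetFromOptions(A));              /* -mat_type aijcusparse on GPU */
  PetscCall(MatSetUp(A));
  PetscCall(MatGetOwnershipRange(A, &Istart, &Iend));
  for (i = Istart; i < Iend; i++) {
    if (i > 0)     PetscCall(MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES));
    if (i < n - 1) PetscCall(MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES));
    PetscCall(MatSetValue(A, i, i, 2.0, INSERT_VALUES));
  }
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

  PetscCall(MatCreateVecs(A, &x, &b));
  PetscCall(VecSet(b, 1.0));
  PetscCall(VecSet(x, 0.0));

  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetFromOptions(ksp));            /* -ksp_type cg -pc_type hypre ... */
  PetscCall(KSPSolve(ksp, b, x));
  PetscCall(KSPGetConvergedReason(ksp, &reason));
  PetscCall(PetscPrintf(PETSC_COMM_WORLD, "Converged reason: %s\n", KSPConvergedReasons[reason]));

  PetscCall(KSPDestroy(&ksp));
  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&b));
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}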
For the moment I haven't succeeded in creating a reproducer with the ex.c examples...
Have you seen this kind of behaviour before?
Should I update my PETSc version?
Thanks for any advice,
Pierre LEDAC
Commissariat à l’énergie atomique et aux énergies alternatives
Centre de SACLAY
DES/ISAS/DM2S/SGLS/LCAN
Bâtiment 451 – point courrier n°43
F-91191 Gif-sur-Yvette
+33 1 69 08 04 03
+33 6 83 42 05 79