<html>


<head>


<meta http-equiv="Content-Type" content="text/html; charset=utf-8">


<meta name="Generator" content="Microsoft Exchange Server">


<!-- converted from text --><style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>


</head>


<body>




<style type="text/css" style="">


<!--


p


        {margin-top:0;


        margin-bottom:0}


-->


</style>


<div dir="ltr">


<div id="x_divtagdefaultwrapper" dir="ltr" style="font-size:12pt; color:#000000; font-family:Calibri,Helvetica,sans-serif">


<p>Thanks all, I will try and report back.</p>


<p><br>


</p>


<p>Last question: if I use the "-use_gpu_aware_mpi 0" flag with a GPU-aware MPI library, does PETSc</p>


<p>disable GPU intra/inter communications and send MPI buffers as usual (with extra Device&lt;-&gt;Host copies)?<br>


</p>


<p><br>


</p>


<p>Thanks,<br>


</p>


<p><br>


</p>


<div id="x_Signature">


<div id="x_divtagdefaultwrapper" dir="ltr" style="font-size:12pt; color:rgb(0,0,0); font-family:Calibri,Helvetica,sans-serif,'EmojiFont','Apple Color Emoji','Segoe UI Emoji',NotoColorEmoji,'Segoe UI Symbol','Android Emoji',EmojiSymbols">


<div style="font-family:Tahoma; font-size:13px">


<div class="x_BodyFragment"><font size="2"><span style="font-size:10pt">


<div class="x_PlainText">Pierre LEDAC<br>


Commissariat à l’énergie atomique et aux énergies alternatives<br>


Centre de SACLAY<br>


DES/ISAS/DM2S/SGLS/LCAN<br>


Bâtiment 451 – point courrier n°43<br>


F-91191 Gif-sur-Yvette<br>


+33 1 69 08 04 03<br>


+33 6 83 42 05 79</div>


</span></font></div>


</div>


</div>


</div>


</div>


<hr tabindex="-1" style="display:inline-block; width:98%">


<div id="x_divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Satish Balay &lt;balay.anl@fastmail.org&gt;<br>


<b>Sent:</b> Monday, 16 September 2024 18:57:02<br>


<b>To:</b> Junchao Zhang<br>


<b>Cc:</b> LEDAC Pierre; petsc-users@mcs.anl.gov; ROUMET Elie<br>


<b>Subject:</b> Re: [petsc-users] [MPI GPU Aware] KSP_DIVERGED</font>


<div> </div>


</div>


</div>


<font size="2"><span style="font-size:10pt;">


<div class="PlainText">And/Or - try latest OpenMPI [or MPICH] and see if that makes a difference.<br>


<br>


--download-mpich or --download-openmpi with latest petsc should build gpu-aware-mpi<br>


<br>


Satish<br>


<br>


On Mon, 16 Sep 2024, Junchao Zhang wrote:<br>


<br>


> Could you try petsc/main to see if the problem persists?<br>


> <br>


> --Junchao Zhang<br>


> <br>


> <br>


> On Mon, Sep 16, 2024 at 10:51 AM LEDAC Pierre &lt;Pierre.LEDAC@cea.fr&gt; wrote:<br>


> <br>


> > Hi all,<br>


> ><br>


> ><br>


> > We are using PETSc 3.20 in our code and successfully running several<br>


> > solvers on Nvidia GPUs with an OpenMPI library that is not GPU-aware (so I<br>


> > need to add the flag -use_gpu_aware_mpi 0).<br>


> ><br>


> ><br>


> > But now, when using a GPU-aware OpenMPI library (OpenMPI 4.0.5 or 4.1.5 from<br>


> > NVHPC), some parallel calculations fail with *KSP_DIVERGED_ITS* or<br>


> > *KSP_DIVERGED_DTOL*<br>


> ><br>


> > with several configurations. It may run well on a small test case<br>


> > (the matrix is symmetric) with:<br>


> ><br>


> ><br>


> > *-ksp_type cg -pc_type gamg -pc_gamg_type classical*<br>


> ><br>


> ><br>


> > But with a larger number of devices, for instance more than 4 or 8, it<br>


> > may suddenly fail.<br>


> ><br>


> ><br>


> > If I switch to another solver (BiCGstab), it may converge:<br>


> ><br>


> ><br>


> > *-ksp_type bcgs -pc_type gamg -pc_gamg_type classical*<br>


> ><br>


> ><br>


> > The most sensitive cases, where it diverges, are the following:<br>


> ><br>


> ><br>


> > *-ksp_type cg -pc_type hypre -pc_hypre_type boomeramg*<br>


> ><br>


> > *-ksp_type cg -pc_type gamg -pc_gamg_type classical*<br>


> ><br>


> ><br>


> > And the *bcgs* workaround doesn't work every time...<br>


> ><br>


> ><br>


> > It seems to work without problems with aggregation (on at least 128 GPUs in my<br>


> > simulation):<br>


> ><br>


> > *-ksp_type cg -pc_type gamg -pc_gamg_type agg*<br>


> ><br>


> ><br>


> > So I guess something weird is happening in my code during the solve in<br>


> > PETSc with GPU-aware MPI, as all the previous configurations work with<br>


> > non-GPU-aware MPI.<br>


> ><br>


> ><br>


> > Here is the -ksp_view output from one failure with the first configuration:<br>


> ><br>


> ><br>




> > *KSP Object: () 8 MPI processes<br>
> >   type: cg<br>
> >   maximum iterations=10000, nonzero initial guess<br>
> >   tolerances:  relative=0., absolute=0.0001, divergence=10000.<br>
> >   left preconditioning<br>
> >   using UNPRECONDITIONED norm type for convergence test<br>
> > PC Object: () 8 MPI processes<br>
> >   type: hypre<br>
> >     HYPRE BoomerAMG preconditioning<br>
> >       Cycle type V<br>
> >       Maximum number of levels 25<br>
> >       Maximum number of iterations PER hypre call 1<br>
> >       Convergence tolerance PER hypre call 0.<br>
> >       Threshold for strong coupling 0.7<br>
> >       Interpolation truncation factor 0.<br>
> >       Interpolation: max elements per row 0<br>
> >       Number of levels of aggressive coarsening 0<br>
> >       Number of paths for aggressive coarsening 1<br>
> >       Maximum row sums 0.9<br>
> >       Sweeps down         1<br>
> >       Sweeps up           1<br>
> >       Sweeps on coarse    1<br>
> >       Relax down          l1scaled-Jacobi<br>
> >       Relax up            l1scaled-Jacobi<br>
> >       Relax on coarse     Gaussian-elimination<br>
> >       Relax weight  (all)      1.<br>
> >       Outer relax weight (all) 1.<br>
> >       Maximum size of coarsest grid 9<br>
> >       Minimum size of coarsest grid 1<br>
> >       Not using CF-relaxation<br>
> >       Not using more complex smoothers.<br>
> >       Measure type        local<br>
> >       Coarsen type        PMIS<br>
> >       Interpolation type  ext+i<br>
> >       SpGEMM type         cusparse<br>
> >   linear system matrix = precond matrix:<br>
> >   Mat Object: () 8 MPI processes<br>
> >     type: mpiaijcusparse<br>
> >     rows=64000, cols=64000<br>
> >     total: nonzeros=311040, allocated nonzeros=311040<br>
> >     total number of mallocs used during MatSetValues calls=0<br>
> >       not using I-node (on process 0) routines*<br>


> ><br>


> ><br>


> > So far I have not succeeded in creating a reproducer with the ex.c examples...<br>


> ><br>


> ><br>


> > Did you see this kind of behaviour before?<br>


> ><br>


> > Should I update my PETSc version?<br>


> ><br>


> ><br>


> > Thanks for any advice,<br>


> ><br>


> ><br>


> > Pierre LEDAC<br>




> ><br>


> <br>


</div>


</span></font>


</body>


</html>