[petsc-users] [MPI GPU Aware] KSP_DIVERGED
LEDAC Pierre
Pierre.LEDAC at cea.fr
Tue Sep 17 12:43:32 CDT 2024
Yes. I only switched to OpenMPI 5.0.5; PETSc is still 3.20.
Pierre LEDAC
Commissariat à l’énergie atomique et aux énergies alternatives
Centre de SACLAY
DES/ISAS/DM2S/SGLS/LCAN
Bâtiment 451 – point courrier n°43
F-91191 Gif-sur-Yvette
+33 1 69 08 04 03
+33 6 83 42 05 79
________________________________
From: Junchao Zhang <junchao.zhang at gmail.com>
Sent: Tuesday, 17 September 2024 18:09:44
To: LEDAC Pierre
Cc: petsc-users; ROUMET Elie
Subject: Re: [petsc-users] [MPI GPU Aware] KSP_DIVERGED
Did you "fix" the problem with OpenMPI 5 but keep PETSc unchanged (i.e., still 3.20)?
--Junchao Zhang
On Tue, Sep 17, 2024 at 9:47 AM LEDAC Pierre <Pierre.LEDAC at cea.fr> wrote:
Thanks Satish, and good guess about OpenMPI 5!
It seems to solve the issue (at least on my GPU box, where I had reproduced the issue with 8 MPI ranks and OpenMPI 4.x).
Unfortunately, none of the clusters we currently use provides an OpenMPI 5.x module, so it seems I need to build it myself to really confirm.
We will probably prevent users from configuring our code with CUDA-aware OpenMPI 4.x, because it is a really weird bug.
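A configure-time guard of the kind mentioned above could be sketched roughly as follows. This is only an illustration, not the check used in our code: the function names are hypothetical, and it assumes the usual `mpirun (Open MPI) X.Y.Z` first line printed by `mpirun --version`.

```python
import re


def parse_openmpi_version(version_output):
    """Return (major, minor, patch) from `mpirun --version` text, or None.

    Assumes the usual "mpirun (Open MPI) X.Y.Z" line; any other MPI
    implementation (or an unexpected format) yields None.
    """
    match = re.search(r"\(Open MPI\)\s+(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_rejected(version_output, cuda_aware):
    """Hypothetical policy: refuse CUDA-aware Open MPI older than 5.x."""
    version = parse_openmpi_version(version_output)
    if version is None:
        # Not Open MPI, or unrecognised output: do not block the build.
        return False
    return cuda_aware and version[0] < 5
```

In a real configure script, `version_output` would come from running `mpirun --version`, and `cuda_aware` from however the build detects CUDA support in the MPI library.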
Pierre LEDAC
________________________________
From: Satish Balay <balay.anl at fastmail.org>
Sent: Tuesday, 17 September 2024 15:39:22
To: LEDAC Pierre
Cc: Junchao Zhang; petsc-users; ROUMET Elie
Subject: Re: [petsc-users] [MPI GPU Aware] KSP_DIVERGED
On Tue, 17 Sep 2024, LEDAC Pierre wrote:
> Thanks all, I will try and report.
>
>
> Last question: if I use the "-use_gpu_aware_mpi 0" flag with a GPU-aware MPI library, does PETSc
> disable GPU intra/inter communications and send MPI buffers as usual (with extra Device<->Host copies)?
Yes.
Note: with respect to using an MPI that is not GPU-aware, we are changing the default behavior so that the "-use_gpu_aware_mpi 0" flag is no longer required.
https://gitlab.com/petsc/petsc/-/merge_requests/7813
Satish
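To compare the two communication paths discussed above, the same case can be launched once with GPU-aware MPI and once with "-use_gpu_aware_mpi 0". The sketch below only assembles the two command lines. The executable name `./my_app` and the helper `build_command` are hypothetical; the solver options are taken from the thread, and `-ksp_converged_reason` is added so that each run reports why the solve stopped.

```python
import shlex

# Solver options quoted from the thread (one of the failing configurations).
SOLVER_OPTIONS = ["-ksp_type", "cg", "-pc_type", "hypre", "-pc_hypre_type", "boomeramg"]


def build_command(executable, n_ranks, solver_options, gpu_aware_mpi):
    """Assemble an mpirun command line for one side of the comparison."""
    command = ["mpirun", "-np", str(n_ranks), executable]
    command += list(solver_options)
    # Ask PETSc to print the convergence (or divergence) reason.
    command.append("-ksp_converged_reason")
    if not gpu_aware_mpi:
        # Stage MPI buffers through host memory instead of passing device pointers.
        command += ["-use_gpu_aware_mpi", "0"]
    return command


if __name__ == "__main__":
    # "./my_app" is a placeholder for the application binary.
    for gpu_aware_mpi in (True, False):
        print(shlex.join(build_command("./my_app", 8, SOLVER_OPTIONS, gpu_aware_mpi)))
```

If only the run without "-use_gpu_aware_mpi 0" diverges, that points at the GPU-aware communication path rather than at the solver settings.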
>
>
> Thanks,
>
>
> Pierre LEDAC
> ________________________________
> From: Satish Balay <balay.anl at fastmail.org>
> Sent: Monday, 16 September 2024 18:57:02
> To: Junchao Zhang
> Cc: LEDAC Pierre; petsc-users at mcs.anl.gov; ROUMET Elie
> Subject: Re: [petsc-users] [MPI GPU Aware] KSP_DIVERGED
>
> And/or try the latest OpenMPI [or MPICH] and see if that makes a difference.
>
> Configuring the latest PETSc with --download-mpich or --download-openmpi should build a GPU-aware MPI.
>
> Satish
>
> On Mon, 16 Sep 2024, Junchao Zhang wrote:
>
> > Could you try petsc/main to see if the problem persists?
> >
> > --Junchao Zhang
> >
> >
> > On Mon, Sep 16, 2024 at 10:51 AM LEDAC Pierre <Pierre.LEDAC at cea.fr> wrote:
> >
> > > Hi all,
> > >
> > >
> > > We are using PETSc 3.20 in our code and successfully running several
> > > solvers on NVIDIA GPUs with OpenMPI libraries that are not GPU-aware (so I
> > > need to add the flag -use_gpu_aware_mpi 0).
> > >
> > >
> > > But now, when using a GPU-aware OpenMPI library (OpenMPI 4.0.5 or 4.1.5 from
> > > NVHPC), some parallel calculations fail with *KSP_DIVERGED_ITS* or
> > > *KSP_DIVERGED_DTOL* in several configurations. It may run well on a small
> > > test case with (the matrix is symmetric):
> > >
> > >
> > > *-ksp_type cg -pc_type gamg -pc_gamg_type classical*
> > >
> > >
> > > But with a larger number of devices, for instance more than 4 or 8, it
> > > may suddenly fail.
> > >
> > >
> > > If I switch to another solver (BiCGstab), it may converge:
> > >
> > >
> > > *-ksp_type bcgs -pc_type gamg -pc_gamg_type classical*
> > >
> > >
> > > The most sensitive cases, where it diverges, are the following:
> > >
> > >
> > > *-ksp_type cg -pc_type hypre -pc_hypre_type boomeramg *
> > >
> > > *-ksp_type cg -pc_type gamg -pc_gamg_type classical*
> > >
> > >
> > > And the *bcgs* workaround doesn't work every time...
> > >
> > >
> > > It seems to work without problems with aggregation (on at least 128 GPUs in my
> > > simulation):
> > >
> > > *-ksp_type cg -pc_type gamg -pc_gamg_type agg*
> > >
> > >
> > > So I guess something weird is happening in my code during the PETSc solve
> > > with GPU-aware MPI, as all the previous configurations work with
> > > non-GPU-aware MPI.
> > >
> > >
> > > Here is the -ksp_view log from one failure with the first configuration:
> > >
> > > KSP Object: () 8 MPI processes
> > >   type: cg
> > >   maximum iterations=10000, nonzero initial guess
> > >   tolerances: relative=0., absolute=0.0001, divergence=10000.
> > >   left preconditioning
> > >   using UNPRECONDITIONED norm type for convergence test
> > > PC Object: () 8 MPI processes
> > >   type: hypre
> > >     HYPRE BoomerAMG preconditioning
> > >       Cycle type V
> > >       Maximum number of levels 25
> > >       Maximum number of iterations PER hypre call 1
> > >       Convergence tolerance PER hypre call 0.
> > >       Threshold for strong coupling 0.7
> > >       Interpolation truncation factor 0.
> > >       Interpolation: max elements per row 0
> > >       Number of levels of aggressive coarsening 0
> > >       Number of paths for aggressive coarsening 1
> > >       Maximum row sums 0.9
> > >       Sweeps down 1
> > >       Sweeps up 1
> > >       Sweeps on coarse 1
> > >       Relax down l1scaled-Jacobi
> > >       Relax up l1scaled-Jacobi
> > >       Relax on coarse Gaussian-elimination
> > >       Relax weight (all) 1.
> > >       Outer relax weight (all) 1.
> > >       Maximum size of coarsest grid 9
> > >       Minimum size of coarsest grid 1
> > >       Not using CF-relaxation
> > >       Not using more complex smoothers.
> > >       Measure type local
> > >       Coarsen type PMIS
> > >       Interpolation type ext+i
> > >       SpGEMM type cusparse
> > >   linear system matrix = precond matrix:
> > >   Mat Object: () 8 MPI processes
> > >     type: mpiaijcusparse
> > >     rows=64000, cols=64000
> > >     total: nonzeros=311040, allocated nonzeros=311040
> > >     total number of mallocs used during MatSetValues calls=0
> > >       not using I-node (on process 0) routines
> > >
> > >
> > > So far I have not managed to create a reproducer with the ex.c examples...
> > >
> > >
> > > Have you seen this kind of behaviour before?
> > >
> > > Should I update my PETSc version?
> > >
> > >
> > > Thanks for any advice,
> > >
> > >
> > > Pierre LEDAC
> > >
> >
>