[petsc-users] GAMG advice

Mark Adams mfadams at lbl.gov
Wed Nov 8 19:07:41 CST 2017


On Fri, Oct 20, 2017 at 11:10 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:

>
>   David,
>
>    GAMG picks the number of levels based on how the coarsening process,
> etc., proceeds. You cannot hardwire it to a particular value.


Yes, you can: GAMG will respect -pc_mg_levels N, but we don't recommend
using it.
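
For example, something like the following (just a sketch; "./app" stands in
for your application binary and its usual solver options):

    mpiexec -n 8 ./app -pc_type gamg -pc_gamg_type agg -pc_mg_levels 3 -ksp_view

The -ksp_view output then reports how many levels were actually built.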


> You can run with -info to get more info potentially on the decisions GAMG
> is making.
>

This is noisy, but grep for GAMG and you will see the levels, sizes, etc.
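
For instance (again just a sketch; "./app" is your binary and <options> are
your usual solver options):

    mpiexec -n 8 ./app <options> -info 2>&1 | grep GAMG

This should show the number of levels and the grid size on each level.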


>
>   Barry
>
> > On Oct 20, 2017, at 2:06 PM, David Nolte <dnolte at dim.uchile.cl> wrote:
> >
> > PS: I didn't realize it at first, but it looks as if the -pc_mg_levels 3
> > option was not taken into account:
> > type: gamg
> >     MG: type is MULTIPLICATIVE, levels=1 cycles=v
> >
> >
> >
> > On 10/20/2017 03:32 PM, David Nolte wrote:
> >> Dear all,
> >>
> >> I have some problems using GAMG as a preconditioner for (F)GMRES.
> >> Background: I am solving the incompressible, unsteady Navier-Stokes
> >> equations with a coupled mixed FEM approach, using P1/P1 elements for
> >> velocity and pressure on an unstructured tetrahedral mesh with about
> >> 2 million DOFs (and up to 15 million). The method is stabilized with
> >> SUPG/PSPG, hence there are no zeros on the diagonal of the pressure
> >> block. Time discretization is semi-implicit backward Euler. The flow is
> >> convection dominated, through a nozzle.
> >>
> >> So far, for this setup, I have been quite happy with a simple FGMRES/ML
> >> solver for the full system (rather brute force, I admit, but much faster
> >> than any block/Schur preconditioners I tried):
> >>
> >>     -ksp_converged_reason
> >>     -ksp_monitor_true_residual
> >>     -ksp_type fgmres
> >>     -ksp_rtol 1.0e-6
> >>     -ksp_initial_guess_nonzero
> >>
> >>     -pc_type ml
> >>     -pc_ml_Threshold 0.03
> >>     -pc_ml_maxNlevels 3
> >>
> >> This setup converges to rtol in ~100 iterations (see the ksp_view output
> >> below):
> >>
> >> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm
> >> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06
> >> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm
> >> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06
> >> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm
> >> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06
> >> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm
> >> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07
> >>
> >>
> >> Now I'd like to try GAMG instead of ML. However, I don't know how to set
> >> it up to get similar performance.
> >> The obvious/naive
> >>
> >>     -pc_type gamg
> >>     -pc_gamg_type agg
> >>
> >> # with and without
> >>     -pc_gamg_threshold 0.03
> >>     -pc_mg_levels 3
> >>
> >> converges very slowly on 1 proc and much worse on 8 (~200k DOFs per
> >> proc), for instance:
> >> np = 1:
> >> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm
> >> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
> >> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm
> >> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04
> >> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm
> >> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04
> >>
> >> np = 8:
> >> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm
> >> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03
> >> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> >> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
> >> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> >> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
> >>
> >> A very high threshold seems to improve the GAMG PC; for instance, with
> >> 0.75 I get convergence to rtol=1e-6 after 744 iterations.
> >> What else should I try?
> >>
> >> I would very much appreciate any advice on configuring GAMG, and on the
> >> differences w.r.t. ML that should be taken into account (I am not a
> >> multigrid expert, though).
> >>
> >> Thanks, best wishes
> >> David
> >>
> >>
> >> ------
> >> ksp_view for -pc_type gamg      -pc_gamg_threshold 0.75 -pc_mg_levels 3
> >>
> >> KSP Object: 1 MPI processes
> >>   type: fgmres
> >>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> >> Orthogonalization with no iterative refinement
> >>     GMRES: happy breakdown tolerance 1e-30
> >>   maximum iterations=10000
> >>   tolerances:  relative=1e-06, absolute=1e-50, divergence=10000.
> >>   right preconditioning
> >>   using nonzero initial guess
> >>   using UNPRECONDITIONED norm type for convergence test
> >> PC Object: 1 MPI processes
> >>   type: gamg
> >>     MG: type is MULTIPLICATIVE, levels=1 cycles=v
> >>       Cycles per PCApply=1
> >>       Using Galerkin computed coarse grid matrices
> >>       GAMG specific options
> >>         Threshold for dropping small values from graph 0.75
> >>         AGG specific options
> >>           Symmetric graph false
> >>   Coarse grid solver -- level -------------------------------
> >>     KSP Object:    (mg_levels_0_)     1 MPI processes
> >>       type: preonly
> >>       maximum iterations=2, initial guess is zero
> >>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >>       left preconditioning
> >>       using NONE norm type for convergence test
> >>     PC Object:    (mg_levels_0_)     1 MPI processes
> >>       type: sor
> >>         SOR: type = local_symmetric, iterations = 1, local iterations =
> >> 1, omega = 1.
> >>       linear system matrix = precond matrix:
> >>       Mat Object:       1 MPI processes
> >>         type: seqaij
> >>         rows=1745224, cols=1745224
> >>         total: nonzeros=99452608, allocated nonzeros=99452608
> >>         total number of mallocs used during MatSetValues calls =0
> >>           using I-node routines: found 1037847 nodes, limit used is 5
> >>   linear system matrix = precond matrix:
> >>   Mat Object:   1 MPI processes
> >>     type: seqaij
> >>     rows=1745224, cols=1745224
> >>     total: nonzeros=99452608, allocated nonzeros=99452608
> >>     total number of mallocs used during MatSetValues calls =0
> >>       using I-node routines: found 1037847 nodes, limit used is 5
> >>
> >>
> >> ------
> >> ksp_view for -pc_type ml:
> >>
> >> KSP Object: 8 MPI processes
> >>   type: fgmres
> >>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> >> Orthogonalization with no iterative refinement
> >>     GMRES: happy breakdown tolerance 1e-30
> >>   maximum iterations=10000
> >>   tolerances:  relative=1e-06, absolute=1e-50, divergence=10000.
> >>   right preconditioning
> >>   using nonzero initial guess
> >>   using UNPRECONDITIONED norm type for convergence test
> >> PC Object: 8 MPI processes
> >>   type: ml
> >>     MG: type is MULTIPLICATIVE, levels=3 cycles=v
> >>       Cycles per PCApply=1
> >>       Using Galerkin computed coarse grid matrices
> >>   Coarse grid solver -- level -------------------------------
> >>     KSP Object:    (mg_coarse_)     8 MPI processes
> >>       type: preonly
> >>       maximum iterations=10000, initial guess is zero
> >>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >>       left preconditioning
> >>       using NONE norm type for convergence test
> >>     PC Object:    (mg_coarse_)     8 MPI processes
> >>       type: redundant
> >>         Redundant preconditioner: First (color=0) of 8 PCs follows
> >>         KSP Object:        (mg_coarse_redundant_)         1 MPI processes
> >>           type: preonly
> >>           maximum iterations=10000, initial guess is zero
> >>           tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >>           left preconditioning
> >>           using NONE norm type for convergence test
> >>         PC Object:        (mg_coarse_redundant_)         1 MPI processes
> >>           type: lu
> >>             LU: out-of-place factorization
> >>             tolerance for zero pivot 2.22045e-14
> >>             using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
> >>             matrix ordering: nd
> >>             factor fill ratio given 5., needed 10.4795
> >>               Factored matrix follows:
> >>                 Mat Object:                 1 MPI processes
> >>                   type: seqaij
> >>                   rows=6822, cols=6822
> >>                   package used to perform factorization: petsc
> >>                   total: nonzeros=9575688, allocated nonzeros=9575688
> >>                   total number of mallocs used during MatSetValues calls =0
> >>                     not using I-node routines
> >>           linear system matrix = precond matrix:
> >>           Mat Object:           1 MPI processes
> >>             type: seqaij
> >>             rows=6822, cols=6822
> >>             total: nonzeros=913758, allocated nonzeros=913758
> >>             total number of mallocs used during MatSetValues calls =0
> >>               not using I-node routines
> >>       linear system matrix = precond matrix:
> >>       Mat Object:       8 MPI processes
> >>         type: mpiaij
> >>         rows=6822, cols=6822
> >>         total: nonzeros=913758, allocated nonzeros=913758
> >>         total number of mallocs used during MatSetValues calls =0
> >>           not using I-node (on process 0) routines
> >>   Down solver (pre-smoother) on level 1 -------------------------------
> >>     KSP Object:    (mg_levels_1_)     8 MPI processes
> >>       type: richardson
> >>         Richardson: damping factor=1.
> >>       maximum iterations=2
> >>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >>       left preconditioning
> >>       using nonzero initial guess
> >>       using NONE norm type for convergence test
> >>     PC Object:    (mg_levels_1_)     8 MPI processes
> >>       type: sor
> >>         SOR: type = local_symmetric, iterations = 1, local iterations =
> >> 1, omega = 1.
> >>       linear system matrix = precond matrix:
> >>       Mat Object:       8 MPI processes
> >>         type: mpiaij
> >>         rows=67087, cols=67087
> >>         total: nonzeros=9722749, allocated nonzeros=9722749
> >>         total number of mallocs used during MatSetValues calls =0
> >>           not using I-node (on process 0) routines
> >>   Up solver (post-smoother) same as down solver (pre-smoother)
> >>   Down solver (pre-smoother) on level 2 -------------------------------
> >>     KSP Object:    (mg_levels_2_)     8 MPI processes
> >>       type: richardson
> >>         Richardson: damping factor=1.
> >>       maximum iterations=2
> >>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >>       left preconditioning
> >>       using nonzero initial guess
> >>       using NONE norm type for convergence test
> >>     PC Object:    (mg_levels_2_)     8 MPI processes
> >>       type: sor
> >>         SOR: type = local_symmetric, iterations = 1, local iterations =
> >> 1, omega = 1.
> >>       linear system matrix = precond matrix:
> >>       Mat Object:       8 MPI processes
> >>         type: mpiaij
> >>         rows=1745224, cols=1745224
> >>         total: nonzeros=99452608, allocated nonzeros=99452608
> >>         total number of mallocs used during MatSetValues calls =0
> >>           using I-node (on process 0) routines: found 126690 nodes,
> >> limit used is 5
> >>   Up solver (post-smoother) same as down solver (pre-smoother)
> >>   linear system matrix = precond matrix:
> >>   Mat Object:   8 MPI processes
> >>     type: mpiaij
> >>     rows=1745224, cols=1745224
> >>     total: nonzeros=99452608, allocated nonzeros=99452608
> >>     total number of mallocs used during MatSetValues calls =0
> >>       using I-node (on process 0) routines: found 126690 nodes, limit
> >> used is 5
> >>
> >
>
>

