[petsc-users] GAMG advice

Mark Adams mfadams at lbl.gov
Wed Nov 8 19:11:25 CST 2017


On Wed, Nov 1, 2017 at 5:45 PM, David Nolte <dnolte at dim.uchile.cl> wrote:

> Thanks Barry.
> By simply replacing chebyshev with richardson, I get similar performance
> with GAMG and ML


That too (I assumed you were using the same smoother; I could not see
chebyshev in your view data).

I guess SOR works for the coarse grid solver because the coarse grid is
small. Using LU there should help.
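
Something along these lines should select LU there (a sketch, assuming the
coarse solve is addressed through the usual mg_coarse_ prefix as in the ML
ksp_view further down; with a single level, as in your GAMG ksp_view, it
shows up as mg_levels_0_ instead):

    -mg_coarse_ksp_type preonly
    -mg_coarse_pc_type lu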


> (GAMG even slightly faster):
>

This is "random" fluctuations.


>
> -pc_type gamg
> -pc_gamg_type agg
> -pc_gamg_threshold 0.03
> -pc_gamg_square_graph 10
> -pc_gamg_sym_graph
> -mg_levels_ksp_type richardson
> -mg_levels_pc_type sor
>
> Is it still true that I need to set "-pc_gamg_sym_graph" if the matrix
> is asymmetric?


yes,


> For serial runs it doesn't seem to matter,


yes,


> but in parallel the PC setup hangs (after calls to PCGAMGFilterGraph())
> if -pc_gamg_sym_graph is not set.
>

yep,
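
For reference, a parallel invocation with everything discussed here would look
roughly as follows (a sketch only; "./app" and the problem-specific options are
placeholders for your application):

    mpiexec -n 8 ./app <problem options> \
        -ksp_type fgmres -ksp_rtol 1.0e-6 \
        -pc_type gamg -pc_gamg_type agg -pc_gamg_sym_graph \
        -pc_gamg_threshold 0.03 -pc_gamg_square_graph 10 \
        -mg_levels_ksp_type richardson -mg_levels_pc_type sor \
        -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu \
        -ksp_converged_reason -ksp_view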


>
> David
>
>
> On 10/21/2017 12:10 AM, Barry Smith wrote:
> >   David,
> >
> >    GAMG picks the number of levels based on how the coarsening process
> etc proceeds. You cannot hardwire it to a particular value. You can run
> with -info to get more info potentially on the decisions GAMG is making.
> >
> >   Barry
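
For example (a sketch; "./app" stands in for the application binary, and the
grep just trims the -info output down to the GAMG-related lines):

    ./app <problem options> -pc_type gamg -info 2>&1 | grep -i gamg
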
> >
> >> On Oct 20, 2017, at 2:06 PM, David Nolte <dnolte at dim.uchile.cl> wrote:
> >>
> >> PS: I didn't realize it at first, but it looks as if the -pc_mg_levels 3
> >> option was not taken into account:
> >> type: gamg
> >>     MG: type is MULTIPLICATIVE, levels=1 cycles=v
> >>
> >>
> >>
> >> On 10/20/2017 03:32 PM, David Nolte wrote:
> >>> Dear all,
> >>>
> >>> I have some problems using GAMG as a preconditioner for (F)GMRES.
> >>> Background: I am solving the incompressible, unsteady Navier-Stokes
> >>> equations with a coupled mixed FEM approach, using P1/P1 elements for
> >>> velocity and pressure on an unstructured tetrahedral mesh with about
> >>> 2 million DOFs (and up to 15 million). The method is stabilized with
> >>> SUPG/PSPG, hence there are no zeros on the diagonal of the pressure
> >>> block. Time discretization is semi-implicit backward Euler. The flow
> >>> is a convection-dominated flow through a nozzle.
> >>>
> >>> So far, for this setup, I have been quite happy with a simple FGMRES/ML
> >>> solver for the full system (rather brute force, I admit, but much faster
> >>> than any block/Schur preconditioners I tried):
> >>>
> >>>     -ksp_converged_reason
> >>>     -ksp_monitor_true_residual
> >>>     -ksp_type fgmres
> >>>     -ksp_rtol 1.0e-6
> >>>     -ksp_initial_guess_nonzero
> >>>
> >>>     -pc_type ml
> >>>     -pc_ml_Threshold 0.03
> >>>     -pc_ml_maxNlevels 3
> >>>
> >>> This setup converges in ~100 iterations (see below the ksp_view output)
> >>> to rtol:
> >>>
> >>> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm
> >>> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06
> >>> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm
> >>> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06
> >>> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm
> >>> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06
> >>> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm
> >>> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07
> >>>
> >>>
> >>> Now I'd like to try GAMG instead of ML. However, I don't know how to
> >>> set it up to get similar performance.
> >>> The obvious/naive
> >>>
> >>>     -pc_type gamg
> >>>     -pc_gamg_type agg
> >>>
> >>> # with and without
> >>>     -pc_gamg_threshold 0.03
> >>>     -pc_mg_levels 3
> >>>
> >>> converges very slowly on 1 proc and much worse on 8 (~200k dofs per
> >>> proc), for instance:
> >>> np = 1:
> >>> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm
> >>> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
> >>> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm
> >>> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04
> >>> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm
> >>> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04
> >>>
> >>> np = 8:
> >>> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm
> >>> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03
> >>> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> >>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
> >>> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> >>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
> >>>
> >>> A very high threshold seems to improve the GAMG PC; for instance, with
> >>> 0.75 I get convergence to rtol=1e-6 after 744 iterations.
> >>> What else should I try?
> >>>
> >>> I would very much appreciate any advice on configuring GAMG and on the
> >>> differences w.r.t. ML that should be taken into account (I am not a
> >>> multigrid expert, though).
> >>>
> >>> Thanks, best wishes
> >>> David
> >>>
> >>>
> >>> ------
> >>> ksp_view for -pc_type gamg      -pc_gamg_threshold 0.75 -pc_mg_levels 3
> >>>
> >>> KSP Object: 1 MPI processes
> >>>   type: fgmres
> >>>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> >>> Orthogonalization with no iterative refinement
> >>>     GMRES: happy breakdown tolerance 1e-30
> >>>   maximum iterations=10000
> >>>   tolerances:  relative=1e-06, absolute=1e-50, divergence=10000.
> >>>   right preconditioning
> >>>   using nonzero initial guess
> >>>   using UNPRECONDITIONED norm type for convergence test
> >>> PC Object: 1 MPI processes
> >>>   type: gamg
> >>>     MG: type is MULTIPLICATIVE, levels=1 cycles=v
> >>>       Cycles per PCApply=1
> >>>       Using Galerkin computed coarse grid matrices
> >>>       GAMG specific options
> >>>         Threshold for dropping small values from graph 0.75
> >>>         AGG specific options
> >>>           Symmetric graph false
> >>>   Coarse grid solver -- level -------------------------------
> >>>     KSP Object:    (mg_levels_0_)     1 MPI processes
> >>>       type: preonly
> >>>       maximum iterations=2, initial guess is zero
> >>>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >>>       left preconditioning
> >>>       using NONE norm type for convergence test
> >>>     PC Object:    (mg_levels_0_)     1 MPI processes
> >>>       type: sor
> >>>         SOR: type = local_symmetric, iterations = 1, local iterations =
> >>> 1, omega = 1.
> >>>       linear system matrix = precond matrix:
> >>>       Mat Object:       1 MPI processes
> >>>         type: seqaij
> >>>         rows=1745224, cols=1745224
> >>>         total: nonzeros=99452608, allocated nonzeros=99452608
> >>>         total number of mallocs used during MatSetValues calls =0
> >>>           using I-node routines: found 1037847 nodes, limit used is 5
> >>>   linear system matrix = precond matrix:
> >>>   Mat Object:   1 MPI processes
> >>>     type: seqaij
> >>>     rows=1745224, cols=1745224
> >>>     total: nonzeros=99452608, allocated nonzeros=99452608
> >>>     total number of mallocs used during MatSetValues calls =0
> >>>       using I-node routines: found 1037847 nodes, limit used is 5
> >>>
> >>>
> >>> ------
> >>> ksp_view for -pc_type ml:
> >>>
> >>> KSP Object: 8 MPI processes
> >>>   type: fgmres
> >>>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> >>> Orthogonalization with no iterative refinement
> >>>     GMRES: happy breakdown tolerance 1e-30
> >>>   maximum iterations=10000
> >>>   tolerances:  relative=1e-06, absolute=1e-50, divergence=10000.
> >>>   right preconditioning
> >>>   using nonzero initial guess
> >>>   using UNPRECONDITIONED norm type for convergence test
> >>> PC Object: 8 MPI processes
> >>>   type: ml
> >>>     MG: type is MULTIPLICATIVE, levels=3 cycles=v
> >>>       Cycles per PCApply=1
> >>>       Using Galerkin computed coarse grid matrices
> >>>   Coarse grid solver -- level -------------------------------
> >>>     KSP Object:    (mg_coarse_)     8 MPI processes
> >>>       type: preonly
> >>>       maximum iterations=10000, initial guess is zero
> >>>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >>>       left preconditioning
> >>>       using NONE norm type for convergence test
> >>>     PC Object:    (mg_coarse_)     8 MPI processes
> >>>       type: redundant
> >>>         Redundant preconditioner: First (color=0) of 8 PCs follows
> >>>         KSP Object:        (mg_coarse_redundant_)         1 MPI processes
> >>>           type: preonly
> >>>           maximum iterations=10000, initial guess is zero
> >>>           tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >>>           left preconditioning
> >>>           using NONE norm type for convergence test
> >>>         PC Object:        (mg_coarse_redundant_)         1 MPI processes
> >>>           type: lu
> >>>             LU: out-of-place factorization
> >>>             tolerance for zero pivot 2.22045e-14
> >>>             using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
> >>>             matrix ordering: nd
> >>>             factor fill ratio given 5., needed 10.4795
> >>>               Factored matrix follows:
> >>>                 Mat Object:                 1 MPI processes
> >>>                   type: seqaij
> >>>                   rows=6822, cols=6822
> >>>                   package used to perform factorization: petsc
> >>>                   total: nonzeros=9575688, allocated nonzeros=9575688
> >>>                   total number of mallocs used during MatSetValues calls =0
> >>>                     not using I-node routines
> >>>           linear system matrix = precond matrix:
> >>>           Mat Object:           1 MPI processes
> >>>             type: seqaij
> >>>             rows=6822, cols=6822
> >>>             total: nonzeros=913758, allocated nonzeros=913758
> >>>             total number of mallocs used during MatSetValues calls =0
> >>>               not using I-node routines
> >>>       linear system matrix = precond matrix:
> >>>       Mat Object:       8 MPI processes
> >>>         type: mpiaij
> >>>         rows=6822, cols=6822
> >>>         total: nonzeros=913758, allocated nonzeros=913758
> >>>         total number of mallocs used during MatSetValues calls =0
> >>>           not using I-node (on process 0) routines
> >>>   Down solver (pre-smoother) on level 1 -------------------------------
> >>>     KSP Object:    (mg_levels_1_)     8 MPI processes
> >>>       type: richardson
> >>>         Richardson: damping factor=1.
> >>>       maximum iterations=2
> >>>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >>>       left preconditioning
> >>>       using nonzero initial guess
> >>>       using NONE norm type for convergence test
> >>>     PC Object:    (mg_levels_1_)     8 MPI processes
> >>>       type: sor
> >>>         SOR: type = local_symmetric, iterations = 1, local iterations =
> >>> 1, omega = 1.
> >>>       linear system matrix = precond matrix:
> >>>       Mat Object:       8 MPI processes
> >>>         type: mpiaij
> >>>         rows=67087, cols=67087
> >>>         total: nonzeros=9722749, allocated nonzeros=9722749
> >>>         total number of mallocs used during MatSetValues calls =0
> >>>           not using I-node (on process 0) routines
> >>>   Up solver (post-smoother) same as down solver (pre-smoother)
> >>>   Down solver (pre-smoother) on level 2 -------------------------------
> >>>     KSP Object:    (mg_levels_2_)     8 MPI processes
> >>>       type: richardson
> >>>         Richardson: damping factor=1.
> >>>       maximum iterations=2
> >>>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >>>       left preconditioning
> >>>       using nonzero initial guess
> >>>       using NONE norm type for convergence test
> >>>     PC Object:    (mg_levels_2_)     8 MPI processes
> >>>       type: sor
> >>>         SOR: type = local_symmetric, iterations = 1, local iterations =
> >>> 1, omega = 1.
> >>>       linear system matrix = precond matrix:
> >>>       Mat Object:       8 MPI processes
> >>>         type: mpiaij
> >>>         rows=1745224, cols=1745224
> >>>         total: nonzeros=99452608, allocated nonzeros=99452608
> >>>         total number of mallocs used during MatSetValues calls =0
> >>>           using I-node (on process 0) routines: found 126690 nodes,
> >>> limit used is 5
> >>>   Up solver (post-smoother) same as down solver (pre-smoother)
> >>>   linear system matrix = precond matrix:
> >>>   Mat Object:   8 MPI processes
> >>>     type: mpiaij
> >>>     rows=1745224, cols=1745224
> >>>     total: nonzeros=99452608, allocated nonzeros=99452608
> >>>     total number of mallocs used during MatSetValues calls =0
> >>>       using I-node (on process 0) routines: found 126690 nodes, limit
> >>> used is 5
> >>>
>
>
>