[petsc-users] GAMG advice

Mark Adams mfadams at lbl.gov
Fri Nov 10 07:58:04 CST 2017


On Thu, Nov 9, 2017 at 2:19 PM, David Nolte <dnolte at dim.uchile.cl> wrote:

> Hi Mark,
>
> thanks for clarifying.
> When I wrote the initial question I had somehow overlooked the fact that
> the GAMG default smoother is Chebyshev, while ML uses SOR. All the other
> comments concerning the threshold etc. were based on this mistake.
>
> The following settings work quite well; of course, LU is used on the coarse
> level.
>
>     -pc_type gamg
>     -pc_gamg_type agg
>     -pc_gamg_threshold 0.03
>     -pc_gamg_square_graph 10        # no effect ?
>     -pc_gamg_sym_graph
>     -mg_levels_ksp_type richardson
>     -mg_levels_pc_type sor
>
> -pc_gamg_agg_nsmooths 0 does not seem to improve the convergence.
>

Looks reasonable. Smoothed aggregation is good for the convergence of elliptic
operators, but it makes the operators more expensive. It's worth doing for
elliptic operators but in my experience not for others. If your convergence
rate does not change, then you probably want -pc_gamg_agg_nsmooths 0. This is
a cheaper (if smoothing does not help convergence much), simpler method, and
in that case you want to use it.
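
For reference, a hedged sketch of a full command line combining the options
discussed in this thread; the executable name ./app and the process count are
placeholders, not taken from the thread:

    mpiexec -n 4 ./app \
        -ksp_type fgmres -ksp_rtol 1.0e-6 -ksp_initial_guess_nonzero \
        -ksp_converged_reason -ksp_monitor_true_residual \
        -pc_type gamg -pc_gamg_type agg \
        -pc_gamg_agg_nsmooths 0 \
        -pc_gamg_threshold 0.03 \
        -pc_gamg_square_graph 10 -pc_gamg_sym_graph \
        -mg_levels_ksp_type richardson -mg_levels_pc_type sor

Rerunning with and without -pc_gamg_agg_nsmooths 0 and comparing iteration
counts is the quickest way to check whether smoothed aggregation pays off here.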


>
> The ksp_view output now looks like this (does this seem reasonable?):
>
>
> KSP Object: 4 MPI processes
>   type: fgmres
>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
>     GMRES: happy breakdown tolerance 1e-30
>   maximum iterations=10000
>   tolerances:  relative=1e-06, absolute=1e-50, divergence=10000.
>   right preconditioning
>   using nonzero initial guess
>   using UNPRECONDITIONED norm type for convergence test
> PC Object: 4 MPI processes
>   type: gamg
>     MG: type is MULTIPLICATIVE, levels=5 cycles=v
>       Cycles per PCApply=1
>       Using Galerkin computed coarse grid matrices
>       GAMG specific options
>         Threshold for dropping small values from graph 0.03
>         AGG specific options
>           Symmetric graph true
>   Coarse grid solver -- level -------------------------------
>     KSP Object:    (mg_coarse_)     4 MPI processes
>       type: preonly
>       maximum iterations=10000, initial guess is zero
>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>       left preconditioning
>       using NONE norm type for convergence test
>     PC Object:    (mg_coarse_)     4 MPI processes
>       type: bjacobi
>         block Jacobi: number of blocks = 4
>         Local solve is same for all blocks, in the following KSP and PC
> objects:
>       KSP Object:      (mg_coarse_sub_)       1 MPI processes
>         type: preonly
>         maximum iterations=1, initial guess is zero
>         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>         left preconditioning
>         using NONE norm type for convergence test
>       PC Object:      (mg_coarse_sub_)       1 MPI processes
>         type: lu
>           LU: out-of-place factorization
>           tolerance for zero pivot 2.22045e-14
>           using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
>           matrix ordering: nd
>           factor fill ratio given 5., needed 1.
>             Factored matrix follows:
>               Mat Object:               1 MPI processes
>                 type: seqaij
>                 rows=38, cols=38
>                 package used to perform factorization: petsc
>                 total: nonzeros=1444, allocated nonzeros=1444
>                 total number of mallocs used during MatSetValues calls =0
>                   using I-node routines: found 8 nodes, limit used is 5
>         linear system matrix = precond matrix:
>         Mat Object:         1 MPI processes
>           type: seqaij
>           rows=38, cols=38
>           total: nonzeros=1444, allocated nonzeros=1444
>           total number of mallocs used during MatSetValues calls =0
>             using I-node routines: found 8 nodes, limit used is 5
>       linear system matrix = precond matrix:
>       Mat Object:       4 MPI processes
>         type: mpiaij
>         rows=38, cols=38
>         total: nonzeros=1444, allocated nonzeros=1444
>         total number of mallocs used during MatSetValues calls =0
>           using I-node (on process 0) routines: found 8 nodes, limit used
> is 5
>   Down solver (pre-smoother) on level 1 -------------------------------
>     KSP Object:    (mg_levels_1_)     4 MPI processes
>       type: richardson
>         Richardson: damping factor=1.
>       maximum iterations=2
>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>       left preconditioning
>       using nonzero initial guess
>       using NONE norm type for convergence test
>     PC Object:    (mg_levels_1_)     4 MPI processes
>       type: sor
>         SOR: type = local_symmetric, iterations = 1, local iterations = 1,
> omega = 1.
>       linear system matrix = precond matrix:
>       Mat Object:       4 MPI processes
>         type: mpiaij
>         rows=168, cols=168
>         total: nonzeros=19874, allocated nonzeros=19874
>         total number of mallocs used during MatSetValues calls =0
>           using I-node (on process 0) routines: found 17 nodes, limit used
> is 5
>   Up solver (post-smoother) same as down solver (pre-smoother)
>   Down solver (pre-smoother) on level 2 -------------------------------
>     KSP Object:    (mg_levels_2_)     4 MPI processes
>       type: richardson
>         Richardson: damping factor=1.
>       maximum iterations=2
>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>       left preconditioning
>       using nonzero initial guess
>       using NONE norm type for convergence test
>     PC Object:    (mg_levels_2_)     4 MPI processes
>       type: sor
>         SOR: type = local_symmetric, iterations = 1, local iterations = 1,
> omega = 1.
>       linear system matrix = precond matrix:
>       Mat Object:       4 MPI processes
>         type: mpiaij
>         rows=3572, cols=3572
>         total: nonzeros=963872, allocated nonzeros=963872
>         total number of mallocs used during MatSetValues calls =0
>           not using I-node (on process 0) routines
>   Up solver (post-smoother) same as down solver (pre-smoother)
>   Down solver (pre-smoother) on level 3 -------------------------------
>     KSP Object:    (mg_levels_3_)     4 MPI processes
>       type: richardson
>         Richardson: damping factor=1.
>       maximum iterations=2
>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>       left preconditioning
>       using nonzero initial guess
>       using NONE norm type for convergence test
>     PC Object:    (mg_levels_3_)     4 MPI processes
>       type: sor
>         SOR: type = local_symmetric, iterations = 1, local iterations = 1,
> omega = 1.
>       linear system matrix = precond matrix:
>       Mat Object:       4 MPI processes
>         type: mpiaij
>         rows=21179, cols=21179
>         total: nonzeros=1060605, allocated nonzeros=1060605
>         total number of mallocs used during MatSetValues calls =0
>           not using I-node (on process 0) routines
>   Up solver (post-smoother) same as down solver (pre-smoother)
>   Down solver (pre-smoother) on level 4 -------------------------------
>     KSP Object:    (mg_levels_4_)     4 MPI processes
>       type: richardson
>         Richardson: damping factor=1.
>       maximum iterations=2
>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>       left preconditioning
>       using nonzero initial guess
>       using NONE norm type for convergence test
>     PC Object:    (mg_levels_4_)     4 MPI processes
>       type: sor
>         SOR: type = local_symmetric, iterations = 1, local iterations = 1,
> omega = 1.
>       linear system matrix = precond matrix:
>       Mat Object:       4 MPI processes
>         type: mpiaij
>         rows=1745224, cols=1745224
>         total: nonzeros=99452608, allocated nonzeros=99452608
>         total number of mallocs used during MatSetValues calls =0
>           using I-node (on process 0) routines: found 254433 nodes, limit
> used is 5
>   Up solver (post-smoother) same as down solver (pre-smoother)
>   linear system matrix = precond matrix:
>   Mat Object:   4 MPI processes
>     type: mpiaij
>     rows=1745224, cols=1745224
>     total: nonzeros=99452608, allocated nonzeros=99452608
>     total number of mallocs used during MatSetValues calls =0
>       using I-node (on process 0) routines: found 254433 nodes, limit used
> is 5
>
>
>
>
> Thanks, David
>
>
>
>
> On 11/08/2017 10:11 PM, Mark Adams wrote:
>
>
>
> On Wed, Nov 1, 2017 at 5:45 PM, David Nolte <dnolte at dim.uchile.cl> wrote:
>
>> Thanks Barry.
>> By simply replacing Chebyshev with Richardson I get similar performance
>> with GAMG and ML
>
>
> That too (I assumed you were using the same smoother; I could not see
> Chebyshev in your view data).
>
> I guess SOR works for the coarse grid solver because the coarse grid is
> small. Using LU should still help.
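>
> (A hedged aside, not from the thread: a direct coarse solve can be requested
> explicitly with the standard PETSc multigrid option prefixes, e.g.
>
>     -mg_coarse_ksp_type preonly
>     -mg_coarse_pc_type lu          # on >1 process this needs a parallel LU
>                                    # package, or -mg_coarse_pc_type redundant
>                                    # as in the ML configuration further down
>
> The GAMG ksp_view above already ends up with block Jacobi plus LU on each
> block, which is usually fine for a coarse grid of only 38 rows.)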
>
>
>> (GAMG even slightly faster):
>>
>
> This is "random" fluctuations.
>
>
>>
>>     -pc_type gamg
>>     -pc_gamg_type agg
>>     -pc_gamg_threshold 0.03
>>     -pc_gamg_square_graph 10
>>     -pc_gamg_sym_graph
>>     -mg_levels_ksp_type richardson
>>     -mg_levels_pc_type sor
>>
>> Is it still true that I need to set "-pc_gamg_sym_graph" if the matrix
>> is asymmetric?
>
>
> yes,
>
>
>> For serial runs it doesn't seem to matter,
>
>
> yes,
>
>
>> but in
>> parallel the PC setup hangs (after calls of
>> PCGAMGFilterGraph()) if -pc_gamg_sym_graph is not set.
>>
>
> yep,
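>
> (A hedged recap, only restating options that already appear in the settings
> above:
>
>     -pc_gamg_sym_graph          # symmetrize the graph used for aggregation
>     -pc_gamg_square_graph 10    # number of levels on which to square the graph
>
> Only the coarsening graph is symmetrized; the matrix handed to the solver is
> left as is.)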
>
>
>>
>> David
>>
>>
>> On 10/21/2017 12:10 AM, Barry Smith wrote:
>> >   David,
>> >
>> >    GAMG picks the number of levels based on how the coarsening process
>> > etc. proceeds; you cannot hardwire it to a particular value. You can run
>> > with -info to potentially get more information on the decisions GAMG is making.
>> >
>> >   Barry
>> >
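>> > (A hedged sketch of such a run; the executable name ./app is a placeholder,
>> > not taken from the thread:
>> >
>> >     mpiexec -n 8 ./app -pc_type gamg -pc_gamg_type agg -info 2>&1 | grep -i gamg
>> >
>> > This filters the -info stream down to the GAMG setup messages, e.g. how many
>> > equations and active processes remain on each level.)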
>> >> On Oct 20, 2017, at 2:06 PM, David Nolte <dnolte at dim.uchile.cl> wrote:
>> >>
>> >> PS: I didn't realize it at first, but it looks as if the -pc_mg_levels 3
>> >> option was not taken into account:
>> >> type: gamg
>> >>     MG: type is MULTIPLICATIVE, levels=1 cycles=v
>> >>
>> >>
>> >>
>> >> On 10/20/2017 03:32 PM, David Nolte wrote:
>> >>> Dear all,
>> >>>
>> >>> I have some problems using GAMG as a preconditioner for (F)GMRES.
>> >>> Background: I am solving the incompressible, unsteady Navier-Stokes
>> >>> equations with a coupled mixed FEM approach, using P1/P1 elements for
>> >>> velocity and pressure on an unstructured tetrahedron mesh with about
>> >>> 2 million DOFs (and up to 15 million). The method is stabilized with
>> >>> SUPG/PSPG,
>> >>> hence, no zeros on the diagonal of the pressure block. Time
>> >>> discretization with semi-implicit backward Euler. The flow is a
>> >>> convection dominated flow through a nozzle.
>> >>>
>> >>> So far, for this setup, I have been quite happy with a simple FGMRES/ML
>> >>> solver for the full system (rather brute force, I admit, but much faster
>> >>> than any block/Schur preconditioners I tried):
>> >>>
>> >>>     -ksp_converged_reason
>> >>>     -ksp_monitor_true_residual
>> >>>     -ksp_type fgmres
>> >>>     -ksp_rtol 1.0e-6
>> >>>     -ksp_initial_guess_nonzero
>> >>>
>> >>>     -pc_type ml
>> >>>     -pc_ml_Threshold 0.03
>> >>>     -pc_ml_maxNlevels 3
>> >>>
>> >>> This setup converges in ~100 iterations (see the ksp_view output below)
>> >>> to rtol:
>> >>>
>> >>> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm
>> >>> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06
>> >>> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm
>> >>> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06
>> >>> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm
>> >>> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06
>> >>> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm
>> >>> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07
>> >>>
>> >>>
>> >>> Now I'd like to try GAMG instead of ML. However, I don't know how to
>> >>> set it up to get similar performance.
>> >>> The obvious/naive
>> >>>
>> >>>     -pc_type gamg
>> >>>     -pc_gamg_type agg
>> >>>
>> >>> # with and without
>> >>>     -pc_gamg_threshold 0.03
>> >>>     -pc_mg_levels 3
>> >>>
>> >>> converges very slowly on 1 proc and much worse on 8 (~200k dofs per
>> >>> proc), for instance:
>> >>> np = 1:
>> >>> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm
>> >>> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
>> >>> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm
>> >>> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04
>> >>> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm
>> >>> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04
>> >>>
>> >>> np = 8:
>> >>> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm
>> >>> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03
>> >>> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
>> >>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
>> >>> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
>> >>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
>> >>>
>> >>> A very high threshold seems to improve the GAMG PC; for instance, with
>> >>> 0.75 I get convergence to rtol=1e-6 after 744 iterations.
>> >>> What else should I try?
>> >>>
>> >>> I would very much appreciate any advice on configuring GAMG and on the
>> >>> differences w.r.t. ML that should be taken into account (I am not a
>> >>> multigrid expert, though).
>> >>>
>> >>> Thanks, best wishes
>> >>> David
>> >>>
>> >>>
>> >>> ------
>> >>> ksp_view for -pc_type gamg -pc_gamg_threshold 0.75 -pc_mg_levels 3
>> >>>
>> >>> KSP Object: 1 MPI processes
>> >>>   type: fgmres
>> >>>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
>> >>> Orthogonalization with no iterative refinement
>> >>>     GMRES: happy breakdown tolerance 1e-30
>> >>>   maximum iterations=10000
>> >>>   tolerances:  relative=1e-06, absolute=1e-50, divergence=10000.
>> >>>   right preconditioning
>> >>>   using nonzero initial guess
>> >>>   using UNPRECONDITIONED norm type for convergence test
>> >>> PC Object: 1 MPI processes
>> >>>   type: gamg
>> >>>     MG: type is MULTIPLICATIVE, levels=1 cycles=v
>> >>>       Cycles per PCApply=1
>> >>>       Using Galerkin computed coarse grid matrices
>> >>>       GAMG specific options
>> >>>         Threshold for dropping small values from graph 0.75
>> >>>         AGG specific options
>> >>>           Symmetric graph false
>> >>>   Coarse grid solver -- level -------------------------------
>> >>>     KSP Object:    (mg_levels_0_)     1 MPI processes
>> >>>       type: preonly
>> >>>       maximum iterations=2, initial guess is zero
>> >>>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>> >>>       left preconditioning
>> >>>       using NONE norm type for convergence test
>> >>>     PC Object:    (mg_levels_0_)     1 MPI processes
>> >>>       type: sor
>> >>>         SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>> >>>       linear system matrix = precond matrix:
>> >>>       Mat Object:       1 MPI processes
>> >>>         type: seqaij
>> >>>         rows=1745224, cols=1745224
>> >>>         total: nonzeros=99452608, allocated nonzeros=99452608
>> >>>         total number of mallocs used during MatSetValues calls =0
>> >>>           using I-node routines: found 1037847 nodes, limit used is 5
>> >>>   linear system matrix = precond matrix:
>> >>>   Mat Object:   1 MPI processes
>> >>>     type: seqaij
>> >>>     rows=1745224, cols=1745224
>> >>>     total: nonzeros=99452608, allocated nonzeros=99452608
>> >>>     total number of mallocs used during MatSetValues calls =0
>> >>>       using I-node routines: found 1037847 nodes, limit used is 5
>> >>>
>> >>>
>> >>> ------
>> >>> ksp_view for -pc_type ml:
>> >>>
>> >>> KSP Object: 8 MPI processes
>> >>>   type: fgmres
>> >>>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
>> >>> Orthogonalization with no iterative refinement
>> >>>     GMRES: happy breakdown tolerance 1e-30
>> >>>   maximum iterations=10000
>> >>>   tolerances:  relative=1e-06, absolute=1e-50, divergence=10000.
>> >>>   right preconditioning
>> >>>   using nonzero initial guess
>> >>>   using UNPRECONDITIONED norm type for convergence test
>> >>> PC Object: 8 MPI processes
>> >>>   type: ml
>> >>>     MG: type is MULTIPLICATIVE, levels=3 cycles=v
>> >>>       Cycles per PCApply=1
>> >>>       Using Galerkin computed coarse grid matrices
>> >>>   Coarse grid solver -- level -------------------------------
>> >>>     KSP Object:    (mg_coarse_)     8 MPI processes
>> >>>       type: preonly
>> >>>       maximum iterations=10000, initial guess is zero
>> >>>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>> >>>       left preconditioning
>> >>>       using NONE norm type for convergence test
>> >>>     PC Object:    (mg_coarse_)     8 MPI processes
>> >>>       type: redundant
>> >>>         Redundant preconditioner: First (color=0) of 8 PCs follows
>> >>>         KSP Object:        (mg_coarse_redundant_)         1 MPI processes
>> >>>           type: preonly
>> >>>           maximum iterations=10000, initial guess is zero
>> >>>           tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>> >>>           left preconditioning
>> >>>           using NONE norm type for convergence test
>> >>>         PC Object:        (mg_coarse_redundant_)         1 MPI processes
>> >>>           type: lu
>> >>>             LU: out-of-place factorization
>> >>>             tolerance for zero pivot 2.22045e-14
>> >>>             using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
>> >>>             matrix ordering: nd
>> >>>             factor fill ratio given 5., needed 10.4795
>> >>>               Factored matrix follows:
>> >>>                 Mat Object:                 1 MPI processes
>> >>>                   type: seqaij
>> >>>                   rows=6822, cols=6822
>> >>>                   package used to perform factorization: petsc
>> >>>                   total: nonzeros=9575688, allocated nonzeros=9575688
>> >>>                   total number of mallocs used during MatSetValues calls =0
>> >>>                     not using I-node routines
>> >>>           linear system matrix = precond matrix:
>> >>>           Mat Object:           1 MPI processes
>> >>>             type: seqaij
>> >>>             rows=6822, cols=6822
>> >>>             total: nonzeros=913758, allocated nonzeros=913758
>> >>>             total number of mallocs used during MatSetValues calls =0
>> >>>               not using I-node routines
>> >>>       linear system matrix = precond matrix:
>> >>>       Mat Object:       8 MPI processes
>> >>>         type: mpiaij
>> >>>         rows=6822, cols=6822
>> >>>         total: nonzeros=913758, allocated nonzeros=913758
>> >>>         total number of mallocs used during MatSetValues calls =0
>> >>>           not using I-node (on process 0) routines
>> >>>   Down solver (pre-smoother) on level 1 -------------------------------
>> >>>     KSP Object:    (mg_levels_1_)     8 MPI processes
>> >>>       type: richardson
>> >>>         Richardson: damping factor=1.
>> >>>       maximum iterations=2
>> >>>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>> >>>       left preconditioning
>> >>>       using nonzero initial guess
>> >>>       using NONE norm type for convergence test
>> >>>     PC Object:    (mg_levels_1_)     8 MPI processes
>> >>>       type: sor
>> >>>         SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>> >>>       linear system matrix = precond matrix:
>> >>>       Mat Object:       8 MPI processes
>> >>>         type: mpiaij
>> >>>         rows=67087, cols=67087
>> >>>         total: nonzeros=9722749, allocated nonzeros=9722749
>> >>>         total number of mallocs used during MatSetValues calls =0
>> >>>           not using I-node (on process 0) routines
>> >>>   Up solver (post-smoother) same as down solver (pre-smoother)
>> >>>   Down solver (pre-smoother) on level 2 -------------------------------
>> >>>     KSP Object:    (mg_levels_2_)     8 MPI processes
>> >>>       type: richardson
>> >>>         Richardson: damping factor=1.
>> >>>       maximum iterations=2
>> >>>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>> >>>       left preconditioning
>> >>>       using nonzero initial guess
>> >>>       using NONE norm type for convergence test
>> >>>     PC Object:    (mg_levels_2_)     8 MPI processes
>> >>>       type: sor
>> >>>         SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>> >>>       linear system matrix = precond matrix:
>> >>>       Mat Object:       8 MPI processes
>> >>>         type: mpiaij
>> >>>         rows=1745224, cols=1745224
>> >>>         total: nonzeros=99452608, allocated nonzeros=99452608
>> >>>         total number of mallocs used during MatSetValues calls =0
>> >>>           using I-node (on process 0) routines: found 126690 nodes,
>> >>> limit used is 5
>> >>>   Up solver (post-smoother) same as down solver (pre-smoother)
>> >>>   linear system matrix = precond matrix:
>> >>>   Mat Object:   8 MPI processes
>> >>>     type: mpiaij
>> >>>     rows=1745224, cols=1745224
>> >>>     total: nonzeros=99452608, allocated nonzeros=99452608
>> >>>     total number of mallocs used during MatSetValues calls =0
>> >>>       using I-node (on process 0) routines: found 126690 nodes, limit
>> >>> used is 5
>> >>>
>>
>>
>>
>
>