[petsc-users] GAMG advice
David Nolte
dnolte at dim.uchile.cl
Wed Nov 1 16:45:47 CDT 2017
Thanks Barry.
By simply replacing chebyshev with richardson as the smoother, I get similar
performance with GAMG as with ML (GAMG even slightly faster):

-pc_type gamg
-pc_gamg_type agg
-pc_gamg_threshold 0.03
-pc_gamg_square_graph 10
-pc_gamg_sym_graph
-mg_levels_ksp_type richardson
-mg_levels_pc_type sor
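
For completeness, putting these together with the KSP options from my first
mail, the full set of runtime options looks roughly like this (the
"mpirun -np 8 ./ns_solver" part is only a placeholder for the actual launch
command and executable):

mpirun -np 8 ./ns_solver \
    -ksp_type fgmres -ksp_rtol 1.0e-6 -ksp_initial_guess_nonzero \
    -ksp_converged_reason -ksp_monitor_true_residual \
    -pc_type gamg -pc_gamg_type agg -pc_gamg_threshold 0.03 \
    -pc_gamg_square_graph 10 -pc_gamg_sym_graph \
    -mg_levels_ksp_type richardson -mg_levels_pc_type sor
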
Is it still true that I need to set "-pc_gamg_sym_graph" if the matrix
is asymmetric? For serial runs it doesn't seem to matter, but in
parallel the PC setup hangs (after calls to
PCGAMGFilterGraph()) if -pc_gamg_sym_graph is not set.
David
On 10/21/2017 12:10 AM, Barry Smith wrote:
> David,
>
> GAMG picks the number of levels based on how the coarsening process etc proceeds. You cannot hardwire it to a particular value. You can run with -info to get more info potentially on the decisions GAMG is making.
>
> Barry
>
>> On Oct 20, 2017, at 2:06 PM, David Nolte <dnolte at dim.uchile.cl> wrote:
>>
>> PS: I didn't realize it at first, but it looks as if the -pc_mg_levels 3
>> option was not taken into account:
>> type: gamg
>> MG: type is MULTIPLICATIVE, levels=1 cycles=v
>>
>>
>>
>> On 10/20/2017 03:32 PM, David Nolte wrote:
>>> Dear all,
>>>
>>> I have some problems using GAMG as a preconditioner for (F)GMRES.
>>> Background: I am solving the incompressible, unsteady Navier-Stokes
>>> equations with a coupled mixed FEM approach, using P1/P1 elements for
>>> velocity and pressure on an unstructured tetrahedral mesh with about
>>> 2 million DOFs (and up to 15 million). The method is stabilized with
>>> SUPG/PSPG, hence there are no zeros on the diagonal of the pressure block. Time
>>> discretization with semi-implicit backward Euler. The flow is a
>>> convection dominated flow through a nozzle.
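>>>
>>> (To make the stabilization point concrete: among other terms, PSPG adds
>>> roughly \sum_K \tau_K (\nabla q, \nabla p)_K to the weak form, i.e. a
>>> pressure Laplacian scaled by an element-wise parameter \tau_K; this is
>>> what fills the otherwise zero pressure-pressure block. The precise
>>> definition of \tau_K is not important here.)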
>>>
>>> So far, for this setup, I have been quite happy with a simple FGMRES/ML
>>> solver for the full system (rather brute force, I admit, but much faster
>>> than any block/Schur preconditioners I tried):
>>>
>>> -ksp_converged_reason
>>> -ksp_monitor_true_residual
>>> -ksp_type fgmres
>>> -ksp_rtol 1.0e-6
>>> -ksp_initial_guess_nonzero
>>>
>>> -pc_type ml
>>> -pc_ml_Threshold 0.03
>>> -pc_ml_maxNlevels 3
>>>
>>> This setup converges to rtol in ~120 iterations (see the ksp_view output
>>> below):
>>>
>>> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm
>>> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06
>>> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm
>>> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06
>>> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm
>>> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06
>>> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm
>>> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07
>>>
>>>
>>> Now I'd like to try GAMG instead of ML. However, I don't know how to set
>>> it up to get similar performance.
>>> The obvious/naive
>>>
>>> -pc_type gamg
>>> -pc_gamg_type agg
>>>
>>> # with and without
>>> -pc_gamg_threshold 0.03
>>> -pc_mg_levels 3
>>>
>>> converges very slowly on 1 proc and much worse on 8 (~200k dofs per
>>> proc), for instance:
>>> np = 1:
>>> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm
>>> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
>>> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm
>>> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04
>>> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm
>>> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04
>>>
>>> np = 8:
>>> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm
>>> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03
>>> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
>>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
>>> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
>>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
>>>
>>> A very high threshold seems to improve the GAMG PC; for instance, with
>>> 0.75 I get convergence to rtol=1e-6 after 744 iterations.
>>> What else should I try?
>>>
>>> I would very much appreciate any advice on configuring GAMG and on the
>>> differences w.r.t. ML that should be taken into account (I am not a
>>> multigrid expert, though).
>>>
>>> Thanks, best wishes
>>> David
>>>
>>>
>>> ------
>>> ksp_view for -pc_type gamg -pc_gamg_threshold 0.75 -pc_mg_levels 3
>>>
>>> KSP Object: 1 MPI processes
>>> type: fgmres
>>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
>>> Orthogonalization with no iterative refinement
>>> GMRES: happy breakdown tolerance 1e-30
>>> maximum iterations=10000
>>> tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
>>> right preconditioning
>>> using nonzero initial guess
>>> using UNPRECONDITIONED norm type for convergence test
>>> PC Object: 1 MPI processes
>>> type: gamg
>>> MG: type is MULTIPLICATIVE, levels=1 cycles=v
>>> Cycles per PCApply=1
>>> Using Galerkin computed coarse grid matrices
>>> GAMG specific options
>>> Threshold for dropping small values from graph 0.75
>>> AGG specific options
>>> Symmetric graph false
>>> Coarse grid solver -- level -------------------------------
>>> KSP Object: (mg_levels_0_) 1 MPI processes
>>> type: preonly
>>> maximum iterations=2, initial guess is zero
>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>> left preconditioning
>>> using NONE norm type for convergence test
>>> PC Object: (mg_levels_0_) 1 MPI processes
>>> type: sor
>>> SOR: type = local_symmetric, iterations = 1, local iterations =
>>> 1, omega = 1.
>>> linear system matrix = precond matrix:
>>> Mat Object: 1 MPI processes
>>> type: seqaij
>>> rows=1745224, cols=1745224
>>> total: nonzeros=99452608, allocated nonzeros=99452608
>>> total number of mallocs used during MatSetValues calls =0
>>> using I-node routines: found 1037847 nodes, limit used is 5
>>> linear system matrix = precond matrix:
>>> Mat Object: 1 MPI processes
>>> type: seqaij
>>> rows=1745224, cols=1745224
>>> total: nonzeros=99452608, allocated nonzeros=99452608
>>> total number of mallocs used during MatSetValues calls =0
>>> using I-node routines: found 1037847 nodes, limit used is 5
>>>
>>>
>>> ------
>>> ksp_view for -pc_type ml:
>>>
>>> KSP Object: 8 MPI processes
>>> type: fgmres
>>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
>>> Orthogonalization with no iterative refinement
>>> GMRES: happy breakdown tolerance 1e-30
>>> maximum iterations=10000
>>> tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
>>> right preconditioning
>>> using nonzero initial guess
>>> using UNPRECONDITIONED norm type for convergence test
>>> PC Object: 8 MPI processes
>>> type: ml
>>> MG: type is MULTIPLICATIVE, levels=3 cycles=v
>>> Cycles per PCApply=1
>>> Using Galerkin computed coarse grid matrices
>>> Coarse grid solver -- level -------------------------------
>>> KSP Object: (mg_coarse_) 8 MPI processes
>>> type: preonly
>>> maximum iterations=10000, initial guess is zero
>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>> left preconditioning
>>> using NONE norm type for convergence test
>>> PC Object: (mg_coarse_) 8 MPI processes
>>> type: redundant
>>> Redundant preconditioner: First (color=0) of 8 PCs follows
>>> KSP Object: (mg_coarse_redundant_) 1 MPI processes
>>> type: preonly
>>> maximum iterations=10000, initial guess is zero
>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>> left preconditioning
>>> using NONE norm type for convergence test
>>> PC Object: (mg_coarse_redundant_) 1 MPI processes
>>> type: lu
>>> LU: out-of-place factorization
>>> tolerance for zero pivot 2.22045e-14
>>> using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
>>> matrix ordering: nd
>>> factor fill ratio given 5., needed 10.4795
>>> Factored matrix follows:
>>> Mat Object: 1 MPI processes
>>> type: seqaij
>>> rows=6822, cols=6822
>>> package used to perform factorization: petsc
>>> total: nonzeros=9575688, allocated nonzeros=9575688
>>> total number of mallocs used during MatSetValues calls =0
>>> not using I-node routines
>>> linear system matrix = precond matrix:
>>> Mat Object: 1 MPI processes
>>> type: seqaij
>>> rows=6822, cols=6822
>>> total: nonzeros=913758, allocated nonzeros=913758
>>> total number of mallocs used during MatSetValues calls =0
>>> not using I-node routines
>>> linear system matrix = precond matrix:
>>> Mat Object: 8 MPI processes
>>> type: mpiaij
>>> rows=6822, cols=6822
>>> total: nonzeros=913758, allocated nonzeros=913758
>>> total number of mallocs used during MatSetValues calls =0
>>> not using I-node (on process 0) routines
>>> Down solver (pre-smoother) on level 1 -------------------------------
>>> KSP Object: (mg_levels_1_) 8 MPI processes
>>> type: richardson
>>> Richardson: damping factor=1.
>>> maximum iterations=2
>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>> left preconditioning
>>> using nonzero initial guess
>>> using NONE norm type for convergence test
>>> PC Object: (mg_levels_1_) 8 MPI processes
>>> type: sor
>>> SOR: type = local_symmetric, iterations = 1, local iterations =
>>> 1, omega = 1.
>>> linear system matrix = precond matrix:
>>> Mat Object: 8 MPI processes
>>> type: mpiaij
>>> rows=67087, cols=67087
>>> total: nonzeros=9722749, allocated nonzeros=9722749
>>> total number of mallocs used during MatSetValues calls =0
>>> not using I-node (on process 0) routines
>>> Up solver (post-smoother) same as down solver (pre-smoother)
>>> Down solver (pre-smoother) on level 2 -------------------------------
>>> KSP Object: (mg_levels_2_) 8 MPI processes
>>> type: richardson
>>> Richardson: damping factor=1.
>>> maximum iterations=2
>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>> left preconditioning
>>> using nonzero initial guess
>>> using NONE norm type for convergence test
>>> PC Object: (mg_levels_2_) 8 MPI processes
>>> type: sor
>>> SOR: type = local_symmetric, iterations = 1, local iterations =
>>> 1, omega = 1.
>>> linear system matrix = precond matrix:
>>> Mat Object: 8 MPI processes
>>> type: mpiaij
>>> rows=1745224, cols=1745224
>>> total: nonzeros=99452608, allocated nonzeros=99452608
>>> total number of mallocs used during MatSetValues calls =0
>>> using I-node (on process 0) routines: found 126690 nodes,
>>> limit used is 5
>>> Up solver (post-smoother) same as down solver (pre-smoother)
>>> linear system matrix = precond matrix:
>>> Mat Object: 8 MPI processes
>>> type: mpiaij
>>> rows=1745224, cols=1745224
>>> total: nonzeros=99452608, allocated nonzeros=99452608
>>> total number of mallocs used during MatSetValues calls =0
>>> using I-node (on process 0) routines: found 126690 nodes, limit
>>> used is 5
>>>