<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Nov 1, 2017 at 5:45 PM, David Nolte <span dir="ltr"><<a href="mailto:dnolte@dim.uchile.cl" target="_blank">dnolte@dim.uchile.cl</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Thanks Barry.<br>
> By simply replacing Chebyshev with Richardson I get similar performance
> with GAMG and ML

That too (I assumed you were using the same; I could not see Chebyshev in
your view data).

I guess SOR works for the coarse grid solver because the coarse grid is
small. Using LU there should help.

> (GAMG even slightly faster):

That is just "random" fluctuation.
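(On the coarse solve: with PETSc's multigrid preconditioners the coarsest
level can typically be switched to a direct solve from the command line; a
minimal sketch is below, with the caveat that the coarse-level prefix is
mg_coarse_ in the ML view further down but shows up as mg_levels_0_ in the
single-level GAMG view, so check your own ksp_view output for the prefix:

-mg_coarse_ksp_type preonly
-mg_coarse_pc_type lu
)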
> -pc_type gamg
> -pc_gamg_type agg
> -pc_gamg_threshold 0.03
> -pc_gamg_square_graph 10
> -pc_gamg_sym_graph
> -mg_levels_ksp_type richardson
> -mg_levels_pc_type sor
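(Side note for the archives: the same configuration can be injected from
application code before the solver's KSPSetFromOptions() call. A sketch,
hedged on the PetscOptionsSetValue() signature of the PETSc 3.7+ releases,
with 'ksp' standing for the application's KSP object:

  /* Mirror the command-line options above in the global options database. */
  PetscOptionsSetValue(NULL, "-pc_type", "gamg");
  PetscOptionsSetValue(NULL, "-pc_gamg_type", "agg");
  PetscOptionsSetValue(NULL, "-pc_gamg_threshold", "0.03");
  PetscOptionsSetValue(NULL, "-pc_gamg_square_graph", "10");
  PetscOptionsSetValue(NULL, "-pc_gamg_sym_graph", "true");
  PetscOptionsSetValue(NULL, "-mg_levels_ksp_type", "richardson");
  PetscOptionsSetValue(NULL, "-mg_levels_pc_type", "sor");
  KSPSetFromOptions(ksp);  /* options take effect here */
)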
> Is it still true that I need to set "-pc_gamg_sym_graph" if the matrix
> is asymmetric?

Yes.

> For serial runs it doesn't seem to matter,

Yes.

> but in parallel the PC setup hangs (after calls to
> PCGAMGFilterGraph()) if -pc_gamg_sym_graph is not set.

Yep.
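(If you would rather hard-wire that in the application than rely on the
options database, the PETSc releases of this era also expose it through the
C API; a fragment sketch, assuming 'ksp' is the application's KSP, and
noting that GAMG's option-setting API has changed across releases, so check
the manual of your version:

  PC pc;
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCGAMG);
  PCGAMGSetSymGraph(pc, PETSC_TRUE);  /* same effect as -pc_gamg_sym_graph */
)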
<span class="HOEnZb"><font color="#888888"><br>
David<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
<br>
> On 10/21/2017 12:10 AM, Barry Smith wrote:
> > David,
> >
> >    GAMG picks the number of levels based on how the coarsening process
> > proceeds. You cannot hardwire it to a particular value. You can run
> > with -info to get more information on the decisions GAMG is making.
> >
> >    Barry
> >
> >> On Oct 20, 2017, at 2:06 PM, David Nolte <dnolte@dim.uchile.cl> wrote:
> >>
> >> PS: I didn't realize it at first, but it looks as if the -pc_mg_levels 3
> >> option was not taken into account:
> >>   type: gamg
> >>     MG: type is MULTIPLICATIVE, levels=1 cycles=v
> >>
> >>
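(On checking the level count: besides -info, the number of levels the PC
actually built can be queried after setup; a minimal fragment, assuming
'ksp' has already gone through KSPSetUp() or KSPSolve():

  PC       pc;
  PetscInt nlevels;
  KSPGetPC(ksp, &pc);
  PCMGGetLevels(pc, &nlevels);
  PetscPrintf(PETSC_COMM_WORLD, "PC built %D levels\n", nlevels);
)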
> >> On 10/20/2017 03:32 PM, David Nolte wrote:
> >>> Dear all,
> >>>
> >>> I have some problems using GAMG as a preconditioner for (F)GMRES.
> >>> Background: I am solving the incompressible, unsteady Navier-Stokes
> >>> equations with a coupled mixed FEM approach, using P1/P1 elements for
> >>> velocity and pressure on an unstructured tetrahedral mesh with about
> >>> 2 million DOFs (and up to 15 million). The method is stabilized with
> >>> SUPG/PSPG, hence there are no zeros on the diagonal of the pressure
> >>> block. Time discretization is semi-implicit backward Euler. The flow
> >>> is a convection-dominated flow through a nozzle.
> >>>
> >>> So far, for this setup, I have been quite happy with a simple
> >>> FGMRES/ML solver for the full system (rather brute-force, I admit,
> >>> but much faster than any block/Schur preconditioners I tried):
> >>>
> >>> -ksp_converged_reason
> >>> -ksp_monitor_true_residual
> >>> -ksp_type fgmres
> >>> -ksp_rtol 1.0e-6
> >>> -ksp_initial_guess_nonzero
> >>>
> >>> -pc_type ml
> >>> -pc_ml_Threshold 0.03
> >>> -pc_ml_maxNlevels 3
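(For readers of the archives: options like these only take effect once the
application calls KSPSetFromOptions(). Below is a minimal, self-contained
driver sketch with a toy 1D Laplacian standing in for the actual
Navier-Stokes matrix, and with error checking omitted for brevity; it
illustrates where the options database enters, it is not David's actual
code:

  #include <petscksp.h>

  int main(int argc, char **argv)
  {
    Mat      A;
    Vec      x, b;
    KSP      ksp;
    PetscInt i, rstart, rend, n = 100;

    PetscInitialize(&argc, &argv, NULL, NULL);

    /* Toy 1D Laplacian; a real application assembles its FEM matrix here. */
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
    MatSetFromOptions(A);
    MatSetUp(A);
    MatGetOwnershipRange(A, &rstart, &rend);
    for (i = rstart; i < rend; i++) {
      MatSetValue(A, i, i, 2.0, INSERT_VALUES);
      if (i > 0)     MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);
      if (i < n - 1) MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);
    }
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

    MatCreateVecs(A, &x, &b);
    VecSet(b, 1.0);

    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);
    KSPSetFromOptions(ksp);  /* -pc_type ml / gamg etc. are read here */
    KSPSolve(ksp, b, x);

    KSPDestroy(&ksp);
    VecDestroy(&x);
    VecDestroy(&b);
    MatDestroy(&A);
    PetscFinalize();
    return 0;
  }
)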
> >>>
> >>> This setup converges in ~100 iterations (see the ksp_view output
> >>> below) to rtol:
> >>>
> >>> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm
> >>> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06
> >>> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm
> >>> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06
> >>> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm
> >>> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06
> >>> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm
> >>> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07
> >>>
> >>> Now I'd like to try GAMG instead of ML. However, I don't know how to
> >>> set it up to get similar performance.
> >>> The obvious/naive
> >>>
> >>> -pc_type gamg
> >>> -pc_gamg_type agg
> >>>
> >>> # with and without
> >>> -pc_gamg_threshold 0.03
> >>> -pc_mg_levels 3
> >>>
> >>> converges very slowly on 1 proc and much worse on 8 (~200k DOFs per
> >>> proc), for instance:
> >>> np = 1:
> >>> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm
> >>> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
> >>> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm
> >>> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04
> >>> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm
> >>> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04
> >>>
> >>> np = 8:
> >>> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm
> >>> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03
> >>> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> >>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
> >>> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> >>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
> >>>
> >>> A very high threshold seems to improve the GAMG PC; for instance, with
> >>> 0.75 I get convergence to rtol=1e-6 after 744 iterations.
> >>> What else should I try?
> >>>
> >>> I would very much appreciate any advice on configuring GAMG and on the
> >>> differences w.r.t. ML that need to be taken into account (I am not a
> >>> multigrid expert, though).
> >>>
> >>> Thanks, best wishes
> >>> David
> >>>
> >>> ------
> >>> ksp_view for -pc_type gamg -pc_gamg_threshold 0.75 -pc_mg_levels 3:
> >>>
> >>> KSP Object: 1 MPI processes
> >>> type: fgmres
> >>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> >>> Orthogonalization with no iterative refinement
> >>> GMRES: happy breakdown tolerance 1e-30
> >>> maximum iterations=10000
> >>> tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
> >>> right preconditioning
> >>> using nonzero initial guess
> >>> using UNPRECONDITIONED norm type for convergence test
> >>> PC Object: 1 MPI processes
> >>> type: gamg
> >>> MG: type is MULTIPLICATIVE, levels=1 cycles=v
> >>> Cycles per PCApply=1
> >>> Using Galerkin computed coarse grid matrices
> >>> GAMG specific options
> >>> Threshold for dropping small values from graph 0.75
> >>> AGG specific options
> >>> Symmetric graph false
> >>> Coarse grid solver -- level -------------------------------
> >>> KSP Object: (mg_levels_0_) 1 MPI processes
> >>> type: preonly
> >>> maximum iterations=2, initial guess is zero
> >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> >>> left preconditioning
> >>> using NONE norm type for convergence test
> >>> PC Object: (mg_levels_0_) 1 MPI processes
> >>> type: sor
> >>> SOR: type = local_symmetric, iterations = 1, local iterations =
> >>> 1, omega = 1.
> >>> linear system matrix = precond matrix:
> >>> Mat Object: 1 MPI processes
> >>> type: seqaij
> >>> rows=1745224, cols=1745224
> >>> total: nonzeros=99452608, allocated nonzeros=99452608
> >>> total number of mallocs used during MatSetValues calls =0
> >>> using I-node routines: found 1037847 nodes, limit used is 5
> >>> linear system matrix = precond matrix:
> >>> Mat Object: 1 MPI processes
> >>> type: seqaij
> >>> rows=1745224, cols=1745224
> >>> total: nonzeros=99452608, allocated nonzeros=99452608
> >>> total number of mallocs used during MatSetValues calls =0
> >>> using I-node routines: found 1037847 nodes, limit used is 5
> >>>
> >>> ------
> >>> ksp_view for -pc_type ml:
> >>>
> >>> KSP Object: 8 MPI processes
> >>> type: fgmres
> >>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> >>> Orthogonalization with no iterative refinement
> >>> GMRES: happy breakdown tolerance 1e-30
> >>> maximum iterations=10000
> >>> tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
> >>> right preconditioning
> >>> using nonzero initial guess
> >>> using UNPRECONDITIONED norm type for convergence test
> >>> PC Object: 8 MPI processes
> >>> type: ml
> >>> MG: type is MULTIPLICATIVE, levels=3 cycles=v
> >>> Cycles per PCApply=1
> >>> Using Galerkin computed coarse grid matrices
> >>> Coarse grid solver -- level -------------------------------
> >>> KSP Object: (mg_coarse_) 8 MPI processes
> >>> type: preonly
> >>> maximum iterations=10000, initial guess is zero
> >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> >>> left preconditioning
> >>> using NONE norm type for convergence test
> >>> PC Object: (mg_coarse_) 8 MPI processes
> >>> type: redundant
> >>> Redundant preconditioner: First (color=0) of 8 PCs follows
> >>> KSP Object: (mg_coarse_redundant_) 1 MPI processes
> >>> type: preonly
> >>> maximum iterations=10000, initial guess is zero
> >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> >>> left preconditioning
> >>> using NONE norm type for convergence test
> >>> PC Object: (mg_coarse_redundant_) 1 MPI processes
> >>> type: lu
> >>> LU: out-of-place factorization
> >>> tolerance for zero pivot 2.22045e-14
> >>> using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
> >>> matrix ordering: nd
> >>> factor fill ratio given 5., needed 10.4795
> >>> Factored matrix follows:
> >>> Mat Object: 1 MPI processes
> >>> type: seqaij
> >>> rows=6822, cols=6822
> >>> package used to perform factorization: petsc
> >>> total: nonzeros=9575688, allocated nonzeros=9575688
> >>> total number of mallocs used during MatSetValues calls =0
> >>> not using I-node routines
> >>> linear system matrix = precond matrix:
> >>> Mat Object: 1 MPI processes
> >>> type: seqaij
> >>> rows=6822, cols=6822
> >>> total: nonzeros=913758, allocated nonzeros=913758
> >>> total number of mallocs used during MatSetValues calls =0
> >>> not using I-node routines
> >>> linear system matrix = precond matrix:
> >>> Mat Object: 8 MPI processes
> >>> type: mpiaij
> >>> rows=6822, cols=6822
> >>> total: nonzeros=913758, allocated nonzeros=913758
> >>> total number of mallocs used during MatSetValues calls =0
> >>> not using I-node (on process 0) routines
> >>> Down solver (pre-smoother) on level 1 -------------------------------
> >>> KSP Object: (mg_levels_1_) 8 MPI processes
> >>> type: richardson
> >>> Richardson: damping factor=1.
> >>> maximum iterations=2
> >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> >>> left preconditioning
> >>> using nonzero initial guess
> >>> using NONE norm type for convergence test
> >>> PC Object: (mg_levels_1_) 8 MPI processes
> >>> type: sor
> >>> SOR: type = local_symmetric, iterations = 1, local iterations =
> >>> 1, omega = 1.
> >>> linear system matrix = precond matrix:
> >>> Mat Object: 8 MPI processes
> >>> type: mpiaij
> >>> rows=67087, cols=67087
> >>> total: nonzeros=9722749, allocated nonzeros=9722749
> >>> total number of mallocs used during MatSetValues calls =0
> >>> not using I-node (on process 0) routines
> >>> Up solver (post-smoother) same as down solver (pre-smoother)
> >>> Down solver (pre-smoother) on level 2 -------------------------------
> >>> KSP Object: (mg_levels_2_) 8 MPI processes
> >>> type: richardson
> >>> Richardson: damping factor=1.
> >>> maximum iterations=2
> >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> >>> left preconditioning
> >>> using nonzero initial guess
> >>> using NONE norm type for convergence test
> >>> PC Object: (mg_levels_2_) 8 MPI processes
> >>> type: sor
> >>> SOR: type = local_symmetric, iterations = 1, local iterations =
> >>> 1, omega = 1.
> >>> linear system matrix = precond matrix:
> >>> Mat Object: 8 MPI processes
> >>> type: mpiaij
> >>> rows=1745224, cols=1745224
> >>> total: nonzeros=99452608, allocated nonzeros=99452608
> >>> total number of mallocs used during MatSetValues calls =0
> >>> using I-node (on process 0) routines: found 126690 nodes,
> >>> limit used is 5
> >>> Up solver (post-smoother) same as down solver (pre-smoother)
> >>> linear system matrix = precond matrix:
> >>> Mat Object: 8 MPI processes
> >>> type: mpiaij
> >>> rows=1745224, cols=1745224
> >>> total: nonzeros=99452608, allocated nonzeros=99452608
> >>> total number of mallocs used during MatSetValues calls =0
> >>> using I-node (on process 0) routines: found 126690 nodes, limit
> >>> used is 5
> >>>