[petsc-users] GAMG advice

David Nolte dnolte at dim.uchile.cl
Thu Nov 9 13:19:38 CST 2017


Hi Mark,

thanks for clarifying.
When I wrote the initial question I had somehow overlooked the fact that
GAMG's default smoother is Chebyshev, while ML uses SOR. All the
other comments concerning the threshold etc. were based on this mistake.

The following settings work quite well; of course, LU is used on the
coarse level.

    -pc_type gamg
    -pc_gamg_type agg
    -pc_gamg_threshold 0.03
    -pc_gamg_square_graph 10        # no effect ?
    -pc_gamg_sym_graph
    -mg_levels_ksp_type richardson
    -mg_levels_pc_type sor

-pc_gamg_agg_nsmooths 0 does not seem to improve the convergence.

The ksp_view output now looks like this (does this seem reasonable?):


KSP Object: 4 MPI processes
  type: fgmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=10000
  tolerances:  relative=1e-06, absolute=1e-50, divergence=10000.
  right preconditioning
  using nonzero initial guess
  using UNPRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
  type: gamg
    MG: type is MULTIPLICATIVE, levels=5 cycles=v
      Cycles per PCApply=1
      Using Galerkin computed coarse grid matrices
      GAMG specific options
        Threshold for dropping small values from graph 0.03
        AGG specific options
          Symmetric graph true
  Coarse grid solver -- level -------------------------------
    KSP Object:    (mg_coarse_)     4 MPI processes
      type: preonly
      maximum iterations=10000, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object:    (mg_coarse_)     4 MPI processes
      type: bjacobi
        block Jacobi: number of blocks = 4
        Local solve is same for all blocks, in the following KSP and PC
objects:
      KSP Object:      (mg_coarse_sub_)       1 MPI processes
        type: preonly
        maximum iterations=1, initial guess is zero
        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using NONE norm type for convergence test
      PC Object:      (mg_coarse_sub_)       1 MPI processes
        type: lu
          LU: out-of-place factorization
          tolerance for zero pivot 2.22045e-14
          using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
          matrix ordering: nd
          factor fill ratio given 5., needed 1.
            Factored matrix follows:
              Mat Object:               1 MPI processes
                type: seqaij
                rows=38, cols=38
                package used to perform factorization: petsc
                total: nonzeros=1444, allocated nonzeros=1444
                total number of mallocs used during MatSetValues calls =0
                  using I-node routines: found 8 nodes, limit used is 5
        linear system matrix = precond matrix:
        Mat Object:         1 MPI processes
          type: seqaij
          rows=38, cols=38
          total: nonzeros=1444, allocated nonzeros=1444
          total number of mallocs used during MatSetValues calls =0
            using I-node routines: found 8 nodes, limit used is 5
      linear system matrix = precond matrix:
      Mat Object:       4 MPI processes
        type: mpiaij
        rows=38, cols=38
        total: nonzeros=1444, allocated nonzeros=1444
        total number of mallocs used during MatSetValues calls =0
          using I-node (on process 0) routines: found 8 nodes, limit
used is 5
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object:    (mg_levels_1_)     4 MPI processes
      type: richardson
        Richardson: damping factor=1.
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_1_)     4 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations =
1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object:       4 MPI processes
        type: mpiaij
        rows=168, cols=168
        total: nonzeros=19874, allocated nonzeros=19874
        total number of mallocs used during MatSetValues calls =0
          using I-node (on process 0) routines: found 17 nodes, limit
used is 5
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 -------------------------------
    KSP Object:    (mg_levels_2_)     4 MPI processes
      type: richardson
        Richardson: damping factor=1.
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_2_)     4 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations =
1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object:       4 MPI processes
        type: mpiaij
        rows=3572, cols=3572
        total: nonzeros=963872, allocated nonzeros=963872
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 3 -------------------------------
    KSP Object:    (mg_levels_3_)     4 MPI processes
      type: richardson
        Richardson: damping factor=1.
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_3_)     4 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations =
1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object:       4 MPI processes
        type: mpiaij
        rows=21179, cols=21179
        total: nonzeros=1060605, allocated nonzeros=1060605
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 4 -------------------------------
    KSP Object:    (mg_levels_4_)     4 MPI processes
      type: richardson
        Richardson: damping factor=1.
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_4_)     4 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations =
1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object:       4 MPI processes
        type: mpiaij
        rows=1745224, cols=1745224
        total: nonzeros=99452608, allocated nonzeros=99452608
        total number of mallocs used during MatSetValues calls =0
          using I-node (on process 0) routines: found 254433 nodes,
limit used is 5
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Mat Object:   4 MPI processes
    type: mpiaij
    rows=1745224, cols=1745224
    total: nonzeros=99452608, allocated nonzeros=99452608
    total number of mallocs used during MatSetValues calls =0
      using I-node (on process 0) routines: found 254433 nodes, limit
used is 5




Thanks, David



On 11/08/2017 10:11 PM, Mark Adams wrote:
>
>
> On Wed, Nov 1, 2017 at 5:45 PM, David Nolte <dnolte at dim.uchile.cl> wrote:
>
>     Thanks Barry.
>     By simply replacing Chebyshev with Richardson I get similar performance
>     with GAMG and ML
>
>
> That too (I assumed you were using the same, I could not see cheby in
> your view data).
>
> I guess SOR works for the coarse grid solver because the coarse grid
> is small. It should help to use LU there.
>  
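> For instance (just a sketch, not timed here; the option prefixes are the
> ones that appear in the ksp_view output in this thread), a redundant
> direct coarse solve would be:
>
>     -mg_coarse_pc_type redundant
>     -mg_coarse_redundant_pc_type lu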
>
>     (GAMG even slightly faster):
>
>
> These are "random" fluctuations.
>  
>
>
>     -pc_type gamg
>     -pc_gamg_type agg
>     -pc_gamg_threshold 0.03
>     -pc_gamg_square_graph 10
>     -pc_gamg_sym_graph
>     -mg_levels_ksp_type richardson
>     -mg_levels_pc_type sor
>
>     Is it still true that I need to set "-pc_gamg_sym_graph" if the matrix
>     is asymmetric? 
>
>
> yes,
>  
>
>     For serial runs it doesn't seem to matter, 
>
>
> yes,
>  
>
>     but in
>     parallel the PC setup hangs (after calls to
>     PCGAMGFilterGraph()) if -pc_gamg_sym_graph is not set.
>
>
> yep,
>  
>
>
>     David
>
>
>     On 10/21/2017 12:10 AM, Barry Smith wrote:
>     >   David,
>     >
>     >    GAMG picks the number of levels based on how the coarsening
>     >    process proceeds, etc. You cannot hardwire it to a particular
>     >    value. You can run with -info to potentially get more information
>     >    on the decisions GAMG is making.
>     >
>     >   Barry
>     >
>     >> On Oct 20, 2017, at 2:06 PM, David Nolte <dnolte at dim.uchile.cl> wrote:
>     >>
>     >> PS: I didn't realize it at first, but it looks as if the
>     >> -pc_mg_levels 3 option was not taken into account:
>     >> type: gamg
>     >>     MG: type is MULTIPLICATIVE, levels=1 cycles=v
>     >>
>     >>
>     >>
>     >> On 10/20/2017 03:32 PM, David Nolte wrote:
>     >>> Dear all,
>     >>>
>     >>> I have some problems using GAMG as a preconditioner for (F)GMRES.
>     >>> Background: I am solving the incompressible, unsteady
>     >>> Navier-Stokes equations with a coupled mixed FEM approach, using
>     >>> P1/P1 elements for velocity and pressure on an unstructured
>     >>> tetrahedral mesh with about 2 million DOFs (and up to 15 million).
>     >>> The method is stabilized with SUPG/PSPG, hence there are no zeros
>     >>> on the diagonal of the pressure block. Time discretization is
>     >>> semi-implicit backward Euler. The flow is a convection-dominated
>     >>> flow through a nozzle.
>     >>>
>     >>> So far, for this setup, I have been quite happy with a simple
>     >>> FGMRES/ML solver for the full system (rather brute force, I admit,
>     >>> but much faster than any block/Schur preconditioners I tried):
>     >>>
>     >>>     -ksp_converged_reason
>     >>>     -ksp_monitor_true_residual
>     >>>     -ksp_type fgmres
>     >>>     -ksp_rtol 1.0e-6
>     >>>     -ksp_initial_guess_nonzero
>     >>>
>     >>>     -pc_type ml
>     >>>     -pc_ml_Threshold 0.03
>     >>>     -pc_ml_maxNlevels 3
>     >>>
>     >>> This setup converges in ~100 iterations to rtol (see the ksp_view
>     >>> output below):
>     >>>
>     >>> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true
>     resid norm
>     >>> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06
>     >>> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true
>     resid norm
>     >>> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06
>     >>> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true
>     resid norm
>     >>> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06
>     >>> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true
>     resid norm
>     >>> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07
>     >>>
>     >>>
>     >>> Now I'd like to try GAMG instead of ML. However, I don't know how
>     >>> to set it up to get similar performance.
>     >>> The obvious/naive
>     >>>
>     >>>     -pc_type gamg
>     >>>     -pc_gamg_type agg
>     >>>
>     >>> # with and without
>     >>>     -pc_gamg_threshold 0.03
>     >>>     -pc_mg_levels 3
>     >>>
>     >>> converges very slowly on 1 proc and much worse on 8 (~200k DOFs
>     >>> per proc), for instance:
>     >>> np = 1:
>     >>> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true
>     resid norm
>     >>> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
>     >>> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true
>     resid norm
>     >>> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04
>     >>> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true
>     resid norm
>     >>> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04
>     >>>
>     >>> np = 8:
>     >>> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true
>     resid norm
>     >>> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03
>     >>> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true
>     resid norm
>     >>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
>     >>> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true
>     resid norm
>     >>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
>     >>>
>     >>> A very high threshold seems to improve the GAMG PC; for instance,
>     >>> with 0.75 I get convergence to rtol=1e-6 after 744 iterations.
>     >>> What else should I try?
>     >>>
>     >>> I would very much appreciate any advice on configuring GAMG, and on
>     >>> differences w.r.t. ML that should be taken into account (I am not a
>     >>> multigrid expert, though).
>     >>>
>     >>> Thanks, best wishes
>     >>> David
>     >>>
>     >>>
>     >>> ------
>     >>> ksp_view for -pc_type gamg -pc_gamg_threshold 0.75 -pc_mg_levels 3
>     >>>
>     >>> KSP Object: 1 MPI processes
>     >>>   type: fgmres
>     >>>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
>     >>> Orthogonalization with no iterative refinement
>     >>>     GMRES: happy breakdown tolerance 1e-30
>     >>>   maximum iterations=10000
>     >>>   tolerances:  relative=1e-06, absolute=1e-50, divergence=10000.
>     >>>   right preconditioning
>     >>>   using nonzero initial guess
>     >>>   using UNPRECONDITIONED norm type for convergence test
>     >>> PC Object: 1 MPI processes
>     >>>   type: gamg
>     >>>     MG: type is MULTIPLICATIVE, levels=1 cycles=v
>     >>>       Cycles per PCApply=1
>     >>>       Using Galerkin computed coarse grid matrices
>     >>>       GAMG specific options
>     >>>         Threshold for dropping small values from graph 0.75
>     >>>         AGG specific options
>     >>>           Symmetric graph false
>     >>>   Coarse grid solver -- level -------------------------------
>     >>>     KSP Object:    (mg_levels_0_)     1 MPI processes
>     >>>       type: preonly
>     >>>       maximum iterations=2, initial guess is zero
>     >>>       tolerances:  relative=1e-05, absolute=1e-50,
>     divergence=10000.
>     >>>       left preconditioning
>     >>>       using NONE norm type for convergence test
>     >>>     PC Object:    (mg_levels_0_)     1 MPI processes
>     >>>       type: sor
>     >>>         SOR: type = local_symmetric, iterations = 1, local
>     iterations =
>     >>> 1, omega = 1.
>     >>>       linear system matrix = precond matrix:
>     >>>       Mat Object:       1 MPI processes
>     >>>         type: seqaij
>     >>>         rows=1745224, cols=1745224
>     >>>         total: nonzeros=99452608, allocated nonzeros=99452608
>     >>>         total number of mallocs used during MatSetValues calls =0
>     >>>           using I-node routines: found 1037847 nodes, limit
>     used is 5
>     >>>   linear system matrix = precond matrix:
>     >>>   Mat Object:   1 MPI processes
>     >>>     type: seqaij
>     >>>     rows=1745224, cols=1745224
>     >>>     total: nonzeros=99452608, allocated nonzeros=99452608
>     >>>     total number of mallocs used during MatSetValues calls =0
>     >>>       using I-node routines: found 1037847 nodes, limit used is 5
>     >>>
>     >>>
>     >>> ------
>     >>> ksp_view for -pc_type ml:
>     >>>
>     >>> KSP Object: 8 MPI processes
>     >>>   type: fgmres
>     >>>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
>     >>> Orthogonalization with no iterative refinement
>     >>>     GMRES: happy breakdown tolerance 1e-30
>     >>>   maximum iterations=10000
>     >>>   tolerances:  relative=1e-06, absolute=1e-50, divergence=10000.
>     >>>   right preconditioning
>     >>>   using nonzero initial guess
>     >>>   using UNPRECONDITIONED norm type for convergence test
>     >>> PC Object: 8 MPI processes
>     >>>   type: ml
>     >>>     MG: type is MULTIPLICATIVE, levels=3 cycles=v
>     >>>       Cycles per PCApply=1
>     >>>       Using Galerkin computed coarse grid matrices
>     >>>   Coarse grid solver -- level -------------------------------
>     >>>     KSP Object:    (mg_coarse_)     8 MPI processes
>     >>>       type: preonly
>     >>>       maximum iterations=10000, initial guess is zero
>     >>>       tolerances:  relative=1e-05, absolute=1e-50,
>     divergence=10000.
>     >>>       left preconditioning
>     >>>       using NONE norm type for convergence test
>     >>>     PC Object:    (mg_coarse_)     8 MPI processes
>     >>>       type: redundant
>     >>>         Redundant preconditioner: First (color=0) of 8 PCs follows
>     >>>         KSP Object:        (mg_coarse_redundant_)         1
>     MPI processes
>     >>>           type: preonly
>     >>>           maximum iterations=10000, initial guess is zero
>     >>>           tolerances:  relative=1e-05, absolute=1e-50,
>     divergence=10000.
>     >>>           left preconditioning
>     >>>           using NONE norm type for convergence test
>     >>>         PC Object:        (mg_coarse_redundant_)         1 MPI
>     processes
>     >>>           type: lu
>     >>>             LU: out-of-place factorization
>     >>>             tolerance for zero pivot 2.22045e-14
>     >>>             using diagonal shift on blocks to prevent zero
>     pivot [INBLOCKS]
>     >>>             matrix ordering: nd
>     >>>             factor fill ratio given 5., needed 10.4795
>     >>>               Factored matrix follows:
>     >>>                 Mat Object:                 1 MPI processes
>     >>>                   type: seqaij
>     >>>                   rows=6822, cols=6822
>     >>>                   package used to perform factorization: petsc
>     >>>                   total: nonzeros=9575688, allocated
>     nonzeros=9575688
>     >>>                   total number of mallocs used during
>     MatSetValues calls =0
>     >>>                     not using I-node routines
>     >>>           linear system matrix = precond matrix:
>     >>>           Mat Object:           1 MPI processes
>     >>>             type: seqaij
>     >>>             rows=6822, cols=6822
>     >>>             total: nonzeros=913758, allocated nonzeros=913758
>     >>>             total number of mallocs used during MatSetValues
>     calls =0
>     >>>               not using I-node routines
>     >>>       linear system matrix = precond matrix:
>     >>>       Mat Object:       8 MPI processes
>     >>>         type: mpiaij
>     >>>         rows=6822, cols=6822
>     >>>         total: nonzeros=913758, allocated nonzeros=913758
>     >>>         total number of mallocs used during MatSetValues calls =0
>     >>>           not using I-node (on process 0) routines
>     >>>   Down solver (pre-smoother) on level 1
>     -------------------------------
>     >>>     KSP Object:    (mg_levels_1_)     8 MPI processes
>     >>>       type: richardson
>     >>>         Richardson: damping factor=1.
>     >>>       maximum iterations=2
>     >>>       tolerances:  relative=1e-05, absolute=1e-50,
>     divergence=10000.
>     >>>       left preconditioning
>     >>>       using nonzero initial guess
>     >>>       using NONE norm type for convergence test
>     >>>     PC Object:    (mg_levels_1_)     8 MPI processes
>     >>>       type: sor
>     >>>         SOR: type = local_symmetric, iterations = 1, local
>     iterations =
>     >>> 1, omega = 1.
>     >>>       linear system matrix = precond matrix:
>     >>>       Mat Object:       8 MPI processes
>     >>>         type: mpiaij
>     >>>         rows=67087, cols=67087
>     >>>         total: nonzeros=9722749, allocated nonzeros=9722749
>     >>>         total number of mallocs used during MatSetValues calls =0
>     >>>           not using I-node (on process 0) routines
>     >>>   Up solver (post-smoother) same as down solver (pre-smoother)
>     >>>   Down solver (pre-smoother) on level 2
>     -------------------------------
>     >>>     KSP Object:    (mg_levels_2_)     8 MPI processes
>     >>>       type: richardson
>     >>>         Richardson: damping factor=1.
>     >>>       maximum iterations=2
>     >>>       tolerances:  relative=1e-05, absolute=1e-50,
>     divergence=10000.
>     >>>       left preconditioning
>     >>>       using nonzero initial guess
>     >>>       using NONE norm type for convergence test
>     >>>     PC Object:    (mg_levels_2_)     8 MPI processes
>     >>>       type: sor
>     >>>         SOR: type = local_symmetric, iterations = 1, local
>     iterations =
>     >>> 1, omega = 1.
>     >>>       linear system matrix = precond matrix:
>     >>>       Mat Object:       8 MPI processes
>     >>>         type: mpiaij
>     >>>         rows=1745224, cols=1745224
>     >>>         total: nonzeros=99452608, allocated nonzeros=99452608
>     >>>         total number of mallocs used during MatSetValues calls =0
>     >>>           using I-node (on process 0) routines: found 126690
>     nodes,
>     >>> limit used is 5
>     >>>   Up solver (post-smoother) same as down solver (pre-smoother)
>     >>>   linear system matrix = precond matrix:
>     >>>   Mat Object:   8 MPI processes
>     >>>     type: mpiaij
>     >>>     rows=1745224, cols=1745224
>     >>>     total: nonzeros=99452608, allocated nonzeros=99452608
>     >>>     total number of mallocs used during MatSetValues calls =0
>     >>>       using I-node (on process 0) routines: found 126690
>     nodes, limit
>     >>> used is 5
>     >>>
>
>
>