# Smoother settings for AMG

Barry Smith bsmith at mcs.anl.gov
Thu Jul 30 18:45:19 CDT 2009

Harun,

I have played around with this matrix. It is a nasty matrix; I
think it is really beyond the normal capacity of ML (and hypre's
boomerAMG).

Even the "convergence" you were getting below is BOGUS.  If you run
with -ksp_norm_type unpreconditioned or -ksp_monitor_true_residual
you'll see that the "true" residual norm is actually creeping to zero
and at the converged 43 iterations below the true residual norm has
decreased by like less than 1/10. (The preconditioned residual norm
has decreased by 1.e 5 so the iteration stops and you think it has
converged. In really hard problems preconditioners sometimes scales
things in a funky way so a large decrease in preconditioned residual
norm does not mean a large decrease in true residual norm). In other
words the "answer" you got out of the runs below is garbage.

I suggest,
1) check carefully that the matrix being created actually matches the
model's equations, if they seem right then
2) see if you can change the model so it does not generate such
hopeless matrices. If you MUST solve this nasty matrix
3) bite the bullet and use a parallel direct solver from PETSc. Try
both MUMPS and SuperLU_dist

Good luck,

Barry

On Jul 29, 2009, at 3:54 PM, BAYRAKTAR Harun wrote:

> Hi,
>
> I am trying to solve a system of equations and I am having difficulty
> picking the right smoothers for AMG (using ML as pc_type) in PETSc for
> parallel execution. First here is what happens in terms of CG
> (ksp_type)
> iteration counts (both columns use block jacobi):
>
> cpus	|	AMG w/ ICC(0) x1	|	AMG w/ SOR x4
> ------------------------------------------------------
> 1	|		43		|		243
> 4	|		699		|		379
>
> x1 or x4 means 1 or 4 iterations of smoother application at each AMG
> level (all details from ksp view for the 4 cpu run are below). The
> main
> observation is that on 1 cpu, AMG w/ ICC(0) is a clear winner but
> falls
> apart in parallel. SOR on the other hand experiences a 1.5X increase
> in
> iteration count which is totally expected from the quality of
> coarsening
> ML delivers in parallel.
>
> I basically would like to find a way (if possible) to have the
> number of
> iterations in parallel stay with 1-2X of 1 cpu iteration count for the
> AMG w/ ICC case. Is there a way to achieve this?
>
> Thanks,
> Harun
>
> %%%%%%%%%%%%%%%%%%%%%%%%%
> AMG w/ ICC(0) x1 ksp_view
> %%%%%%%%%%%%%%%%%%%%%%%%%
> KSP Object:
>  type: cg
>  maximum iterations=10000
>  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>  left preconditioning
> PC Object:
>  type: ml
>    MG: type is MULTIPLICATIVE, levels=3 cycles=v, pre-smooths=1,
> post-smooths=1
>  Coarse gride solver -- level 0 -------------------------------
>    KSP Object:(mg_coarse_)
>      type: preonly
>      maximum iterations=1, initial guess is zero
>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>      left preconditioning
>    PC Object:(mg_coarse_)
>      type: redundant
>        Redundant preconditioner: First (color=0) of 4 PCs follows
>      KSP Object:(mg_coarse_redundant_)
>        type: preonly
>        maximum iterations=10000, initial guess is zero
>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>        left preconditioning
>      PC Object:(mg_coarse_redundant_)
>        type: lu
>          LU: out-of-place factorization
>            matrix ordering: nd
>          LU: tolerance for zero pivot 1e-12
>          LU: factor fill ratio needed 2.17227
>               Factored matrix follows
>              Matrix Object:
>                type=seqaij, rows=283, cols=283
>                total: nonzeros=21651, allocated nonzeros=21651
>                  using I-node routines: found 186 nodes, limit used is
> 5
>        linear system matrix = precond matrix:
>        Matrix Object:
>          type=seqaij, rows=283, cols=283
>          total: nonzeros=9967, allocated nonzeros=14150
>            not using I-node routines
>      linear system matrix = precond matrix:
>      Matrix Object:
>        type=mpiaij, rows=283, cols=283
>        total: nonzeros=9967, allocated nonzeros=9967
>          not using I-node (on process 0) routines
>  Down solver (pre-smoother) on level 1 -------------------------------
>    KSP Object:(mg_levels_1_)
>      type: richardson
>        Richardson: damping factor=0.9
>      maximum iterations=1, initial guess is zero
>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>      left preconditioning
>    PC Object:(mg_levels_1_)
>      type: bjacobi
>        block Jacobi: number of blocks = 4
>        Local solve is same for all blocks, in the following KSP and PC
> objects:
>      KSP Object:(mg_levels_1_sub_)
>        type: preonly
>        maximum iterations=10000, initial guess is zero
>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>        left preconditioning
>      PC Object:(mg_levels_1_sub_)
>        type: icc
>          ICC: 0 levels of fill
>          ICC: factor fill ratio allocated 1
>          ICC: using Manteuffel shift
>          ICC: factor fill ratio needed 0.514899
>               Factored matrix follows
>              Matrix Object:
>                type=seqsbaij, rows=2813, cols=2813
>                total: nonzeros=48609, allocated nonzeros=48609
>                    block size is 1
>        linear system matrix = precond matrix:
>        Matrix Object:
>          type=seqaij, rows=2813, cols=2813
>          total: nonzeros=94405, allocated nonzeros=94405
>            not using I-node routines
>      linear system matrix = precond matrix:
>      Matrix Object:
>        type=mpiaij, rows=10654, cols=10654
>        total: nonzeros=376634, allocated nonzeros=376634
>          not using I-node (on process 0) routines
>  Up solver (post-smoother) on level 1 -------------------------------
>    KSP Object:(mg_levels_1_)
>      type: richardson
>        Richardson: damping factor=0.9
>      maximum iterations=1
>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>      left preconditioning
>    PC Object:(mg_levels_1_)
>      type: bjacobi
>        block Jacobi: number of blocks = 4
>        Local solve is same for all blocks, in the following KSP and PC
> objects:
>      KSP Object:(mg_levels_1_sub_)
>        type: preonly
>        maximum iterations=10000, initial guess is zero
>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>        left preconditioning
>      PC Object:(mg_levels_1_sub_)
>        type: icc
>          ICC: 0 levels of fill
>          ICC: factor fill ratio allocated 1
>          ICC: using Manteuffel shift
>          ICC: factor fill ratio needed 0.514899
>               Factored matrix follows
>              Matrix Object:
>                type=seqsbaij, rows=2813, cols=2813
>                total: nonzeros=48609, allocated nonzeros=48609
>                    block size is 1
>        linear system matrix = precond matrix:
>        Matrix Object:
>          type=seqaij, rows=2813, cols=2813
>          total: nonzeros=94405, allocated nonzeros=94405
>            not using I-node routines
>      linear system matrix = precond matrix:
>      Matrix Object:
>        type=mpiaij, rows=10654, cols=10654
>        total: nonzeros=376634, allocated nonzeros=376634
>          not using I-node (on process 0) routines
>  Down solver (pre-smoother) on level 2 -------------------------------
>    KSP Object:(mg_levels_2_)
>      type: richardson
>        Richardson: damping factor=0.9
>      maximum iterations=1, initial guess is zero
>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>      left preconditioning
>    PC Object:(mg_levels_2_)
>      type: bjacobi
>        block Jacobi: number of blocks = 4
>        Local solve is same for all blocks, in the following KSP and PC
> objects:
>      KSP Object:(mg_levels_2_sub_)
>        type: preonly
>        maximum iterations=10000, initial guess is zero
>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>        left preconditioning
>      PC Object:(mg_levels_2_sub_)
>        type: icc
>          ICC: 0 levels of fill
>          ICC: factor fill ratio allocated 1
>          ICC: using Manteuffel shift
>          ICC: factor fill ratio needed 0.519045
>               Factored matrix follows
>              Matrix Object:
>                type=seqsbaij, rows=101164, cols=101164
>                total: nonzeros=1378558, allocated nonzeros=1378558
>                    block size is 1
>        linear system matrix = precond matrix:
>        Matrix Object:
>          type=seqaij, rows=101164, cols=101164
>          total: nonzeros=2655952, allocated nonzeros=5159364
>            not using I-node routines
>      linear system matrix = precond matrix:
>      Matrix Object:
>        type=mpiaij, rows=411866, cols=411866
>        total: nonzeros=10941434, allocated nonzeros=42010332
>          not using I-node (on process 0) routines
>  Up solver (post-smoother) on level 2 -------------------------------
>    KSP Object:(mg_levels_2_)
>      type: richardson
>        Richardson: damping factor=0.9
>      maximum iterations=1
>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>      left preconditioning
>    PC Object:(mg_levels_2_)
>      type: bjacobi
>        block Jacobi: number of blocks = 4
>        Local solve is same for all blocks, in the following KSP and PC
> objects:
>      KSP Object:(mg_levels_2_sub_)
>        type: preonly
>        maximum iterations=10000, initial guess is zero
>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>        left preconditioning
>      PC Object:(mg_levels_2_sub_)
>        type: icc
>          ICC: 0 levels of fill
>          ICC: factor fill ratio allocated 1
>          ICC: using Manteuffel shift
>          ICC: factor fill ratio needed 0.519045
>               Factored matrix follows
>              Matrix Object:
>                type=seqsbaij, rows=101164, cols=101164
>                total: nonzeros=1378558, allocated nonzeros=1378558
>                    block size is 1
>        linear system matrix = precond matrix:
>        Matrix Object:
>          type=seqaij, rows=101164, cols=101164
>          total: nonzeros=2655952, allocated nonzeros=5159364
>            not using I-node routines
>      linear system matrix = precond matrix:
>      Matrix Object:
>        type=mpiaij, rows=411866, cols=411866
>        total: nonzeros=10941434, allocated nonzeros=42010332
>          not using I-node (on process 0) routines
>  linear system matrix = precond matrix:
>  Matrix Object:
>    type=mpiaij, rows=411866, cols=411866
>    total: nonzeros=10941434, allocated nonzeros=42010332
>      not using I-node (on process 0) routines
>
> %%%%%%%%%%%%%%%%%%%%%%
> AMG w/ SOR x4 ksp_view
> %%%%%%%%%%%%%%%%%%%%%%
>
> KSP Object:
>  type: cg
>  maximum iterations=10000
>  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>  left preconditioning
> PC Object:
>  type: ml
>    MG: type is MULTIPLICATIVE, levels=3 cycles=v, pre-smooths=1,
> post-smooths=1
>  Coarse gride solver -- level 0 -------------------------------
>    KSP Object:(mg_coarse_)
>      type: preonly
>      maximum iterations=1, initial guess is zero
>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>      left preconditioning
>    PC Object:(mg_coarse_)
>      type: redundant
>        Redundant preconditioner: First (color=0) of 4 PCs follows
>      KSP Object:(mg_coarse_redundant_)
>        type: preonly
>        maximum iterations=10000, initial guess is zero
>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>        left preconditioning
>      PC Object:(mg_coarse_redundant_)
>        type: lu
>          LU: out-of-place factorization
>            matrix ordering: nd
>          LU: tolerance for zero pivot 1e-12
>          LU: factor fill ratio needed 2.17227
>               Factored matrix follows
>              Matrix Object:
>                type=seqaij, rows=283, cols=283
>                total: nonzeros=21651, allocated nonzeros=21651
>                  using I-node routines: found 186 nodes, limit used is
> 5
>        linear system matrix = precond matrix:
>        Matrix Object:
>          type=seqaij, rows=283, cols=283
>          total: nonzeros=9967, allocated nonzeros=14150
>            not using I-node routines
>      linear system matrix = precond matrix:
>      Matrix Object:
>        type=mpiaij, rows=283, cols=283
>        total: nonzeros=9967, allocated nonzeros=9967
>          not using I-node (on process 0) routines
>  Down solver (pre-smoother) on level 1 -------------------------------
>    KSP Object:(mg_levels_1_)
>      type: richardson
>        Richardson: damping factor=1
>      maximum iterations=4, initial guess is zero
>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>      left preconditioning
>    PC Object:(mg_levels_1_)
>      type: sor
>        SOR: type = local_symmetric, iterations = 1, omega = 1
>      linear system matrix = precond matrix:
>      Matrix Object:
>        type=mpiaij, rows=10654, cols=10654
>        total: nonzeros=376634, allocated nonzeros=376634
>          not using I-node (on process 0) routines
>  Up solver (post-smoother) on level 1 -------------------------------
>    KSP Object:(mg_levels_1_)
>      type: richardson
>        Richardson: damping factor=1
>      maximum iterations=4
>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>      left preconditioning
>    PC Object:(mg_levels_1_)
>      type: sor
>        SOR: type = local_symmetric, iterations = 1, omega = 1
>      linear system matrix = precond matrix:
>      Matrix Object:
>        type=mpiaij, rows=10654, cols=10654
>        total: nonzeros=376634, allocated nonzeros=376634
>          not using I-node (on process 0) routines
>  Down solver (pre-smoother) on level 2 -------------------------------
>    KSP Object:(mg_levels_2_)
>      type: richardson
>        Richardson: damping factor=1
>      maximum iterations=4, initial guess is zero
>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>      left preconditioning
>    PC Object:(mg_levels_2_)
>      type: sor
>        SOR: type = local_symmetric, iterations = 1, omega = 1
>      linear system matrix = precond matrix:
>      Matrix Object:
>        type=mpiaij, rows=411866, cols=411866
>        total: nonzeros=10941434, allocated nonzeros=42010332
>          not using I-node (on process 0) routines
>  Up solver (post-smoother) on level 2 -------------------------------
>    KSP Object:(mg_levels_2_)
>      type: richardson
>        Richardson: damping factor=1
>      maximum iterations=4
>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>      left preconditioning
>    PC Object:(mg_levels_2_)
>      type: sor
>        SOR: type = local_symmetric, iterations = 1, omega = 1
>      linear system matrix = precond matrix:
>      Matrix Object:
>        type=mpiaij, rows=411866, cols=411866
>        total: nonzeros=10941434, allocated nonzeros=42010332
>          not using I-node (on process 0) routines
>  linear system matrix = precond matrix:
>  Matrix Object:
>    type=mpiaij, rows=411866, cols=411866
>    total: nonzeros=10941434, allocated nonzeros=42010332
>      not using I-node (on process 0) routines
>
>