Smoother settings for AMG

Thu Jul 30 18:45:19 CDT 2009

    Harun,

   I have played around with this matrix. It is a nasty matrix; I  
think it is really beyond the normal capacity of ML (and hypre's  
boomerAMG).

Even the "convergence" you were getting below is BOGUS.  If you run  
with -ksp_norm_type unpreconditioned or -ksp_monitor_true_residual  
you'll see that the "true" residual norm is actually creeping to zero  
and at the converged 43 iterations below the true residual norm has  
decreased by like less than 1/10. (The preconditioned residual norm  
has decreased by 1.e 5 so the iteration stops and you think it has  
converged. In really hard problems preconditioners sometimes scales  
things in a funky way so a large decrease in preconditioned residual  
norm does not mean a large decrease in true residual norm). In other  
words the "answer" you got out of the runs below is garbage.

   I suggest,
1) check carefully that the matrix being created actually matches the  
model's equations, if they seem right then
2) see if you can change the model so it does not generate such  
hopeless matrices. If you MUST solve this nasty matrix
3) bite the bullet and use a parallel direct solver from PETSc. Try  
both MUMPS and SuperLU_dist

   Good luck,

    Barry

On Jul 29, 2009, at 3:54 PM, BAYRAKTAR Harun wrote:

> Hi,
>
> I am trying to solve a system of equations and I am having difficulty
> picking the right smoothers for AMG (using ML as pc_type) in PETSc for
> parallel execution. First here is what happens in terms of CG  
> (ksp_type)
> iteration counts (both columns use block jacobi):
>
> cpus	|	AMG w/ ICC(0) x1	|	AMG w/ SOR x4
> ------------------------------------------------------
> 1	|		43		|		243
> 4	|		699		|		379
>
> x1 or x4 means 1 or 4 iterations of smoother application at each AMG
> level (all details from ksp view for the 4 cpu run are below). The  
> main
> observation is that on 1 cpu, AMG w/ ICC(0) is a clear winner but  
> falls
> apart in parallel. SOR on the other hand experiences a 1.5X increase  
> in
> iteration count which is totally expected from the quality of  
> coarsening
> ML delivers in parallel.
>
> I basically would like to find a way (if possible) to have the  
> number of
> iterations in parallel stay with 1-2X of 1 cpu iteration count for the
> AMG w/ ICC case. Is there a way to achieve this?
>
> Thanks,
> Harun
>
> %%%%%%%%%%%%%%%%%%%%%%%%%
> AMG w/ ICC(0) x1 ksp_view
> %%%%%%%%%%%%%%%%%%%%%%%%%
> KSP Object:
>  type: cg
>  maximum iterations=10000
>  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>  left preconditioning
> PC Object:
>  type: ml
>    MG: type is MULTIPLICATIVE, levels=3 cycles=v, pre-smooths=1,
> post-smooths=1
>  Coarse gride solver -- level 0 -------------------------------
>    KSP Object:(mg_coarse_)
>      type: preonly
>      maximum iterations=1, initial guess is zero
>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>      left preconditioning
>    PC Object:(mg_coarse_)
>      type: redundant
>        Redundant preconditioner: First (color=0) of 4 PCs follows
>      KSP Object:(mg_coarse_redundant_)
>        type: preonly
>        maximum iterations=10000, initial guess is zero
>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>        left preconditioning
>      PC Object:(mg_coarse_redundant_)
>        type: lu
>          LU: out-of-place factorization
>            matrix ordering: nd
>          LU: tolerance for zero pivot 1e-12
>          LU: factor fill ratio needed 2.17227
>               Factored matrix follows
>              Matrix Object:
>                type=seqaij, rows=283, cols=283
>                total: nonzeros=21651, allocated nonzeros=21651
>                  using I-node routines: found 186 nodes, limit used is
> 5
>        linear system matrix = precond matrix:
>        Matrix Object:
>          type=seqaij, rows=283, cols=283
>          total: nonzeros=9967, allocated nonzeros=14150
>            not using I-node routines
>      linear system matrix = precond matrix:
>      Matrix Object:
>        type=mpiaij, rows=283, cols=283
>        total: nonzeros=9967, allocated nonzeros=9967
>          not using I-node (on process 0) routines
>  Down solver (pre-smoother) on level 1 -------------------------------
>    KSP Object:(mg_levels_1_)
>      type: richardson
>        Richardson: damping factor=0.9
>      maximum iterations=1, initial guess is zero
>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>      left preconditioning
>    PC Object:(mg_levels_1_)
>      type: bjacobi
>        block Jacobi: number of blocks = 4
>        Local solve is same for all blocks, in the following KSP and PC
> objects:
>      KSP Object:(mg_levels_1_sub_)
>        type: preonly
>        maximum iterations=10000, initial guess is zero
>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>        left preconditioning
>      PC Object:(mg_levels_1_sub_)
>        type: icc
>          ICC: 0 levels of fill
>          ICC: factor fill ratio allocated 1
>          ICC: using Manteuffel shift
>          ICC: factor fill ratio needed 0.514899
>               Factored matrix follows
>              Matrix Object:
>                type=seqsbaij, rows=2813, cols=2813
>                total: nonzeros=48609, allocated nonzeros=48609
>                    block size is 1
>        linear system matrix = precond matrix:
>        Matrix Object:
>          type=seqaij, rows=2813, cols=2813
>          total: nonzeros=94405, allocated nonzeros=94405
>            not using I-node routines
>      linear system matrix = precond matrix:
>      Matrix Object:
>        type=mpiaij, rows=10654, cols=10654
>        total: nonzeros=376634, allocated nonzeros=376634
>          not using I-node (on process 0) routines
>  Up solver (post-smoother) on level 1 -------------------------------
>    KSP Object:(mg_levels_1_)
>      type: richardson
>        Richardson: damping factor=0.9
>      maximum iterations=1
>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>      left preconditioning
>    PC Object:(mg_levels_1_)
>      type: bjacobi
>        block Jacobi: number of blocks = 4
>        Local solve is same for all blocks, in the following KSP and PC
> objects:
>      KSP Object:(mg_levels_1_sub_)
>        type: preonly
>        maximum iterations=10000, initial guess is zero
>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>        left preconditioning
>      PC Object:(mg_levels_1_sub_)
>        type: icc
>          ICC: 0 levels of fill
>          ICC: factor fill ratio allocated 1
>          ICC: using Manteuffel shift
>          ICC: factor fill ratio needed 0.514899
>               Factored matrix follows
>              Matrix Object:
>                type=seqsbaij, rows=2813, cols=2813
>                total: nonzeros=48609, allocated nonzeros=48609
>                    block size is 1
>        linear system matrix = precond matrix:
>        Matrix Object:
>          type=seqaij, rows=2813, cols=2813
>          total: nonzeros=94405, allocated nonzeros=94405
>            not using I-node routines
>      linear system matrix = precond matrix:
>      Matrix Object:
>        type=mpiaij, rows=10654, cols=10654
>        total: nonzeros=376634, allocated nonzeros=376634
>          not using I-node (on process 0) routines
>  Down solver (pre-smoother) on level 2 -------------------------------
>    KSP Object:(mg_levels_2_)
>      type: richardson
>        Richardson: damping factor=0.9
>      maximum iterations=1, initial guess is zero
>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>      left preconditioning
>    PC Object:(mg_levels_2_)
>      type: bjacobi
>        block Jacobi: number of blocks = 4
>        Local solve is same for all blocks, in the following KSP and PC
> objects:
>      KSP Object:(mg_levels_2_sub_)
>        type: preonly
>        maximum iterations=10000, initial guess is zero
>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>        left preconditioning
>      PC Object:(mg_levels_2_sub_)
>        type: icc
>          ICC: 0 levels of fill
>          ICC: factor fill ratio allocated 1
>          ICC: using Manteuffel shift
>          ICC: factor fill ratio needed 0.519045
>               Factored matrix follows
>              Matrix Object:
>                type=seqsbaij, rows=101164, cols=101164
>                total: nonzeros=1378558, allocated nonzeros=1378558
>                    block size is 1
>        linear system matrix = precond matrix:
>        Matrix Object:
>          type=seqaij, rows=101164, cols=101164
>          total: nonzeros=2655952, allocated nonzeros=5159364
>            not using I-node routines
>      linear system matrix = precond matrix:
>      Matrix Object:
>        type=mpiaij, rows=411866, cols=411866
>        total: nonzeros=10941434, allocated nonzeros=42010332
>          not using I-node (on process 0) routines
>  Up solver (post-smoother) on level 2 -------------------------------
>    KSP Object:(mg_levels_2_)
>      type: richardson
>        Richardson: damping factor=0.9
>      maximum iterations=1
>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>      left preconditioning
>    PC Object:(mg_levels_2_)
>      type: bjacobi
>        block Jacobi: number of blocks = 4
>        Local solve is same for all blocks, in the following KSP and PC
> objects:
>      KSP Object:(mg_levels_2_sub_)
>        type: preonly
>        maximum iterations=10000, initial guess is zero
>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>        left preconditioning
>      PC Object:(mg_levels_2_sub_)
>        type: icc
>          ICC: 0 levels of fill
>          ICC: factor fill ratio allocated 1
>          ICC: using Manteuffel shift
>          ICC: factor fill ratio needed 0.519045
>               Factored matrix follows
>              Matrix Object:
>                type=seqsbaij, rows=101164, cols=101164
>                total: nonzeros=1378558, allocated nonzeros=1378558
>                    block size is 1
>        linear system matrix = precond matrix:
>        Matrix Object:
>          type=seqaij, rows=101164, cols=101164
>          total: nonzeros=2655952, allocated nonzeros=5159364
>            not using I-node routines
>      linear system matrix = precond matrix:
>      Matrix Object:
>        type=mpiaij, rows=411866, cols=411866
>        total: nonzeros=10941434, allocated nonzeros=42010332
>          not using I-node (on process 0) routines
>  linear system matrix = precond matrix:
>  Matrix Object:
>    type=mpiaij, rows=411866, cols=411866
>    total: nonzeros=10941434, allocated nonzeros=42010332
>      not using I-node (on process 0) routines
>
> %%%%%%%%%%%%%%%%%%%%%%
> AMG w/ SOR x4 ksp_view
> %%%%%%%%%%%%%%%%%%%%%%
>
> KSP Object:
>  type: cg
>  maximum iterations=10000
>  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>  left preconditioning
> PC Object:
>  type: ml
>    MG: type is MULTIPLICATIVE, levels=3 cycles=v, pre-smooths=1,
> post-smooths=1
>  Coarse gride solver -- level 0 -------------------------------
>    KSP Object:(mg_coarse_)
>      type: preonly
>      maximum iterations=1, initial guess is zero
>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>      left preconditioning
>    PC Object:(mg_coarse_)
>      type: redundant
>        Redundant preconditioner: First (color=0) of 4 PCs follows
>      KSP Object:(mg_coarse_redundant_)
>        type: preonly
>        maximum iterations=10000, initial guess is zero
>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>        left preconditioning
>      PC Object:(mg_coarse_redundant_)
>        type: lu
>          LU: out-of-place factorization
>            matrix ordering: nd
>          LU: tolerance for zero pivot 1e-12
>          LU: factor fill ratio needed 2.17227
>               Factored matrix follows
>              Matrix Object:
>                type=seqaij, rows=283, cols=283
>                total: nonzeros=21651, allocated nonzeros=21651
>                  using I-node routines: found 186 nodes, limit used is
> 5
>        linear system matrix = precond matrix:
>        Matrix Object:
>          type=seqaij, rows=283, cols=283
>          total: nonzeros=9967, allocated nonzeros=14150
>            not using I-node routines
>      linear system matrix = precond matrix:
>      Matrix Object:
>        type=mpiaij, rows=283, cols=283
>        total: nonzeros=9967, allocated nonzeros=9967
>          not using I-node (on process 0) routines
>  Down solver (pre-smoother) on level 1 -------------------------------
>    KSP Object:(mg_levels_1_)
>      type: richardson
>        Richardson: damping factor=1
>      maximum iterations=4, initial guess is zero
>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>      left preconditioning
>    PC Object:(mg_levels_1_)
>      type: sor
>        SOR: type = local_symmetric, iterations = 1, omega = 1
>      linear system matrix = precond matrix:
>      Matrix Object:
>        type=mpiaij, rows=10654, cols=10654
>        total: nonzeros=376634, allocated nonzeros=376634
>          not using I-node (on process 0) routines
>  Up solver (post-smoother) on level 1 -------------------------------
>    KSP Object:(mg_levels_1_)
>      type: richardson
>        Richardson: damping factor=1
>      maximum iterations=4
>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>      left preconditioning
>    PC Object:(mg_levels_1_)
>      type: sor
>        SOR: type = local_symmetric, iterations = 1, omega = 1
>      linear system matrix = precond matrix:
>      Matrix Object:
>        type=mpiaij, rows=10654, cols=10654
>        total: nonzeros=376634, allocated nonzeros=376634
>          not using I-node (on process 0) routines
>  Down solver (pre-smoother) on level 2 -------------------------------
>    KSP Object:(mg_levels_2_)
>      type: richardson
>        Richardson: damping factor=1
>      maximum iterations=4, initial guess is zero
>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>      left preconditioning
>    PC Object:(mg_levels_2_)
>      type: sor
>        SOR: type = local_symmetric, iterations = 1, omega = 1
>      linear system matrix = precond matrix:
>      Matrix Object:
>        type=mpiaij, rows=411866, cols=411866
>        total: nonzeros=10941434, allocated nonzeros=42010332
>          not using I-node (on process 0) routines
>  Up solver (post-smoother) on level 2 -------------------------------
>    KSP Object:(mg_levels_2_)
>      type: richardson
>        Richardson: damping factor=1
>      maximum iterations=4
>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>      left preconditioning
>    PC Object:(mg_levels_2_)
>      type: sor
>        SOR: type = local_symmetric, iterations = 1, omega = 1
>      linear system matrix = precond matrix:
>      Matrix Object:
>        type=mpiaij, rows=411866, cols=411866
>        total: nonzeros=10941434, allocated nonzeros=42010332
>          not using I-node (on process 0) routines
>  linear system matrix = precond matrix:
>  Matrix Object:
>    type=mpiaij, rows=411866, cols=411866
>    total: nonzeros=10941434, allocated nonzeros=42010332
>      not using I-node (on process 0) routines
>
>