Smoother settings for AMG

Fri Jul 31 13:55:49 CDT 2009

Barry,

On Monday I'll use ex10.c to reproduce and send you the full options.

Thanks,
Harun

-----Original Message-----
From: petsc-users-bounces at mcs.anl.gov [mailto:petsc-users-bounces at mcs.anl.gov] On Behalf Of Barry Smith
Sent: Friday, July 31, 2009 2:25 PM
To: PETSc users list
Subject: Re: Smoother settings for AMG

On Jul 31, 2009, at 1:15 PM, BAYRAKTAR Harun wrote:

> Barry,
>
> Thanks a lot for looking into this. One thing I want to clarify is
> that the 43 (should have been 46, sorry for the typo) iterations on 1
> cpu seem like real convergence to me. I do look at the
> unpreconditioned residual norm to determine convergence. For this I
> use:
>
> ierr = KSPSetNormType(m_solver, KSP_NORM_UNPRECONDITIONED); CHKERRQ(ierr);
>
> Then I check convergence through KSPSetConvergenceTest. As an
> experiment I commented out the line above where I tell KSP to use
> the unpreconditioned norm, and while the ||r|| values changed
> (naturally), it still converged, in slightly more iterations (56).
>
> I am familiar with the preconditioned norm going down 6 orders while
> the true relative norm is 0.1 or so (i.e., problem not solved at
> all). This usually happens to me in structural mechanics problems
> with ill-conditioned systems where I use a KSP method that does not
> allow the unpreconditioned residual to be monitored. However,
> this does not seem to be one of those cases; maybe I am
> missing something.

Ok. I didn't see what you report (I saw it just iterating away for a
long time with the unpreconditioned norm), but then you never sent the
command line options for the solver you used, so I may have run it
differently.
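
The distinction at stake in this exchange, a preconditioned residual norm that drops by orders of magnitude while the true residual barely moves, can be illustrated with a toy calculation. The sketch below is not PETSc code and does not use the matrix from this thread; the 2x2 diagonal preconditioner and the residual vectors are made-up values chosen only to show the effect.

```python
import math


def norm2(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def apply_diag(d, v):
    """Apply a diagonal operator (stored as its diagonal d) to v."""
    return [di * vi for di, vi in zip(d, v)]


# Hypothetical, badly scaled preconditioner M^{-1} = diag(1, 1e-6).
m_inv = [1.0, 1.0e-6]

# Hypothetical true residuals r = b - A x at the start and at the
# iteration where the solver stops. These are illustrative values only.
r0 = [1.0, 1.0]
rk = [1.0e-6, 0.5]

# Relative reduction of the true (unpreconditioned) residual norm.
true_reduction = norm2(rk) / norm2(r0)

# Relative reduction of the preconditioned residual norm ||M^{-1} r||.
prec_reduction = norm2(apply_diag(m_inv, rk)) / norm2(apply_diag(m_inv, r0))

print(f"true residual reduction:           {true_reduction:.3e}")
print(f"preconditioned residual reduction: {prec_reduction:.3e}")
```

With a relative tolerance of 1e-05 (the value shown in the ksp_view output in this thread), a test on the preconditioned norm would declare convergence for these numbers, while the true residual has dropped by less than a factor of 3.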

>
> Out of curiosity did you use ksp/ksp/examples/tutorials/ex10.c to  
> solve this?

Yes.

>
> Thanks again,
> Harun
>
>
>
> -----Original Message-----
> From: petsc-users-bounces at mcs.anl.gov [mailto:petsc-users-bounces at mcs.anl.gov] On Behalf Of Barry Smith
> Sent: Thursday, July 30, 2009 7:45 PM
> To: PETSc users list
> Subject: Re: Smoother settings for AMG
>
>
>    Harun,
>
>   I have played around with this matrix. It is a nasty matrix; I
> think it is really beyond the normal capacity of ML (and hypre's
> boomerAMG).
>
> Even the "convergence" you were getting below is BOGUS.  If you run
> with -ksp_norm_type unpreconditioned or -ksp_monitor_true_residual
> you'll see that the "true" residual norm is only creeping toward zero,
> and at the "converged" 43 iterations below the true residual norm has
> decreased by less than a factor of 10. (The preconditioned residual norm
> has decreased by 1.e5, so the iteration stops and you think it has
> converged. In really hard problems preconditioners sometimes scale
> things in a funky way, so a large decrease in preconditioned residual
> norm does not mean a large decrease in true residual norm.) In other
> words, the "answer" you got out of the runs below is garbage.
>
>   I suggest,
> 1) check carefully that the matrix being created actually matches the
> model's equations. If they seem right, then
> 2) see if you can change the model so it does not generate such
> hopeless matrices. If you MUST solve this nasty matrix,
> 3) bite the bullet and use a parallel direct solver from PETSc. Try
> both MUMPS and SuperLU_dist.
>
>   Good luck,
>
>    Barry
>
>
>
>
> On Jul 29, 2009, at 3:54 PM, BAYRAKTAR Harun wrote:
>
>> Hi,
>>
>> I am trying to solve a system of equations and I am having difficulty
>> picking the right smoothers for AMG (using ML as pc_type) in PETSc for
>> parallel execution. First, here is what happens in terms of CG (ksp_type)
>> iteration counts (both columns use block jacobi):
>>
>> cpus | AMG w/ ICC(0) x1 | AMG w/ SOR x4
>> -----+------------------+--------------
>> 1    |        43        |      243
>> 4    |       699        |      379
>>
>> x1 or x4 means 1 or 4 iterations of smoother application at each AMG
>> level (all details from ksp_view for the 4 cpu run are below). The main
>> observation is that on 1 cpu, AMG w/ ICC(0) is a clear winner but falls
>> apart in parallel. SOR, on the other hand, experiences a 1.5X increase in
>> iteration count, which is totally expected from the quality of coarsening
>> ML delivers in parallel.
>>
>> I basically would like to find a way (if possible) to have the number of
>> iterations in parallel stay within 1-2X of the 1 cpu iteration count for
>> the AMG w/ ICC case. Is there a way to achieve this?
>>
>> Thanks,
>> Harun
>>
>> %%%%%%%%%%%%%%%%%%%%%%%%%
>> AMG w/ ICC(0) x1 ksp_view
>> %%%%%%%%%%%%%%%%%%%%%%%%%
>> KSP Object:
>> type: cg
>> maximum iterations=10000
>> tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>> left preconditioning
>> PC Object:
>> type: ml
>>   MG: type is MULTIPLICATIVE, levels=3 cycles=v, pre-smooths=1, post-smooths=1
>> Coarse gride solver -- level 0 -------------------------------
>>   KSP Object:(mg_coarse_)
>>     type: preonly
>>     maximum iterations=1, initial guess is zero
>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>     left preconditioning
>>   PC Object:(mg_coarse_)
>>     type: redundant
>>       Redundant preconditioner: First (color=0) of 4 PCs follows
>>     KSP Object:(mg_coarse_redundant_)
>>       type: preonly
>>       maximum iterations=10000, initial guess is zero
>>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>       left preconditioning
>>     PC Object:(mg_coarse_redundant_)
>>       type: lu
>>         LU: out-of-place factorization
>>           matrix ordering: nd
>>         LU: tolerance for zero pivot 1e-12
>>         LU: factor fill ratio needed 2.17227
>>              Factored matrix follows
>>             Matrix Object:
>>               type=seqaij, rows=283, cols=283
>>               total: nonzeros=21651, allocated nonzeros=21651
>>                 using I-node routines: found 186 nodes, limit used is 5
>>       linear system matrix = precond matrix:
>>       Matrix Object:
>>         type=seqaij, rows=283, cols=283
>>         total: nonzeros=9967, allocated nonzeros=14150
>>           not using I-node routines
>>     linear system matrix = precond matrix:
>>     Matrix Object:
>>       type=mpiaij, rows=283, cols=283
>>       total: nonzeros=9967, allocated nonzeros=9967
>>         not using I-node (on process 0) routines
>> Down solver (pre-smoother) on level 1 -------------------------------
>>   KSP Object:(mg_levels_1_)
>>     type: richardson
>>       Richardson: damping factor=0.9
>>     maximum iterations=1, initial guess is zero
>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>     left preconditioning
>>   PC Object:(mg_levels_1_)
>>     type: bjacobi
>>       block Jacobi: number of blocks = 4
>>       Local solve is same for all blocks, in the following KSP and PC objects:
>>     KSP Object:(mg_levels_1_sub_)
>>       type: preonly
>>       maximum iterations=10000, initial guess is zero
>>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>       left preconditioning
>>     PC Object:(mg_levels_1_sub_)
>>       type: icc
>>         ICC: 0 levels of fill
>>         ICC: factor fill ratio allocated 1
>>         ICC: using Manteuffel shift
>>         ICC: factor fill ratio needed 0.514899
>>              Factored matrix follows
>>             Matrix Object:
>>               type=seqsbaij, rows=2813, cols=2813
>>               total: nonzeros=48609, allocated nonzeros=48609
>>                   block size is 1
>>       linear system matrix = precond matrix:
>>       Matrix Object:
>>         type=seqaij, rows=2813, cols=2813
>>         total: nonzeros=94405, allocated nonzeros=94405
>>           not using I-node routines
>>     linear system matrix = precond matrix:
>>     Matrix Object:
>>       type=mpiaij, rows=10654, cols=10654
>>       total: nonzeros=376634, allocated nonzeros=376634
>>         not using I-node (on process 0) routines
>> Up solver (post-smoother) on level 1 -------------------------------
>>   KSP Object:(mg_levels_1_)
>>     type: richardson
>>       Richardson: damping factor=0.9
>>     maximum iterations=1
>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>     left preconditioning
>>   PC Object:(mg_levels_1_)
>>     type: bjacobi
>>       block Jacobi: number of blocks = 4
>>       Local solve is same for all blocks, in the following KSP and PC objects:
>>     KSP Object:(mg_levels_1_sub_)
>>       type: preonly
>>       maximum iterations=10000, initial guess is zero
>>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>       left preconditioning
>>     PC Object:(mg_levels_1_sub_)
>>       type: icc
>>         ICC: 0 levels of fill
>>         ICC: factor fill ratio allocated 1
>>         ICC: using Manteuffel shift
>>         ICC: factor fill ratio needed 0.514899
>>              Factored matrix follows
>>             Matrix Object:
>>               type=seqsbaij, rows=2813, cols=2813
>>               total: nonzeros=48609, allocated nonzeros=48609
>>                   block size is 1
>>       linear system matrix = precond matrix:
>>       Matrix Object:
>>         type=seqaij, rows=2813, cols=2813
>>         total: nonzeros=94405, allocated nonzeros=94405
>>           not using I-node routines
>>     linear system matrix = precond matrix:
>>     Matrix Object:
>>       type=mpiaij, rows=10654, cols=10654
>>       total: nonzeros=376634, allocated nonzeros=376634
>>         not using I-node (on process 0) routines
>> Down solver (pre-smoother) on level 2 -------------------------------
>>   KSP Object:(mg_levels_2_)
>>     type: richardson
>>       Richardson: damping factor=0.9
>>     maximum iterations=1, initial guess is zero
>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>     left preconditioning
>>   PC Object:(mg_levels_2_)
>>     type: bjacobi
>>       block Jacobi: number of blocks = 4
>>       Local solve is same for all blocks, in the following KSP and PC objects:
>>     KSP Object:(mg_levels_2_sub_)
>>       type: preonly
>>       maximum iterations=10000, initial guess is zero
>>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>       left preconditioning
>>     PC Object:(mg_levels_2_sub_)
>>       type: icc
>>         ICC: 0 levels of fill
>>         ICC: factor fill ratio allocated 1
>>         ICC: using Manteuffel shift
>>         ICC: factor fill ratio needed 0.519045
>>              Factored matrix follows
>>             Matrix Object:
>>               type=seqsbaij, rows=101164, cols=101164
>>               total: nonzeros=1378558, allocated nonzeros=1378558
>>                   block size is 1
>>       linear system matrix = precond matrix:
>>       Matrix Object:
>>         type=seqaij, rows=101164, cols=101164
>>         total: nonzeros=2655952, allocated nonzeros=5159364
>>           not using I-node routines
>>     linear system matrix = precond matrix:
>>     Matrix Object:
>>       type=mpiaij, rows=411866, cols=411866
>>       total: nonzeros=10941434, allocated nonzeros=42010332
>>         not using I-node (on process 0) routines
>> Up solver (post-smoother) on level 2 -------------------------------
>>   KSP Object:(mg_levels_2_)
>>     type: richardson
>>       Richardson: damping factor=0.9
>>     maximum iterations=1
>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>     left preconditioning
>>   PC Object:(mg_levels_2_)
>>     type: bjacobi
>>       block Jacobi: number of blocks = 4
>>       Local solve is same for all blocks, in the following KSP and PC objects:
>>     KSP Object:(mg_levels_2_sub_)
>>       type: preonly
>>       maximum iterations=10000, initial guess is zero
>>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>       left preconditioning
>>     PC Object:(mg_levels_2_sub_)
>>       type: icc
>>         ICC: 0 levels of fill
>>         ICC: factor fill ratio allocated 1
>>         ICC: using Manteuffel shift
>>         ICC: factor fill ratio needed 0.519045
>>              Factored matrix follows
>>             Matrix Object:
>>               type=seqsbaij, rows=101164, cols=101164
>>               total: nonzeros=1378558, allocated nonzeros=1378558
>>                   block size is 1
>>       linear system matrix = precond matrix:
>>       Matrix Object:
>>         type=seqaij, rows=101164, cols=101164
>>         total: nonzeros=2655952, allocated nonzeros=5159364
>>           not using I-node routines
>>     linear system matrix = precond matrix:
>>     Matrix Object:
>>       type=mpiaij, rows=411866, cols=411866
>>       total: nonzeros=10941434, allocated nonzeros=42010332
>>         not using I-node (on process 0) routines
>> linear system matrix = precond matrix:
>> Matrix Object:
>>   type=mpiaij, rows=411866, cols=411866
>>   total: nonzeros=10941434, allocated nonzeros=42010332
>>     not using I-node (on process 0) routines
>>
>> %%%%%%%%%%%%%%%%%%%%%%
>> AMG w/ SOR x4 ksp_view
>> %%%%%%%%%%%%%%%%%%%%%%
>>
>> KSP Object:
>> type: cg
>> maximum iterations=10000
>> tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>> left preconditioning
>> PC Object:
>> type: ml
>>   MG: type is MULTIPLICATIVE, levels=3 cycles=v, pre-smooths=1, post-smooths=1
>> Coarse gride solver -- level 0 -------------------------------
>>   KSP Object:(mg_coarse_)
>>     type: preonly
>>     maximum iterations=1, initial guess is zero
>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>     left preconditioning
>>   PC Object:(mg_coarse_)
>>     type: redundant
>>       Redundant preconditioner: First (color=0) of 4 PCs follows
>>     KSP Object:(mg_coarse_redundant_)
>>       type: preonly
>>       maximum iterations=10000, initial guess is zero
>>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>       left preconditioning
>>     PC Object:(mg_coarse_redundant_)
>>       type: lu
>>         LU: out-of-place factorization
>>           matrix ordering: nd
>>         LU: tolerance for zero pivot 1e-12
>>         LU: factor fill ratio needed 2.17227
>>              Factored matrix follows
>>             Matrix Object:
>>               type=seqaij, rows=283, cols=283
>>               total: nonzeros=21651, allocated nonzeros=21651
>>                 using I-node routines: found 186 nodes, limit used is 5
>>       linear system matrix = precond matrix:
>>       Matrix Object:
>>         type=seqaij, rows=283, cols=283
>>         total: nonzeros=9967, allocated nonzeros=14150
>>           not using I-node routines
>>     linear system matrix = precond matrix:
>>     Matrix Object:
>>       type=mpiaij, rows=283, cols=283
>>       total: nonzeros=9967, allocated nonzeros=9967
>>         not using I-node (on process 0) routines
>> Down solver (pre-smoother) on level 1 -------------------------------
>>   KSP Object:(mg_levels_1_)
>>     type: richardson
>>       Richardson: damping factor=1
>>     maximum iterations=4, initial guess is zero
>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>     left preconditioning
>>   PC Object:(mg_levels_1_)
>>     type: sor
>>       SOR: type = local_symmetric, iterations = 1, omega = 1
>>     linear system matrix = precond matrix:
>>     Matrix Object:
>>       type=mpiaij, rows=10654, cols=10654
>>       total: nonzeros=376634, allocated nonzeros=376634
>>         not using I-node (on process 0) routines
>> Up solver (post-smoother) on level 1 -------------------------------
>>   KSP Object:(mg_levels_1_)
>>     type: richardson
>>       Richardson: damping factor=1
>>     maximum iterations=4
>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>     left preconditioning
>>   PC Object:(mg_levels_1_)
>>     type: sor
>>       SOR: type = local_symmetric, iterations = 1, omega = 1
>>     linear system matrix = precond matrix:
>>     Matrix Object:
>>       type=mpiaij, rows=10654, cols=10654
>>       total: nonzeros=376634, allocated nonzeros=376634
>>         not using I-node (on process 0) routines
>> Down solver (pre-smoother) on level 2 -------------------------------
>>   KSP Object:(mg_levels_2_)
>>     type: richardson
>>       Richardson: damping factor=1
>>     maximum iterations=4, initial guess is zero
>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>     left preconditioning
>>   PC Object:(mg_levels_2_)
>>     type: sor
>>       SOR: type = local_symmetric, iterations = 1, omega = 1
>>     linear system matrix = precond matrix:
>>     Matrix Object:
>>       type=mpiaij, rows=411866, cols=411866
>>       total: nonzeros=10941434, allocated nonzeros=42010332
>>         not using I-node (on process 0) routines
>> Up solver (post-smoother) on level 2 -------------------------------
>>   KSP Object:(mg_levels_2_)
>>     type: richardson
>>       Richardson: damping factor=1
>>     maximum iterations=4
>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>     left preconditioning
>>   PC Object:(mg_levels_2_)
>>     type: sor
>>       SOR: type = local_symmetric, iterations = 1, omega = 1
>>     linear system matrix = precond matrix:
>>     Matrix Object:
>>       type=mpiaij, rows=411866, cols=411866
>>       total: nonzeros=10941434, allocated nonzeros=42010332
>>         not using I-node (on process 0) routines
>> linear system matrix = precond matrix:
>> Matrix Object:
>>   type=mpiaij, rows=411866, cols=411866
>>   total: nonzeros=10941434, allocated nonzeros=42010332
>>     not using I-node (on process 0) routines
>>
>>
>
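
As a quick check on the iteration-count table in the original question, the growth in iterations when going from 1 to 4 cpus can be computed directly. The sketch below only restates the four counts quoted in that table; the dictionary layout and the helper name `growth` are illustrative choices, not anything from PETSc.

```python
# Iteration counts copied from the table in the original question,
# keyed by smoother setup and then by number of cpus.
counts = {
    "AMG w/ ICC(0) x1": {1: 43, 4: 699},
    "AMG w/ SOR x4": {1: 243, 4: 379},
}


def growth(by_cpus):
    """Ratio of the 4-cpu iteration count to the 1-cpu iteration count."""
    return by_cpus[4] / by_cpus[1]


ratios = {name: growth(c) for name, c in counts.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the 1-cpu iteration count on 4 cpus")
```

The SOR ratio matches the roughly 1.5X quoted in the question, while the ICC(0) ratio is about 16X (about 15X if the 1-cpu count is taken as the corrected 46), well outside the 1-2X target.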