[petsc-users] Problem with AMG packages

Tue Oct 8 18:50:13 CDT 2013

Something is going terribly wrong with the setup in hypre and ML. hypre's default parameters are not set up well for 3D problems. I use:

-pc_hypre_boomeramg_no_CF
-pc_hypre_boomeramg_agg_nl 1
-pc_hypre_boomeramg_coarsen_type HMIS
-pc_hypre_boomeramg_interp_type ext+i

I'm not sure what is going wrong with ML's setup.

GAMG is converging terribly. Is this just a simple 7-point Laplacian? It looks like the eigenvalue estimate is too low on the finest grid, which messes up the smoother. Try running with these parameters and send the output:

-pc_gamg_agg_nsmooths 1 
-pc_gamg_verbose 2
-mg_levels_ksp_type richardson
-mg_levels_pc_type sor
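To illustrate why a low eigenvalue estimate can ruin a smoother, here is a minimal sketch on a 1D model problem (an assumption; the problem in this thread is 3D). It uses a Jacobi-preconditioned Richardson sweep whose damping is set to omega = 1/lambda_est, where lambda_est estimates the largest eigenvalue of D^{-1}A. This is a simplified stand-in for an eigenvalue-dependent smoother such as Chebyshev, not PETSc's actual smoother code, and the helper names are hypothetical. If lambda_est is too low, omega is too large and the most oscillatory error mode grows instead of being damped. Richardson with SOR, as suggested above, does not need an eigenvalue estimate.

```python
def apply_laplacian(x):
    """y = A x for the 1D Dirichlet Laplacian tridiag(-1, 2, -1)."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)
    return y


def jacobi_sweep(e, omega):
    """One damped-Jacobi (Richardson) sweep on the error equation A e = 0:
    e <- e - omega * D^{-1} A e, where D = 2 I for this matrix."""
    ae = apply_laplacian(e)
    return [ei - omega * aei / 2.0 for ei, aei in zip(e, ae)]


def norm(v):
    return sum(vi * vi for vi in v) ** 0.5


def amplification(n, lambda_est):
    """Norm growth after one sweep applied to the most oscillatory mode
    e_i = (-1)^i, using the (hypothetical) damping rule omega = 1 / lambda_est."""
    e = [(-1.0) ** i for i in range(n)]
    return norm(jacobi_sweep(e, 1.0 / lambda_est)) / norm(e)


if __name__ == "__main__":
    # The largest eigenvalue of D^{-1} A approaches 2 as n grows,
    # so 2.0 is a good estimate and 0.8 is a badly low one.
    for lam in (2.0, 1.2, 0.8):
        print(f"lambda_est = {lam}: growth factor {amplification(100, lam):.3f}")
```

With n = 100, an accurate estimate (2.0) nearly annihilates the oscillatory mode, while an estimate of 0.8 makes one sweep grow it by about 1.49, so every smoothing step makes the high-frequency error worse.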

Mark

On Oct 8, 2013, at 5:46 PM, Pierre Jolivet <jolivet at ann.jussieu.fr> wrote:

> Please find the logs for BoomerAMG, ML and GAMG attached. The setup for
> GAMG doesn't look so bad compared to the other packages, so I'm wondering
> what is going on with those?
> 
>> 
>>  We need the output from running with -log_summary -pc_mg_log.
>> 
>>   You can also run with PETSc's own AMG, called GAMG (run with -pc_type gamg).
>> This will give the most useful information about where it is spending
>> the time.
>> 
>> 
>>   Barry
>> 
>> 
>> On Oct 8, 2013, at 4:11 PM, Pierre Jolivet <jolivet at ann.jussieu.fr> wrote:
>> 
>>> Dear all,
>>> I'm trying to compare linear solvers for a simple Poisson equation in 3D.
>>> I thought that MG was the way to go, but looking at my log, the
>>> performance looks abysmal (I know that the matrices are way too small, but
>>> if I go bigger, it never even completes a single iteration...). Even
>>> though this is neither the BoomerAMG nor the ML mailing list, could you
>>> please tell me whether PETSc sets some default flags that make the setup
>>> of those solvers so slow for this simple problem? By comparison, the
>>> performance of (G)ASM is much better.
>>> 
>>> Thanks in advance for your help.
>>> 
>>> PS: first the BoomerAMG log, then ML (much more verbose, sorry).
>>> 
>>> 0 KSP Residual norm 1.599647112604e+00
>>> 1 KSP Residual norm 5.450838232404e-02
>>> 2 KSP Residual norm 3.549673478318e-03
>>> 3 KSP Residual norm 2.901826808841e-04
>>> 4 KSP Residual norm 2.574235778729e-05
>>> 5 KSP Residual norm 2.253410171682e-06
>>> 6 KSP Residual norm 1.871067784877e-07
>>> 7 KSP Residual norm 1.681162800670e-08
>>> 8 KSP Residual norm 2.120841512414e-09
>>> KSP Object: 2048 MPI processes
>>> type: gmres
>>>   GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>   GMRES: happy breakdown tolerance 1e-30
>>> maximum iterations=200, initial guess is zero
>>> tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
>>> left preconditioning
>>> using PRECONDITIONED norm type for convergence test
>>> PC Object: 2048 MPI processes
>>> type: hypre
>>>   HYPRE BoomerAMG preconditioning
>>>   HYPRE BoomerAMG: Cycle type V
>>>   HYPRE BoomerAMG: Maximum number of levels 25
>>>   HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>>>   HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>>   HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>>   HYPRE BoomerAMG: Interpolation truncation factor 0
>>>   HYPRE BoomerAMG: Interpolation: max elements per row 0
>>>   HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>>>   HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>>>   HYPRE BoomerAMG: Maximum row sums 0.9
>>>   HYPRE BoomerAMG: Sweeps down         1
>>>   HYPRE BoomerAMG: Sweeps up           1
>>>   HYPRE BoomerAMG: Sweeps on coarse    1
>>>   HYPRE BoomerAMG: Relax down          symmetric-SOR/Jacobi
>>>   HYPRE BoomerAMG: Relax up            symmetric-SOR/Jacobi
>>>   HYPRE BoomerAMG: Relax on coarse     Gaussian-elimination
>>>   HYPRE BoomerAMG: Relax weight  (all)      1
>>>   HYPRE BoomerAMG: Outer relax weight (all) 1
>>>   HYPRE BoomerAMG: Using CF-relaxation
>>>   HYPRE BoomerAMG: Measure type        local
>>>   HYPRE BoomerAMG: Coarsen type        Falgout
>>>   HYPRE BoomerAMG: Interpolation type  classical
>>> linear system matrix = precond matrix:
>>> Matrix Object:   2048 MPI processes
>>>   type: mpiaij
>>>   rows=4173281, cols=4173281
>>>   total: nonzeros=102576661, allocated nonzeros=102576661
>>>   total number of mallocs used during MatSetValues calls =0
>>>     not using I-node (on process 0) routines
>>> --- system solved with PETSc (in 1.005199e+02 seconds)
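The two residual histories are easier to compare as an average reduction factor per iteration, (r_k / r_0)^(1/k). A minimal sketch; `mean_reduction_factor` is a hypothetical helper, not part of PETSc, and the input is the list of preconditioned residual norms printed by the BoomerAMG run above. Since the log reports PRECONDITIONED norms, the factor describes each preconditioned operator rather than the true residual, and the two runs also differ in size (4173281 vs. 531441 rows), so the comparison is only indicative.

```python
def mean_reduction_factor(residuals):
    """Geometric-mean residual reduction per iteration:
    (r_k / r_0) ** (1 / k), with k = len(residuals) - 1."""
    k = len(residuals) - 1
    return (residuals[-1] / residuals[0]) ** (1.0 / k)


# Preconditioned residual norms copied from the BoomerAMG run above.
boomeramg = [1.599647112604e+00, 5.450838232404e-02, 3.549673478318e-03,
             2.901826808841e-04, 2.574235778729e-05, 2.253410171682e-06,
             1.871067784877e-07, 1.681162800670e-08, 2.120841512414e-09]

rho = mean_reduction_factor(boomeramg)
print(f"BoomerAMG: {rho:.3f} per iteration")
```

For this history the factor is about 0.078, a little better than one order of magnitude per GMRES iteration. Applied to the 18 norms of the ML run that follows, the same formula gives about 0.32.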
>>> 
>>> 0 KSP Residual norm 2.368804472986e-01
>>> 1 KSP Residual norm 5.676430019132e-02
>>> 2 KSP Residual norm 1.898005876002e-02
>>> 3 KSP Residual norm 6.193922902926e-03
>>> 4 KSP Residual norm 2.008448794493e-03
>>> 5 KSP Residual norm 6.390465670228e-04
>>> 6 KSP Residual norm 2.157709394389e-04
>>> 7 KSP Residual norm 7.295973819979e-05
>>> 8 KSP Residual norm 2.358343271482e-05
>>> 9 KSP Residual norm 7.489696222066e-06
>>> 10 KSP Residual norm 2.390946857593e-06
>>> 11 KSP Residual norm 8.068086385140e-07
>>> 12 KSP Residual norm 2.706607789749e-07
>>> 13 KSP Residual norm 8.636910863376e-08
>>> 14 KSP Residual norm 2.761981175852e-08
>>> 15 KSP Residual norm 8.755459874369e-09
>>> 16 KSP Residual norm 2.708848598341e-09
>>> 17 KSP Residual norm 8.968748876265e-10
>>> KSP Object: 2048 MPI processes
>>> type: gmres
>>>   GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>   GMRES: happy breakdown tolerance 1e-30
>>> maximum iterations=200, initial guess is zero
>>> tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
>>> left preconditioning
>>> using PRECONDITIONED norm type for convergence test
>>> PC Object: 2048 MPI processes
>>> type: ml
>>>   MG: type is MULTIPLICATIVE, levels=3 cycles=v
>>>     Cycles per PCApply=1
>>>     Using Galerkin computed coarse grid matrices
>>> Coarse grid solver -- level -------------------------------
>>>   KSP Object:    (mg_coarse_)     2048 MPI processes
>>>     type: preonly
>>>     maximum iterations=1, initial guess is zero
>>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>     left preconditioning
>>>     using NONE norm type for convergence test
>>>   PC Object:    (mg_coarse_)     2048 MPI processes
>>>     type: redundant
>>>       Redundant preconditioner: First (color=0) of 2048 PCs follows
>>>     KSP Object:      (mg_coarse_redundant_)       1 MPI processes
>>>       type: preonly
>>>       maximum iterations=10000, initial guess is zero
>>>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>       left preconditioning
>>>       using NONE norm type for convergence test
>>>     PC Object:      (mg_coarse_redundant_)       1 MPI processes
>>>       type: lu
>>>         LU: out-of-place factorization
>>>         tolerance for zero pivot 2.22045e-14
>>>         using diagonal shift on blocks to prevent zero pivot
>>>         matrix ordering: nd
>>>         factor fill ratio given 5, needed 4.38504
>>>           Factored matrix follows:
>>>             Matrix Object:               1 MPI processes
>>>               type: seqaij
>>>               rows=2055, cols=2055
>>>               package used to perform factorization: petsc
>>>               total: nonzeros=2476747, allocated nonzeros=2476747
>>>               total number of mallocs used during MatSetValues calls =0
>>>                 using I-node routines: found 1638 nodes, limit used is 5
>>>       linear system matrix = precond matrix:
>>>       Matrix Object:         1 MPI processes
>>>         type: seqaij
>>>         rows=2055, cols=2055
>>>         total: nonzeros=564817, allocated nonzeros=1093260
>>>         total number of mallocs used during MatSetValues calls =0
>>>           not using I-node routines
>>>     linear system matrix = precond matrix:
>>>     Matrix Object:       2048 MPI processes
>>>       type: mpiaij
>>>       rows=2055, cols=2055
>>>       total: nonzeros=564817, allocated nonzeros=564817
>>>       total number of mallocs used during MatSetValues calls =0
>>>         not using I-node (on process 0) routines
>>> Down solver (pre-smoother) on level 1 -------------------------------
>>>   KSP Object:    (mg_levels_1_)     2048 MPI processes
>>>     type: richardson
>>>       Richardson: damping factor=1
>>>     maximum iterations=2
>>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>     left preconditioning
>>>     using nonzero initial guess
>>>     using NONE norm type for convergence test
>>>   PC Object:    (mg_levels_1_)     2048 MPI processes
>>>     type: sor
>>>       SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
>>>     linear system matrix = precond matrix:
>>>     Matrix Object:       2048 MPI processes
>>>       type: mpiaij
>>>       rows=30194, cols=30194
>>>       total: nonzeros=3368414, allocated nonzeros=3368414
>>>       total number of mallocs used during MatSetValues calls =0
>>>         not using I-node (on process 0) routines
>>> Up solver (post-smoother) same as down solver (pre-smoother)
>>> Down solver (pre-smoother) on level 2 -------------------------------
>>>   KSP Object:    (mg_levels_2_)     2048 MPI processes
>>>     type: richardson
>>>       Richardson: damping factor=1
>>>     maximum iterations=2
>>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>     left preconditioning
>>>     using nonzero initial guess
>>>     using NONE norm type for convergence test
>>>   PC Object:    (mg_levels_2_)     2048 MPI processes
>>>     type: sor
>>>       SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
>>>     linear system matrix = precond matrix:
>>>     Matrix Object:       2048 MPI processes
>>>       type: mpiaij
>>>       rows=531441, cols=531441
>>>       total: nonzeros=12476324, allocated nonzeros=12476324
>>>       total number of mallocs used during MatSetValues calls =0
>>>         not using I-node (on process 0) routines
>>> Up solver (post-smoother) same as down solver (pre-smoother)
>>> linear system matrix = precond matrix:
>>> Matrix Object:   2048 MPI processes
>>>   type: mpiaij
>>>   rows=531441, cols=531441
>>>   total: nonzeros=12476324, allocated nonzeros=12476324
>>>   total number of mallocs used during MatSetValues calls =0
>>>     not using I-node (on process 0) routines
>>> --- system solved with PETSc (in 2.407844e+02 seconds)
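The logged matrix sizes give a quick check on the question "Is this just a simple 7-point Laplacian?". A minimal sketch, assuming the unknowns form an n x n x n grid (the row counts 4173281 = 161^3 and 531441 = 81^3 suggest this but do not prove it) and that every grid point is an unknown. The helper names are hypothetical. A 7-point stencil has at most 7 nonzeros per row, whereas both finest matrices above average roughly 24, so the operator does not look like a plain 7-point stencil; the logs alone do not say which discretization produced it.

```python
def seven_point_nnz(n):
    """Nonzeros of a 7-point Laplacian on an n x n x n grid in which every
    grid point is an unknown: n^3 diagonal entries plus two off-diagonal
    entries for each of the 3 * n^2 * (n - 1) neighbor pairs."""
    return n**3 + 6 * n**2 * (n - 1)


def grid_side(rows):
    """Return n if rows == n**3 for some integer n, else None."""
    n = round(rows ** (1.0 / 3.0))
    return n if n**3 == rows else None


# (rows, nonzeros) of the finest matrices in the two logs above.
for rows, nnz in [(4173281, 102576661), (531441, 12476324)]:
    n = grid_side(rows)
    print(f"rows={rows} (n={n}): {nnz / rows:.1f} nnz/row logged, "
          f"{seven_point_nnz(n) / rows:.2f} for a 7-point stencil")
```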
>>> 
>>> 
>> 
>> 
> <log-GAMG><log-ML><log-BoomerAMG>