[petsc-dev] [petsc-users] Problem with AMG packages

Tue Oct 8 21:56:31 CDT 2013

On Oct 8, 2013, at 8:18 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:

> 
> MatView                6 1.0 3.4042e+01269.1 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 18  0  0  0  0  25  0  0  0  0     0
> 
>   Something is seriously wrong with the default matview (or pcview) for PC GAMG?  

GAMG does not set a pcview.  I never touch view functions as far as I can tell.

> It is printing way to much for the default view and thus totally hosing the timings.  

It might be catching load imbalance from someplace upstream (look at the ratio).

> The default PCView() is suppose to be very light weight (not do excessive communication) and provide very high level information.
> 
> 
>   Barry
> 
> 
> On Oct 8, 2013, at 6:50 PM, "Mark F. Adams" <mfadams at lbl.gov> wrote:
> 
>> Something is going terrible wrong with the setup in hypre and ML.  hypre's default parameters are not setup well for 3D.  I use:
>> 
>> -pc_hypre_boomeramg_no_CF
>> -pc_hypre_boomeramg_agg_nl 1
>> -pc_hypre_boomeramg_coarsen_type HMIS
>> -pc_hypre_boomeramg_interp_type ext+i
>> 
>> I'm not sure what is going wrong with ML's setup.
>> 
>> GAMG is converging terribly.  Is this just a simple 7-point Laplacian?  It looks like you the eigen estimate is low on the finest grid, which messes up the smoother.  Try running with these parameters and send the output:
>> 
>> -pc_gamg_agg_nsmooths 1 
>> -pc_gamg_verbose 2
>> -mg_levels_ksp_type richardson
>> -mg_levels_pc_type sor
>> 
>> Mark
>> 
>> On Oct 8, 2013, at 5:46 PM, Pierre Jolivet <jolivet at ann.jussieu.fr> wrote:
>> 
>>> Please find the log for BoomerAMG, ML and GAMG attached. The set up for
>>> GAMG doesn't look so bad compared to the other packages, so I'm wondering
>>> what is going on with those ?
>>> 
>>>> 
>>>> We need the output from running with -log_summary -pc_mg_log
>>>> 
>>>> Also you can run with PETSc's AMG called GAMG (run with -pc_type gamg)
>>>> This will give the most useful information about where it is spending
>>>> the time.
>>>> 
>>>> 
>>>> Barry
>>>> 
>>>> 
>>>> On Oct 8, 2013, at 4:11 PM, Pierre Jolivet <jolivet at ann.jussieu.fr> wrote:
>>>> 
>>>>> Dear all,
>>>>> I'm trying to compare linear solvers for a simple Poisson equation in
>>>>> 3D.
>>>>> I thought that MG was the way to go, but looking at my log, the
>>>>> performance looks abysmal (I know that the matrices are way too small
>>>>> but
>>>>> if I go bigger, it just never performs a single iteration ..). Even
>>>>> though
>>>>> this is neither the BoomerAMG nor the ML mailing list, could you please
>>>>> tell me if PETSc sets some default flags that make the setup for those
>>>>> solvers so slow for this simple problem ? The performance of (G)ASM is
>>>>> in
>>>>> comparison much better.
>>>>> 
>>>>> Thanks in advance for your help.
>>>>> 
>>>>> PS: first the BoomerAMG log, then ML (much more verbose, sorry).
>>>>> 
>>>>> 0 KSP Residual norm 1.599647112604e+00
>>>>> 1 KSP Residual norm 5.450838232404e-02
>>>>> 2 KSP Residual norm 3.549673478318e-03
>>>>> 3 KSP Residual norm 2.901826808841e-04
>>>>> 4 KSP Residual norm 2.574235778729e-05
>>>>> 5 KSP Residual norm 2.253410171682e-06
>>>>> 6 KSP Residual norm 1.871067784877e-07
>>>>> 7 KSP Residual norm 1.681162800670e-08
>>>>> 8 KSP Residual norm 2.120841512414e-09
>>>>> KSP Object: 2048 MPI processes
>>>>> type: gmres
>>>>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
>>>>> Orthogonalization with no iterative refinement
>>>>> GMRES: happy breakdown tolerance 1e-30
>>>>> maximum iterations=200, initial guess is zero
>>>>> tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
>>>>> left preconditioning
>>>>> using PRECONDITIONED norm type for convergence test
>>>>> PC Object: 2048 MPI processes
>>>>> type: hypre
>>>>> HYPRE BoomerAMG preconditioning
>>>>> HYPRE BoomerAMG: Cycle type V
>>>>> HYPRE BoomerAMG: Maximum number of levels 25
>>>>> HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>>>>> HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>>>> HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>>>> HYPRE BoomerAMG: Interpolation truncation factor 0
>>>>> HYPRE BoomerAMG: Interpolation: max elements per row 0
>>>>> HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>>>>> HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>>>>> HYPRE BoomerAMG: Maximum row sums 0.9
>>>>> HYPRE BoomerAMG: Sweeps down         1
>>>>> HYPRE BoomerAMG: Sweeps up           1
>>>>> HYPRE BoomerAMG: Sweeps on coarse    1
>>>>> HYPRE BoomerAMG: Relax down          symmetric-SOR/Jacobi
>>>>> HYPRE BoomerAMG: Relax up            symmetric-SOR/Jacobi
>>>>> HYPRE BoomerAMG: Relax on coarse     Gaussian-elimination
>>>>> HYPRE BoomerAMG: Relax weight  (all)      1
>>>>> HYPRE BoomerAMG: Outer relax weight (all) 1
>>>>> HYPRE BoomerAMG: Using CF-relaxation
>>>>> HYPRE BoomerAMG: Measure type        local
>>>>> HYPRE BoomerAMG: Coarsen type        Falgout
>>>>> HYPRE BoomerAMG: Interpolation type  classical
>>>>> linear system matrix = precond matrix:
>>>>> Matrix Object:   2048 MPI processes
>>>>> type: mpiaij
>>>>> rows=4173281, cols=4173281
>>>>> total: nonzeros=102576661, allocated nonzeros=102576661
>>>>> total number of mallocs used during MatSetValues calls =0
>>>>>   not using I-node (on process 0) routines
>>>>> --- system solved with PETSc (in 1.005199e+02 seconds)
>>>>> 
>>>>> 0 KSP Residual norm 2.368804472986e-01
>>>>> 1 KSP Residual norm 5.676430019132e-02
>>>>> 2 KSP Residual norm 1.898005876002e-02
>>>>> 3 KSP Residual norm 6.193922902926e-03
>>>>> 4 KSP Residual norm 2.008448794493e-03
>>>>> 5 KSP Residual norm 6.390465670228e-04
>>>>> 6 KSP Residual norm 2.157709394389e-04
>>>>> 7 KSP Residual norm 7.295973819979e-05
>>>>> 8 KSP Residual norm 2.358343271482e-05
>>>>> 9 KSP Residual norm 7.489696222066e-06
>>>>> 10 KSP Residual norm 2.390946857593e-06
>>>>> 11 KSP Residual norm 8.068086385140e-07
>>>>> 12 KSP Residual norm 2.706607789749e-07
>>>>> 13 KSP Residual norm 8.636910863376e-08
>>>>> 14 KSP Residual norm 2.761981175852e-08
>>>>> 15 KSP Residual norm 8.755459874369e-09
>>>>> 16 KSP Residual norm 2.708848598341e-09
>>>>> 17 KSP Residual norm 8.968748876265e-10
>>>>> KSP Object: 2048 MPI processes
>>>>> type: gmres
>>>>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
>>>>> Orthogonalization with no iterative refinement
>>>>> GMRES: happy breakdown tolerance 1e-30
>>>>> maximum iterations=200, initial guess is zero
>>>>> tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
>>>>> left preconditioning
>>>>> using PRECONDITIONED norm type for convergence test
>>>>> PC Object: 2048 MPI processes
>>>>> type: ml
>>>>> MG: type is MULTIPLICATIVE, levels=3 cycles=v
>>>>>   Cycles per PCApply=1
>>>>>   Using Galerkin computed coarse grid matrices
>>>>> Coarse grid solver -- level -------------------------------
>>>>> KSP Object:    (mg_coarse_)     2048 MPI processes
>>>>>   type: preonly
>>>>>   maximum iterations=1, initial guess is zero
>>>>>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>>>   left preconditioning
>>>>>   using NONE norm type for convergence test
>>>>> PC Object:    (mg_coarse_)     2048 MPI processes
>>>>>   type: redundant
>>>>>     Redundant preconditioner: First (color=0) of 2048 PCs follows
>>>>>   KSP Object:      (mg_coarse_redundant_)       1 MPI processes
>>>>>     type: preonly
>>>>>     maximum iterations=10000, initial guess is zero
>>>>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>>>     left preconditioning
>>>>>     using NONE norm type for convergence test
>>>>>   PC Object:      (mg_coarse_redundant_)       1 MPI processes
>>>>>     type: lu
>>>>>       LU: out-of-place factorization
>>>>>       tolerance for zero pivot 2.22045e-14
>>>>>       using diagonal shift on blocks to prevent zero pivot
>>>>>       matrix ordering: nd
>>>>>       factor fill ratio given 5, needed 4.38504
>>>>>         Factored matrix follows:
>>>>>           Matrix Object:               1 MPI processes
>>>>>             type: seqaij
>>>>>             rows=2055, cols=2055
>>>>>             package used to perform factorization: petsc
>>>>>             total: nonzeros=2476747, allocated nonzeros=2476747
>>>>>             total number of mallocs used during MatSetValues calls =0
>>>>>               using I-node routines: found 1638 nodes, limit used is
>>>>> 5
>>>>>     linear system matrix = precond matrix:
>>>>>     Matrix Object:         1 MPI processes
>>>>>       type: seqaij
>>>>>       rows=2055, cols=2055
>>>>>       total: nonzeros=564817, allocated nonzeros=1093260
>>>>>       total number of mallocs used during MatSetValues calls =0
>>>>>         not using I-node routines
>>>>>   linear system matrix = precond matrix:
>>>>>   Matrix Object:       2048 MPI processes
>>>>>     type: mpiaij
>>>>>     rows=2055, cols=2055
>>>>>     total: nonzeros=564817, allocated nonzeros=564817
>>>>>     total number of mallocs used during MatSetValues calls =0
>>>>>       not using I-node (on process 0) routines
>>>>> Down solver (pre-smoother) on level 1 -------------------------------
>>>>> KSP Object:    (mg_levels_1_)     2048 MPI processes
>>>>>   type: richardson
>>>>>     Richardson: damping factor=1
>>>>>   maximum iterations=2
>>>>>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>>>   left preconditioning
>>>>>   using nonzero initial guess
>>>>>   using NONE norm type for convergence test
>>>>> PC Object:    (mg_levels_1_)     2048 MPI processes
>>>>>   type: sor
>>>>>     SOR: type = local_symmetric, iterations = 1, local iterations =
>>>>> 1,
>>>>> omega = 1
>>>>>   linear system matrix = precond matrix:
>>>>>   Matrix Object:       2048 MPI processes
>>>>>     type: mpiaij
>>>>>     rows=30194, cols=30194
>>>>>     total: nonzeros=3368414, allocated nonzeros=3368414
>>>>>     total number of mallocs used during MatSetValues calls =0
>>>>>       not using I-node (on process 0) routines
>>>>> Up solver (post-smoother) same as down solver (pre-smoother)
>>>>> Down solver (pre-smoother) on level 2 -------------------------------
>>>>> KSP Object:    (mg_levels_2_)     2048 MPI processes
>>>>>   type: richardson
>>>>>     Richardson: damping factor=1
>>>>>   maximum iterations=2
>>>>>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>>>   left preconditioning
>>>>>   using nonzero initial guess
>>>>>   using NONE norm type for convergence test
>>>>> PC Object:    (mg_levels_2_)     2048 MPI processes
>>>>>   type: sor
>>>>>     SOR: type = local_symmetric, iterations = 1, local iterations =
>>>>> 1,
>>>>> omega = 1
>>>>>   linear system matrix = precond matrix:
>>>>>   Matrix Object:       2048 MPI processes
>>>>>     type: mpiaij
>>>>>     rows=531441, cols=531441
>>>>>     total: nonzeros=12476324, allocated nonzeros=12476324
>>>>>     total number of mallocs used during MatSetValues calls =0
>>>>>       not using I-node (on process 0) routines
>>>>> Up solver (post-smoother) same as down solver (pre-smoother)
>>>>> linear system matrix = precond matrix:
>>>>> Matrix Object:   2048 MPI processes
>>>>> type: mpiaij
>>>>> rows=531441, cols=531441
>>>>> total: nonzeros=12476324, allocated nonzeros=12476324
>>>>> total number of mallocs used during MatSetValues calls =0
>>>>>   not using I-node (on process 0) routines
>>>>> --- system solved with PETSc (in 2.407844e+02 seconds)
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> <log-GAMG><log-ML><log-BoomerAMG>
>> 
>