[petsc-dev] [petsc-users] Problem with AMG packages

Tue Oct 8 22:54:02 CDT 2013

"Mark F. Adams" <mfadams at lbl.gov> writes:

> On Oct 8, 2013, at 8:18 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>> 
>> MatView                6 1.0 3.4042e+01269.1 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 18  0  0  0  0  25  0  0  0  0     0
>> 
>>   Something is seriously wrong with the default matview (or pcview) for PC GAMG?  It is printing way too much for the default view and thus totally hosing the timings.  The default PCView() is supposed to be very lightweight (not do excessive communication) and provide very high level information.
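
A note on reading the MatView line quoted above: the time and its max/min ratio are printed with no space between them, so "3.4042e+01269.1" is really two fields. The sketch below splits them. It assumes the usual -log_summary layout, in which the max time is written with four decimals and a two-digit exponent and the ratio with one decimal; the regular expression and the variable names are only illustrative and are not part of PETSc.

```python
import re

# The MatView line, copied verbatim from the -log_summary output quoted above.
line = ("MatView                6 1.0 3.4042e+01269.1 0.00e+00 0.0 "
        "0.0e+00 0.0e+00 4.0e+00 18  0  0  0  0  25  0  0  0  0     0")

# Assumed field shapes: d.dddde+dd for the max time, then digits.d for the ratio.
fused = re.search(r"(\d\.\d{4}e[+-]\d{2})(\d+\.\d)", line)
max_time = float(fused.group(1))   # slowest process, in seconds
ratio = float(fused.group(2))      # max time divided by min time
min_time = max_time / ratio        # implied time on the fastest process

print(f"max {max_time:.3f} s, ratio {ratio:.1f}, implied min {min_time:.4f} s")
```

Read this way, the slowest process spends about 34 s in MatView while the fastest spends roughly a tenth of a second, which fits the complaint that the viewer is doing far too much work on some processes.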
>> 
>
> Oh, I think the problem is that GAMG sets the coarse grid solver explicitly as a block Jacobi with LU local.  GAMG ensures that all equations are on one PE for the coarsest grid.  ML uses redundant.  You should be able to use redundant in GAMG, it is just not the default.  This is not tested.  So I'm guessing the problem is that block Jacobi is noisy.

Yes, this was fixed in the following commit, which has now been merged
to 'maint', but had not been when v3.4.2 was tagged.  So it'll be in
v3.4.3 and Pierre can get it by tracking 'maint'.  I wasn't following
the earlier discussion and this is probably beside the point.

commit 5b42dca8ccf395618a3b4b6d09926ce0fcf677cd
Author: Jed Brown <jedbrown at mcs.anl.gov>
Date:   Wed May 29 16:44:48 2013 -0500

    PCGAMG: set bjacobi->same_local_solves=TRUE to clean coarse grid viewing

    When bjacobi->same_local_solves=FALSE, PCView_BJacobi views each
    subdomain solver separately.  (That output is currently jumbled because
    singleton viewers does not produce synchronized output, but O(P) data in
    a -ksp_view is still not acceptable.)  GAMG configures all local solves
    identically (though all but one process will have zero entries) but in
    doing so, it trips bjacobi->same_local_solves=FALSE.  This commit is a
    temporary fix until PCView_BJacobi is fixed to recognize this situation
    in a more general setting.

 src/ksp/pc/impls/gamg/gamg.c | 5 +++++
 1 file changed, 5 insertions(+)

>> 
>> 
>> On Oct 8, 2013, at 6:50 PM, "Mark F. Adams" <mfadams at lbl.gov> wrote:
>> 
>>> Something is going terribly wrong with the setup in hypre and ML.  hypre's default parameters are not set up well for 3D.  I use:
>>> 
>>> -pc_hypre_boomeramg_no_CF
>>> -pc_hypre_boomeramg_agg_nl 1
>>> -pc_hypre_boomeramg_coarsen_type HMIS
>>> -pc_hypre_boomeramg_interp_type ext+i
>>> 
>>> I'm not sure what is going wrong with ML's setup.
>>> 
>>> GAMG is converging terribly.  Is this just a simple 7-point Laplacian?  It looks like the eigen estimate is low on the finest grid, which messes up the smoother.  Try running with these parameters and send the output:
>>> 
>>> -pc_gamg_agg_nsmooths 1 
>>> -pc_gamg_verbose 2
>>> -mg_levels_ksp_type richardson
>>> -mg_levels_pc_type sor
>>> 
>>> Mark
>>> 
>>> On Oct 8, 2013, at 5:46 PM, Pierre Jolivet <jolivet at ann.jussieu.fr> wrote:
>>> 
>>>> Please find the log for BoomerAMG, ML and GAMG attached. The set up for
>>>> GAMG doesn't look so bad compared to the other packages, so I'm wondering
>>>> what is going on with those ?
>>>> 
>>>>> 
>>>>> We need the output from running with -log_summary -pc_mg_log
>>>>> 
>>>>> Also you can run with PETSc's AMG called GAMG (run with -pc_type gamg)
>>>>> This will give the most useful information about where it is spending
>>>>> the time.
>>>>> 
>>>>> 
>>>>> Barry
>>>>> 
>>>>> 
>>>>> On Oct 8, 2013, at 4:11 PM, Pierre Jolivet <jolivet at ann.jussieu.fr> wrote:
>>>>> 
>>>>>> Dear all,
>>>>>> I'm trying to compare linear solvers for a simple Poisson equation
>>>>>> in 3D.
>>>>>> I thought that MG was the way to go, but looking at my log, the
>>>>>> performance looks abysmal (I know that the matrices are way too
>>>>>> small but if I go bigger, it just never performs a single
>>>>>> iteration ..). Even though this is neither the BoomerAMG nor the ML
>>>>>> mailing list, could you please tell me if PETSc sets some default
>>>>>> flags that make the setup for those solvers so slow for this simple
>>>>>> problem ? The performance of (G)ASM is in comparison much better.
>>>>>> 
>>>>>> Thanks in advance for your help.
>>>>>> 
>>>>>> PS: first the BoomerAMG log, then ML (much more verbose, sorry).
>>>>>> 
>>>>>> 0 KSP Residual norm 1.599647112604e+00
>>>>>> 1 KSP Residual norm 5.450838232404e-02
>>>>>> 2 KSP Residual norm 3.549673478318e-03
>>>>>> 3 KSP Residual norm 2.901826808841e-04
>>>>>> 4 KSP Residual norm 2.574235778729e-05
>>>>>> 5 KSP Residual norm 2.253410171682e-06
>>>>>> 6 KSP Residual norm 1.871067784877e-07
>>>>>> 7 KSP Residual norm 1.681162800670e-08
>>>>>> 8 KSP Residual norm 2.120841512414e-09
>>>>>> KSP Object: 2048 MPI processes
>>>>>> type: gmres
>>>>>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>>>> GMRES: happy breakdown tolerance 1e-30
>>>>>> maximum iterations=200, initial guess is zero
>>>>>> tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
>>>>>> left preconditioning
>>>>>> using PRECONDITIONED norm type for convergence test
>>>>>> PC Object: 2048 MPI processes
>>>>>> type: hypre
>>>>>> HYPRE BoomerAMG preconditioning
>>>>>> HYPRE BoomerAMG: Cycle type V
>>>>>> HYPRE BoomerAMG: Maximum number of levels 25
>>>>>> HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>>>>>> HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>>>>> HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>>>>> HYPRE BoomerAMG: Interpolation truncation factor 0
>>>>>> HYPRE BoomerAMG: Interpolation: max elements per row 0
>>>>>> HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>>>>>> HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>>>>>> HYPRE BoomerAMG: Maximum row sums 0.9
>>>>>> HYPRE BoomerAMG: Sweeps down         1
>>>>>> HYPRE BoomerAMG: Sweeps up           1
>>>>>> HYPRE BoomerAMG: Sweeps on coarse    1
>>>>>> HYPRE BoomerAMG: Relax down          symmetric-SOR/Jacobi
>>>>>> HYPRE BoomerAMG: Relax up            symmetric-SOR/Jacobi
>>>>>> HYPRE BoomerAMG: Relax on coarse     Gaussian-elimination
>>>>>> HYPRE BoomerAMG: Relax weight  (all)      1
>>>>>> HYPRE BoomerAMG: Outer relax weight (all) 1
>>>>>> HYPRE BoomerAMG: Using CF-relaxation
>>>>>> HYPRE BoomerAMG: Measure type        local
>>>>>> HYPRE BoomerAMG: Coarsen type        Falgout
>>>>>> HYPRE BoomerAMG: Interpolation type  classical
>>>>>> linear system matrix = precond matrix:
>>>>>> Matrix Object:   2048 MPI processes
>>>>>> type: mpiaij
>>>>>> rows=4173281, cols=4173281
>>>>>> total: nonzeros=102576661, allocated nonzeros=102576661
>>>>>> total number of mallocs used during MatSetValues calls =0
>>>>>>   not using I-node (on process 0) routines
>>>>>> --- system solved with PETSc (in 1.005199e+02 seconds)
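
On the question of whether this is a simple 7-point Laplacian: the matrix sizes in the view above allow a quick sanity check. The sketch below uses only the rows, nonzeros and process count printed in the log. The assumption that the unknowns sit on a cubic n x n x n grid is mine (the row count happens to be a perfect cube) and may not match how the problem was actually discretized.

```python
# Numbers copied from the BoomerAMG run's matrix view above.
rows = 4173281
nonzeros = 102576661
processes = 2048

nnz_per_row = nonzeros / rows
rows_per_process = rows / processes

# Hypothetical: treat the unknowns as an n x n x n grid.
n = round(rows ** (1.0 / 3.0))
is_perfect_cube = (n ** 3 == rows)

# A 7-point stencil on such a grid, with boundary rows truncated,
# would store 7*n^3 - 6*n^2 nonzeros.
seven_point_nnz = 7 * n ** 3 - 6 * n ** 2

print(f"nonzeros per row: {nnz_per_row:.2f}")
print(f"rows per process: {rows_per_process:.1f}")
print(f"n = {n}, perfect cube: {is_perfect_cube}")
print(f"7-point stencil would need {seven_point_nnz} nonzeros")
```

With about 24.6 nonzeros per row the operator is considerably denser than a 7-point stencil would be, and each process owns only about 2000 rows, which is consistent with the remark that the matrices are way too small for this process count.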
>>>>>> 
>>>>>> 0 KSP Residual norm 2.368804472986e-01
>>>>>> 1 KSP Residual norm 5.676430019132e-02
>>>>>> 2 KSP Residual norm 1.898005876002e-02
>>>>>> 3 KSP Residual norm 6.193922902926e-03
>>>>>> 4 KSP Residual norm 2.008448794493e-03
>>>>>> 5 KSP Residual norm 6.390465670228e-04
>>>>>> 6 KSP Residual norm 2.157709394389e-04
>>>>>> 7 KSP Residual norm 7.295973819979e-05
>>>>>> 8 KSP Residual norm 2.358343271482e-05
>>>>>> 9 KSP Residual norm 7.489696222066e-06
>>>>>> 10 KSP Residual norm 2.390946857593e-06
>>>>>> 11 KSP Residual norm 8.068086385140e-07
>>>>>> 12 KSP Residual norm 2.706607789749e-07
>>>>>> 13 KSP Residual norm 8.636910863376e-08
>>>>>> 14 KSP Residual norm 2.761981175852e-08
>>>>>> 15 KSP Residual norm 8.755459874369e-09
>>>>>> 16 KSP Residual norm 2.708848598341e-09
>>>>>> 17 KSP Residual norm 8.968748876265e-10
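
For comparing the two residual histories above, an average reduction factor per iteration is handier than the raw lists. The sketch below takes the geometric mean of the per-iteration reduction from the first and last norms printed in each run. Note that these are preconditioned residual norms, so numbers from different preconditioners are only roughly comparable; the function name is illustrative.

```python
def average_reduction(first_norm, last_norm, iterations):
    """Geometric-mean residual reduction per iteration."""
    return (last_norm / first_norm) ** (1.0 / iterations)

# First and last residual norms copied from the logs above.
rho_boomeramg = average_reduction(1.599647112604e+00, 2.120841512414e-09, 8)
rho_ml = average_reduction(2.368804472986e-01, 8.968748876265e-10, 17)

print(f"BoomerAMG: {rho_boomeramg:.3f} per iteration")
print(f"ML:        {rho_ml:.3f} per iteration")
```

This gives roughly 0.08 for BoomerAMG and 0.32 for ML. Neither looks like a convergence problem, which suggests the large solve times come from somewhere other than the iteration counts.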
>>>>>> KSP Object: 2048 MPI processes
>>>>>> type: gmres
>>>>>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>>>> GMRES: happy breakdown tolerance 1e-30
>>>>>> maximum iterations=200, initial guess is zero
>>>>>> tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
>>>>>> left preconditioning
>>>>>> using PRECONDITIONED norm type for convergence test
>>>>>> PC Object: 2048 MPI processes
>>>>>> type: ml
>>>>>> MG: type is MULTIPLICATIVE, levels=3 cycles=v
>>>>>>   Cycles per PCApply=1
>>>>>>   Using Galerkin computed coarse grid matrices
>>>>>> Coarse grid solver -- level -------------------------------
>>>>>> KSP Object:    (mg_coarse_)     2048 MPI processes
>>>>>>   type: preonly
>>>>>>   maximum iterations=1, initial guess is zero
>>>>>>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>>>>   left preconditioning
>>>>>>   using NONE norm type for convergence test
>>>>>> PC Object:    (mg_coarse_)     2048 MPI processes
>>>>>>   type: redundant
>>>>>>     Redundant preconditioner: First (color=0) of 2048 PCs follows
>>>>>>   KSP Object:      (mg_coarse_redundant_)       1 MPI processes
>>>>>>     type: preonly
>>>>>>     maximum iterations=10000, initial guess is zero
>>>>>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>>>>     left preconditioning
>>>>>>     using NONE norm type for convergence test
>>>>>>   PC Object:      (mg_coarse_redundant_)       1 MPI processes
>>>>>>     type: lu
>>>>>>       LU: out-of-place factorization
>>>>>>       tolerance for zero pivot 2.22045e-14
>>>>>>       using diagonal shift on blocks to prevent zero pivot
>>>>>>       matrix ordering: nd
>>>>>>       factor fill ratio given 5, needed 4.38504
>>>>>>         Factored matrix follows:
>>>>>>           Matrix Object:               1 MPI processes
>>>>>>             type: seqaij
>>>>>>             rows=2055, cols=2055
>>>>>>             package used to perform factorization: petsc
>>>>>>             total: nonzeros=2476747, allocated nonzeros=2476747
>>>>>>             total number of mallocs used during MatSetValues calls =0
>>>>>>               using I-node routines: found 1638 nodes, limit used is 5
>>>>>>     linear system matrix = precond matrix:
>>>>>>     Matrix Object:         1 MPI processes
>>>>>>       type: seqaij
>>>>>>       rows=2055, cols=2055
>>>>>>       total: nonzeros=564817, allocated nonzeros=1093260
>>>>>>       total number of mallocs used during MatSetValues calls =0
>>>>>>         not using I-node routines
>>>>>>   linear system matrix = precond matrix:
>>>>>>   Matrix Object:       2048 MPI processes
>>>>>>     type: mpiaij
>>>>>>     rows=2055, cols=2055
>>>>>>     total: nonzeros=564817, allocated nonzeros=564817
>>>>>>     total number of mallocs used during MatSetValues calls =0
>>>>>>       not using I-node (on process 0) routines
>>>>>> Down solver (pre-smoother) on level 1 -------------------------------
>>>>>> KSP Object:    (mg_levels_1_)     2048 MPI processes
>>>>>>   type: richardson
>>>>>>     Richardson: damping factor=1
>>>>>>   maximum iterations=2
>>>>>>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>>>>   left preconditioning
>>>>>>   using nonzero initial guess
>>>>>>   using NONE norm type for convergence test
>>>>>> PC Object:    (mg_levels_1_)     2048 MPI processes
>>>>>>   type: sor
>>>>>>     SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
>>>>>>   linear system matrix = precond matrix:
>>>>>>   Matrix Object:       2048 MPI processes
>>>>>>     type: mpiaij
>>>>>>     rows=30194, cols=30194
>>>>>>     total: nonzeros=3368414, allocated nonzeros=3368414
>>>>>>     total number of mallocs used during MatSetValues calls =0
>>>>>>       not using I-node (on process 0) routines
>>>>>> Up solver (post-smoother) same as down solver (pre-smoother)
>>>>>> Down solver (pre-smoother) on level 2 -------------------------------
>>>>>> KSP Object:    (mg_levels_2_)     2048 MPI processes
>>>>>>   type: richardson
>>>>>>     Richardson: damping factor=1
>>>>>>   maximum iterations=2
>>>>>>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>>>>   left preconditioning
>>>>>>   using nonzero initial guess
>>>>>>   using NONE norm type for convergence test
>>>>>> PC Object:    (mg_levels_2_)     2048 MPI processes
>>>>>>   type: sor
>>>>>>     SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
>>>>>>   linear system matrix = precond matrix:
>>>>>>   Matrix Object:       2048 MPI processes
>>>>>>     type: mpiaij
>>>>>>     rows=531441, cols=531441
>>>>>>     total: nonzeros=12476324, allocated nonzeros=12476324
>>>>>>     total number of mallocs used during MatSetValues calls =0
>>>>>>       not using I-node (on process 0) routines
>>>>>> Up solver (post-smoother) same as down solver (pre-smoother)
>>>>>> linear system matrix = precond matrix:
>>>>>> Matrix Object:   2048 MPI processes
>>>>>> type: mpiaij
>>>>>> rows=531441, cols=531441
>>>>>> total: nonzeros=12476324, allocated nonzeros=12476324
>>>>>> total number of mallocs used during MatSetValues calls =0
>>>>>>   not using I-node (on process 0) routines
>>>>>> --- system solved with PETSc (in 2.407844e+02 seconds)
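
The ML view above contains enough to summarize the hierarchy. The sketch below computes the grid and operator complexities (sum over levels divided by the finest level), the average row density per level, and the coarse LU fill, using only numbers printed in the log. The variable names are mine, and the definitions are the usual multigrid ones rather than anything ML or PETSc reports here.

```python
# (rows, nonzeros) per level, finest first, copied from the ML view above.
levels = [
    (531441, 12476324),  # level 2 (finest)
    (30194, 3368414),    # level 1
    (2055, 564817),      # coarse grid
]

fine_rows, fine_nnz = levels[0]
grid_complexity = sum(r for r, _ in levels) / fine_rows
operator_complexity = sum(nz for _, nz in levels) / fine_nnz
nnz_per_row = [nz / r for r, nz in levels]

# Coarse LU: nonzeros of the factored matrix over nonzeros of the coarse matrix.
lu_fill = 2476747 / levels[-1][1]

print(f"grid complexity:     {grid_complexity:.4f}")
print(f"operator complexity: {operator_complexity:.4f}")
print("nonzeros per row:    " + ", ".join(f"{d:.1f}" for d in nnz_per_row))
print(f"coarse LU fill:      {lu_fill:.5f}")
```

The complexities are modest (about 1.06 and 1.32) and the computed fill agrees with the "needed 4.38504" line in the view. The rows get much denser on the coarser levels, though, going from about 23 nonzeros per row on the finest level to about 275 on the coarse grid.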
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> <log-GAMG><log-ML><log-BoomerAMG>
>>> 
>> 