[petsc-dev] [petsc-users] Problem with AMG packages

Wed Oct 9 10:27:05 CDT 2013

It looks like you have 531441 equations and 2K processors.  THis is pretty small.

You have ~23 non-zeros per row.  What kind of discretization is this?

On Oct 9, 2013, at 9:34 AM, Pierre Jolivet <jolivet at ann.jussieu.fr> wrote:

> Mark and Barry,
> You will find attached the log for BoomerAMG (better, but still slow
> imho), ML (still lost), GAMG (better, I took Jed's advice and recompiled
> petsc-maint but forgot to relink my app so please discard the time spent
> in MatView again) and GASM (best, really ? for a Poisson equation ?).
> 
> I'll try bigger matrices (that is likely the real problem now, at least
> for GAMG), but if you still see something fishy that I might need to
> adjust in the parameters, please tell me.
> 
> Also, the first results I got for elasticity (before going back to plain
> scalar diffusion) were at least as bad. Do you have any tips for such
> problems beside setting the correct BlockSize and MatNearNullSpace and
> using parameters similar to the ones you just gave me or the ones that can
> be found here
> http://lists.mcs.anl.gov/pipermail/petsc-users/2012-April/012790.html ?
> 
> Thanks for your help,
> Pierre
> 
>> 
>> On Oct 8, 2013, at 8:18 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>> 
>>> 
>>> MatView                6 1.0 3.4042e+01269.1 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 4.0e+00 18  0  0  0  0  25  0  0  0  0     0
>>> 
>>>  Something is seriously wrong with the default matview (or pcview) for
>>> PC GAMG?  It is printing way to much for the default view and thus
>>> totally hosing the timings.  The default PCView() is suppose to be
>>> very light weight (not do excessive communication) and provide very
>>> high level information.
>>> 
>> 
>> Oh, I think the problem is that GAMG sets the coarse grid solver
>> explicitly as a block jacobi with LU local.  GAMG insures that all
>> equation are on one PE for the coarsest grid.  ML uses redundant.  You
>> should be able to use redundant in GAMG, it is just not the default.  This
>> is not tested.  So I'm guessing the problem is that block Jacobi is noisy.
>> 
>>> 
>>>  Barry
>>> 
>>> 
>>> On Oct 8, 2013, at 6:50 PM, "Mark F. Adams" <mfadams at lbl.gov> wrote:
>>> 
>>>> Something is going terrible wrong with the setup in hypre and ML.
>>>> hypre's default parameters are not setup well for 3D.  I use:
>>>> 
>>>> -pc_hypre_boomeramg_no_CF
>>>> -pc_hypre_boomeramg_agg_nl 1
>>>> -pc_hypre_boomeramg_coarsen_type HMIS
>>>> -pc_hypre_boomeramg_interp_type ext+i
>>>> 
>>>> I'm not sure what is going wrong with ML's setup.
>>>> 
>>>> GAMG is converging terribly.  Is this just a simple 7-point Laplacian?
>>>> It looks like you the eigen estimate is low on the finest grid, which
>>>> messes up the smoother.  Try running with these parameters and send the
>>>> output:
>>>> 
>>>> -pc_gamg_agg_nsmooths 1
>>>> -pc_gamg_verbose 2
>>>> -mg_levels_ksp_type richardson
>>>> -mg_levels_pc_type sor
>>>> 
>>>> Mark
>>>> 
>>>> On Oct 8, 2013, at 5:46 PM, Pierre Jolivet <jolivet at ann.jussieu.fr>
>>>> wrote:
>>>> 
>>>>> Please find the log for BoomerAMG, ML and GAMG attached. The set up
>>>>> for
>>>>> GAMG doesn't look so bad compared to the other packages, so I'm
>>>>> wondering
>>>>> what is going on with those ?
>>>>> 
>>>>>> 
>>>>>> We need the output from running with -log_summary -pc_mg_log
>>>>>> 
>>>>>> Also you can run with PETSc's AMG called GAMG (run with -pc_type
>>>>>> gamg)
>>>>>> This will give the most useful information about where it is spending
>>>>>> the time.
>>>>>> 
>>>>>> 
>>>>>> Barry
>>>>>> 
>>>>>> 
>>>>>> On Oct 8, 2013, at 4:11 PM, Pierre Jolivet <jolivet at ann.jussieu.fr>
>>>>>> wrote:
>>>>>> 
>>>>>>> Dear all,
>>>>>>> I'm trying to compare linear solvers for a simple Poisson equation
>>>>>>> in
>>>>>>> 3D.
>>>>>>> I thought that MG was the way to go, but looking at my log, the
>>>>>>> performance looks abysmal (I know that the matrices are way too
>>>>>>> small
>>>>>>> but
>>>>>>> if I go bigger, it just never performs a single iteration ..). Even
>>>>>>> though
>>>>>>> this is neither the BoomerAMG nor the ML mailing list, could you
>>>>>>> please
>>>>>>> tell me if PETSc sets some default flags that make the setup for
>>>>>>> those
>>>>>>> solvers so slow for this simple problem ? The performance of (G)ASM
>>>>>>> is
>>>>>>> in
>>>>>>> comparison much better.
>>>>>>> 
>>>>>>> Thanks in advance for your help.
>>>>>>> 
>>>>>>> PS: first the BoomerAMG log, then ML (much more verbose, sorry).
>>>>>>> 
>>>>>>> 0 KSP Residual norm 1.599647112604e+00
>>>>>>> 1 KSP Residual norm 5.450838232404e-02
>>>>>>> 2 KSP Residual norm 3.549673478318e-03
>>>>>>> 3 KSP Residual norm 2.901826808841e-04
>>>>>>> 4 KSP Residual norm 2.574235778729e-05
>>>>>>> 5 KSP Residual norm 2.253410171682e-06
>>>>>>> 6 KSP Residual norm 1.871067784877e-07
>>>>>>> 7 KSP Residual norm 1.681162800670e-08
>>>>>>> 8 KSP Residual norm 2.120841512414e-09
>>>>>>> KSP Object: 2048 MPI processes
>>>>>>> type: gmres
>>>>>>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
>>>>>>> Orthogonalization with no iterative refinement
>>>>>>> GMRES: happy breakdown tolerance 1e-30
>>>>>>> maximum iterations=200, initial guess is zero
>>>>>>> tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
>>>>>>> left preconditioning
>>>>>>> using PRECONDITIONED norm type for convergence test
>>>>>>> PC Object: 2048 MPI processes
>>>>>>> type: hypre
>>>>>>> HYPRE BoomerAMG preconditioning
>>>>>>> HYPRE BoomerAMG: Cycle type V
>>>>>>> HYPRE BoomerAMG: Maximum number of levels 25
>>>>>>> HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>>>>>>> HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>>>>>> HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>>>>>> HYPRE BoomerAMG: Interpolation truncation factor 0
>>>>>>> HYPRE BoomerAMG: Interpolation: max elements per row 0
>>>>>>> HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>>>>>>> HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>>>>>>> HYPRE BoomerAMG: Maximum row sums 0.9
>>>>>>> HYPRE BoomerAMG: Sweeps down         1
>>>>>>> HYPRE BoomerAMG: Sweeps up           1
>>>>>>> HYPRE BoomerAMG: Sweeps on coarse    1
>>>>>>> HYPRE BoomerAMG: Relax down          symmetric-SOR/Jacobi
>>>>>>> HYPRE BoomerAMG: Relax up            symmetric-SOR/Jacobi
>>>>>>> HYPRE BoomerAMG: Relax on coarse     Gaussian-elimination
>>>>>>> HYPRE BoomerAMG: Relax weight  (all)      1
>>>>>>> HYPRE BoomerAMG: Outer relax weight (all) 1
>>>>>>> HYPRE BoomerAMG: Using CF-relaxation
>>>>>>> HYPRE BoomerAMG: Measure type        local
>>>>>>> HYPRE BoomerAMG: Coarsen type        Falgout
>>>>>>> HYPRE BoomerAMG: Interpolation type  classical
>>>>>>> linear system matrix = precond matrix:
>>>>>>> Matrix Object:   2048 MPI processes
>>>>>>> type: mpiaij
>>>>>>> rows=4173281, cols=4173281
>>>>>>> total: nonzeros=102576661, allocated nonzeros=102576661
>>>>>>> total number of mallocs used during MatSetValues calls =0
>>>>>>>  not using I-node (on process 0) routines
>>>>>>> --- system solved with PETSc (in 1.005199e+02 seconds)
>>>>>>> 
>>>>>>> 0 KSP Residual norm 2.368804472986e-01
>>>>>>> 1 KSP Residual norm 5.676430019132e-02
>>>>>>> 2 KSP Residual norm 1.898005876002e-02
>>>>>>> 3 KSP Residual norm 6.193922902926e-03
>>>>>>> 4 KSP Residual norm 2.008448794493e-03
>>>>>>> 5 KSP Residual norm 6.390465670228e-04
>>>>>>> 6 KSP Residual norm 2.157709394389e-04
>>>>>>> 7 KSP Residual norm 7.295973819979e-05
>>>>>>> 8 KSP Residual norm 2.358343271482e-05
>>>>>>> 9 KSP Residual norm 7.489696222066e-06
>>>>>>> 10 KSP Residual norm 2.390946857593e-06
>>>>>>> 11 KSP Residual norm 8.068086385140e-07
>>>>>>> 12 KSP Residual norm 2.706607789749e-07
>>>>>>> 13 KSP Residual norm 8.636910863376e-08
>>>>>>> 14 KSP Residual norm 2.761981175852e-08
>>>>>>> 15 KSP Residual norm 8.755459874369e-09
>>>>>>> 16 KSP Residual norm 2.708848598341e-09
>>>>>>> 17 KSP Residual norm 8.968748876265e-10
>>>>>>> KSP Object: 2048 MPI processes
>>>>>>> type: gmres
>>>>>>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
>>>>>>> Orthogonalization with no iterative refinement
>>>>>>> GMRES: happy breakdown tolerance 1e-30
>>>>>>> maximum iterations=200, initial guess is zero
>>>>>>> tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
>>>>>>> left preconditioning
>>>>>>> using PRECONDITIONED norm type for convergence test
>>>>>>> PC Object: 2048 MPI processes
>>>>>>> type: ml
>>>>>>> MG: type is MULTIPLICATIVE, levels=3 cycles=v
>>>>>>>  Cycles per PCApply=1
>>>>>>>  Using Galerkin computed coarse grid matrices
>>>>>>> Coarse grid solver -- level -------------------------------
>>>>>>> KSP Object:    (mg_coarse_)     2048 MPI processes
>>>>>>>  type: preonly
>>>>>>>  maximum iterations=1, initial guess is zero
>>>>>>>  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>>>>>  left preconditioning
>>>>>>>  using NONE norm type for convergence test
>>>>>>> PC Object:    (mg_coarse_)     2048 MPI processes
>>>>>>>  type: redundant
>>>>>>>    Redundant preconditioner: First (color=0) of 2048 PCs follows
>>>>>>>  KSP Object:      (mg_coarse_redundant_)       1 MPI processes
>>>>>>>    type: preonly
>>>>>>>    maximum iterations=10000, initial guess is zero
>>>>>>>    tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>>>>>    left preconditioning
>>>>>>>    using NONE norm type for convergence test
>>>>>>>  PC Object:      (mg_coarse_redundant_)       1 MPI processes
>>>>>>>    type: lu
>>>>>>>      LU: out-of-place factorization
>>>>>>>      tolerance for zero pivot 2.22045e-14
>>>>>>>      using diagonal shift on blocks to prevent zero pivot
>>>>>>>      matrix ordering: nd
>>>>>>>      factor fill ratio given 5, needed 4.38504
>>>>>>>        Factored matrix follows:
>>>>>>>          Matrix Object:               1 MPI processes
>>>>>>>            type: seqaij
>>>>>>>            rows=2055, cols=2055
>>>>>>>            package used to perform factorization: petsc
>>>>>>>            total: nonzeros=2476747, allocated nonzeros=2476747
>>>>>>>            total number of mallocs used during MatSetValues calls
>>>>>>> =0
>>>>>>>              using I-node routines: found 1638 nodes, limit used is
>>>>>>> 5
>>>>>>>    linear system matrix = precond matrix:
>>>>>>>    Matrix Object:         1 MPI processes
>>>>>>>      type: seqaij
>>>>>>>      rows=2055, cols=2055
>>>>>>>      total: nonzeros=564817, allocated nonzeros=1093260
>>>>>>>      total number of mallocs used during MatSetValues calls =0
>>>>>>>        not using I-node routines
>>>>>>>  linear system matrix = precond matrix:
>>>>>>>  Matrix Object:       2048 MPI processes
>>>>>>>    type: mpiaij
>>>>>>>    rows=2055, cols=2055
>>>>>>>    total: nonzeros=564817, allocated nonzeros=564817
>>>>>>>    total number of mallocs used during MatSetValues calls =0
>>>>>>>      not using I-node (on process 0) routines
>>>>>>> Down solver (pre-smoother) on level 1
>>>>>>> -------------------------------
>>>>>>> KSP Object:    (mg_levels_1_)     2048 MPI processes
>>>>>>>  type: richardson
>>>>>>>    Richardson: damping factor=1
>>>>>>>  maximum iterations=2
>>>>>>>  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>>>>>  left preconditioning
>>>>>>>  using nonzero initial guess
>>>>>>>  using NONE norm type for convergence test
>>>>>>> PC Object:    (mg_levels_1_)     2048 MPI processes
>>>>>>>  type: sor
>>>>>>>    SOR: type = local_symmetric, iterations = 1, local iterations =
>>>>>>> 1,
>>>>>>> omega = 1
>>>>>>>  linear system matrix = precond matrix:
>>>>>>>  Matrix Object:       2048 MPI processes
>>>>>>>    type: mpiaij
>>>>>>>    rows=30194, cols=30194
>>>>>>>    total: nonzeros=3368414, allocated nonzeros=3368414
>>>>>>>    total number of mallocs used during MatSetValues calls =0
>>>>>>>      not using I-node (on process 0) routines
>>>>>>> Up solver (post-smoother) same as down solver (pre-smoother)
>>>>>>> Down solver (pre-smoother) on level 2
>>>>>>> -------------------------------
>>>>>>> KSP Object:    (mg_levels_2_)     2048 MPI processes
>>>>>>>  type: richardson
>>>>>>>    Richardson: damping factor=1
>>>>>>>  maximum iterations=2
>>>>>>>  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>>>>>  left preconditioning
>>>>>>>  using nonzero initial guess
>>>>>>>  using NONE norm type for convergence test
>>>>>>> PC Object:    (mg_levels_2_)     2048 MPI processes
>>>>>>>  type: sor
>>>>>>>    SOR: type = local_symmetric, iterations = 1, local iterations =
>>>>>>> 1,
>>>>>>> omega = 1
>>>>>>>  linear system matrix = precond matrix:
>>>>>>>  Matrix Object:       2048 MPI processes
>>>>>>>    type: mpiaij
>>>>>>>    rows=531441, cols=531441
>>>>>>>    total: nonzeros=12476324, allocated nonzeros=12476324
>>>>>>>    total number of mallocs used during MatSetValues calls =0
>>>>>>>      not using I-node (on process 0) routines
>>>>>>> Up solver (post-smoother) same as down solver (pre-smoother)
>>>>>>> linear system matrix = precond matrix:
>>>>>>> Matrix Object:   2048 MPI processes
>>>>>>> type: mpiaij
>>>>>>> rows=531441, cols=531441
>>>>>>> total: nonzeros=12476324, allocated nonzeros=12476324
>>>>>>> total number of mallocs used during MatSetValues calls =0
>>>>>>>  not using I-node (on process 0) routines
>>>>>>> --- system solved with PETSc (in 2.407844e+02 seconds)
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> <log-GAMG><log-ML><log-BoomerAMG>
>>>> 
>>> 
>> 
> <log-ML><log-GASM><log-GAMG><log-BoomerAMG>