[petsc-dev] [petsc-users] Problem with AMG packages

Mark F. Adams mfadams at lbl.gov
Wed Oct 9 10:45:38 CDT 2013


On Oct 9, 2013, at 11:32 AM, Matthew Knepley <knepley at gmail.com> wrote:

> On Wed, Oct 9, 2013 at 8:34 AM, Pierre Jolivet <jolivet at ann.jussieu.fr> wrote:
> Mark and Barry,
> You will find attached the logs for BoomerAMG (better, but still slow
> imho), ML (still lost), GAMG (better, I took Jed's advice and recompiled
> petsc-maint but forgot to relink my app, so please discard the time spent
> in MatView again) and GASM (best, really? for a Poisson equation?).
> 
> I'll try bigger matrices (that is likely the real problem now, at least
> for GAMG), but if you still see something fishy that I might need to
> adjust in the parameters, please tell me.
> 
> I strongly suspect something is wrong with the formulation. There are plenty
> of examples of this same thing in PETSc that work fine. SNES ex12, although
> it uses an unstructured grid, can do 3D Poisson, and you can run GAMG on it
> to show good linear scaling. KSP ex56 is 3D elasticity and is what Mark uses
> to benchmark GAMG.
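
For reference, those examples live at src/snes/examples/tutorials/ex12.c and
src/ksp/ksp/examples/tutorials/ex56.c. Something like

  cd src/ksp/ksp/examples/tutorials && make ex56 && make runex56

(with PETSC_DIR/PETSC_ARCH set) should reproduce a known-good GAMG run; the
exact options in the runex56 target vary between PETSc versions, so check the
makefile rather than taking this line as a recipe.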

GAMG is coarsening super fast.  This is not a big deal at this stage, but I would add '-pc_gamg_square_graph false'.
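
Putting that together with the options I suggested earlier (quoted below), the GAMG run would use something along these lines (just a sketch; the option names are as in petsc-maint right now and may differ in other versions):

-pc_type gamg
-pc_gamg_agg_nsmooths 1
-pc_gamg_square_graph false
-pc_gamg_verbose 2
-mg_levels_ksp_type richardson
-mg_levels_pc_type sor
-log_summary -pc_mg_log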

Also, the flop rates are terrible (2 Mflop/s/core).  Is this a 3D cube that is partitioned lexicographically?  

> 
>   Thanks,
> 
>      Matt
>  
> Also, the first results I got for elasticity (before going back to plain
> scalar diffusion) were at least as bad. Do you have any tips for such
> problems besides setting the correct block size and MatNearNullSpace and
> using parameters similar to the ones you just gave me, or the ones that can
> be found here:
> http://lists.mcs.anl.gov/pipermail/petsc-users/2012-April/012790.html ?
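
On the elasticity setup: for completeness, a minimal sketch in C of the block
size / near-null-space part (the variable names and the nodal-coordinate Vec
'coords' are assumptions for illustration, not taken from your code):

  /* A: assembled 3D elasticity matrix; coords: Vec of nodal coordinates,
     created with block size 3 (the spatial dimension) */
  MatNullSpace   nsp;
  PetscErrorCode ierr;

  ierr = MatSetBlockSize(A, 3);CHKERRQ(ierr);                      /* 3 displacement dofs per node */
  ierr = MatNullSpaceCreateRigidBody(coords, &nsp);CHKERRQ(ierr);  /* 6 rigid-body modes in 3D */
  ierr = MatSetNearNullSpace(A, nsp);CHKERRQ(ierr);                /* GAMG uses this to build the coarse space */
  ierr = MatNullSpaceDestroy(&nsp);CHKERRQ(ierr);

together with -pc_gamg_agg_nsmooths 1 (smoothed aggregation), as in the options
quoted below.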
> 
> Thanks for your help,
> Pierre
> 
> >
> > On Oct 8, 2013, at 8:18 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> >>
> >> MatView                6 1.0 3.4042e+01269.1 0.00e+00 0.0 0.0e+00
> >> 0.0e+00 4.0e+00 18  0  0  0  0  25  0  0  0  0     0
> >>
> >>   Something is seriously wrong with the default MatView (or PCView) for
> >> PC GAMG.  It is printing way too much for the default view and thus
> >> totally hosing the timings.  The default PCView() is supposed to be
> >> very lightweight (not do excessive communication) and provide only very
> >> high-level information.
> >>
> >
> > Oh, I think the problem is that GAMG sets the coarse grid solver
> > explicitly to block Jacobi with LU as the local solver.  GAMG ensures that
> > all equations are on one PE for the coarsest grid.  ML uses redundant.  You
> > should be able to use redundant in GAMG, it is just not the default, and it
> > is not tested.  So I'm guessing the problem is that block Jacobi is noisy.
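
(If someone wants to try that: overriding the coarse solve from the command
line with

-mg_coarse_pc_type redundant
-mg_coarse_redundant_pc_type lu

should be the way to do it; those are the same option prefixes that show up in
the ML -ksp_view output below, but as said above this path is untested with
GAMG.)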
> >
> >>
> >>   Barry
> >>
> >>
> >> On Oct 8, 2013, at 6:50 PM, "Mark F. Adams" <mfadams at lbl.gov> wrote:
> >>
> >>> Something is going terribly wrong with the setup in hypre and ML.
> >>> hypre's default parameters are not set up well for 3D.  I use:
> >>>
> >>> -pc_hypre_boomeramg_no_CF
> >>> -pc_hypre_boomeramg_agg_nl 1
> >>> -pc_hypre_boomeramg_coarsen_type HMIS
> >>> -pc_hypre_boomeramg_interp_type ext+i
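
(Spelled out, that is -pc_type hypre -pc_hypre_type boomeramg plus the four
options above; boomeramg should be the default hypre type, so the second flag
can normally be omitted.)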
> >>>
> >>> I'm not sure what is going wrong with ML's setup.
> >>>
> >>> GAMG is converging terribly.  Is this just a simple 7-point Laplacian?
> >>> It looks like the eigenvalue estimate is low on the finest grid, which
> >>> messes up the smoother.  Try running with these parameters and send the
> >>> output:
> >>>
> >>> -pc_gamg_agg_nsmooths 1
> >>> -pc_gamg_verbose 2
> >>> -mg_levels_ksp_type richardson
> >>> -mg_levels_pc_type sor
> >>>
> >>> Mark
> >>>
> >>> On Oct 8, 2013, at 5:46 PM, Pierre Jolivet <jolivet at ann.jussieu.fr>
> >>> wrote:
> >>>
> >>>> Please find the logs for BoomerAMG, ML and GAMG attached. The setup for
> >>>> GAMG doesn't look so bad compared to the other packages, so I'm
> >>>> wondering what is going on with those?
> >>>>
> >>>>>
> >>>>> We need the output from running with -log_summary -pc_mg_log.
> >>>>>
> >>>>> Also, you can run with PETSc's own AMG, GAMG (run with -pc_type gamg).
> >>>>> This will give the most useful information about where the time is
> >>>>> being spent.
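
(Concretely: just append those flags to the existing run line, e.g.

mpiexec -n 2048 ./your_app -pc_type gamg -ksp_monitor -ksp_view -log_summary -pc_mg_log

where ./your_app stands in for the actual executable.)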
> >>>>>
> >>>>>
> >>>>> Barry
> >>>>>
> >>>>>
> >>>>> On Oct 8, 2013, at 4:11 PM, Pierre Jolivet <jolivet at ann.jussieu.fr>
> >>>>> wrote:
> >>>>>
> >>>>>> Dear all,
> >>>>>> I'm trying to compare linear solvers for a simple Poisson equation in 3D.
> >>>>>> I thought that MG was the way to go, but looking at my log, the
> >>>>>> performance looks abysmal (I know that the matrices are way too small,
> >>>>>> but if I go bigger, it just never performs a single iteration...). Even
> >>>>>> though this is neither the BoomerAMG nor the ML mailing list, could you
> >>>>>> please tell me if PETSc sets some default flags that make the setup of
> >>>>>> those solvers so slow for this simple problem? The performance of (G)ASM
> >>>>>> is in comparison much better.
> >>>>>>
> >>>>>> Thanks in advance for your help.
> >>>>>>
> >>>>>> PS: first the BoomerAMG log, then ML (much more verbose, sorry).
> >>>>>>
> >>>>>> 0 KSP Residual norm 1.599647112604e+00
> >>>>>> 1 KSP Residual norm 5.450838232404e-02
> >>>>>> 2 KSP Residual norm 3.549673478318e-03
> >>>>>> 3 KSP Residual norm 2.901826808841e-04
> >>>>>> 4 KSP Residual norm 2.574235778729e-05
> >>>>>> 5 KSP Residual norm 2.253410171682e-06
> >>>>>> 6 KSP Residual norm 1.871067784877e-07
> >>>>>> 7 KSP Residual norm 1.681162800670e-08
> >>>>>> 8 KSP Residual norm 2.120841512414e-09
> >>>>>> KSP Object: 2048 MPI processes
> >>>>>> type: gmres
> >>>>>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> >>>>>> Orthogonalization with no iterative refinement
> >>>>>> GMRES: happy breakdown tolerance 1e-30
> >>>>>> maximum iterations=200, initial guess is zero
> >>>>>> tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
> >>>>>> left preconditioning
> >>>>>> using PRECONDITIONED norm type for convergence test
> >>>>>> PC Object: 2048 MPI processes
> >>>>>> type: hypre
> >>>>>> HYPRE BoomerAMG preconditioning
> >>>>>> HYPRE BoomerAMG: Cycle type V
> >>>>>> HYPRE BoomerAMG: Maximum number of levels 25
> >>>>>> HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
> >>>>>> HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
> >>>>>> HYPRE BoomerAMG: Threshold for strong coupling 0.25
> >>>>>> HYPRE BoomerAMG: Interpolation truncation factor 0
> >>>>>> HYPRE BoomerAMG: Interpolation: max elements per row 0
> >>>>>> HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
> >>>>>> HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
> >>>>>> HYPRE BoomerAMG: Maximum row sums 0.9
> >>>>>> HYPRE BoomerAMG: Sweeps down         1
> >>>>>> HYPRE BoomerAMG: Sweeps up           1
> >>>>>> HYPRE BoomerAMG: Sweeps on coarse    1
> >>>>>> HYPRE BoomerAMG: Relax down          symmetric-SOR/Jacobi
> >>>>>> HYPRE BoomerAMG: Relax up            symmetric-SOR/Jacobi
> >>>>>> HYPRE BoomerAMG: Relax on coarse     Gaussian-elimination
> >>>>>> HYPRE BoomerAMG: Relax weight  (all)      1
> >>>>>> HYPRE BoomerAMG: Outer relax weight (all) 1
> >>>>>> HYPRE BoomerAMG: Using CF-relaxation
> >>>>>> HYPRE BoomerAMG: Measure type        local
> >>>>>> HYPRE BoomerAMG: Coarsen type        Falgout
> >>>>>> HYPRE BoomerAMG: Interpolation type  classical
> >>>>>> linear system matrix = precond matrix:
> >>>>>> Matrix Object:   2048 MPI processes
> >>>>>> type: mpiaij
> >>>>>> rows=4173281, cols=4173281
> >>>>>> total: nonzeros=102576661, allocated nonzeros=102576661
> >>>>>> total number of mallocs used during MatSetValues calls =0
> >>>>>>   not using I-node (on process 0) routines
> >>>>>> --- system solved with PETSc (in 1.005199e+02 seconds)
> >>>>>>
> >>>>>> 0 KSP Residual norm 2.368804472986e-01
> >>>>>> 1 KSP Residual norm 5.676430019132e-02
> >>>>>> 2 KSP Residual norm 1.898005876002e-02
> >>>>>> 3 KSP Residual norm 6.193922902926e-03
> >>>>>> 4 KSP Residual norm 2.008448794493e-03
> >>>>>> 5 KSP Residual norm 6.390465670228e-04
> >>>>>> 6 KSP Residual norm 2.157709394389e-04
> >>>>>> 7 KSP Residual norm 7.295973819979e-05
> >>>>>> 8 KSP Residual norm 2.358343271482e-05
> >>>>>> 9 KSP Residual norm 7.489696222066e-06
> >>>>>> 10 KSP Residual norm 2.390946857593e-06
> >>>>>> 11 KSP Residual norm 8.068086385140e-07
> >>>>>> 12 KSP Residual norm 2.706607789749e-07
> >>>>>> 13 KSP Residual norm 8.636910863376e-08
> >>>>>> 14 KSP Residual norm 2.761981175852e-08
> >>>>>> 15 KSP Residual norm 8.755459874369e-09
> >>>>>> 16 KSP Residual norm 2.708848598341e-09
> >>>>>> 17 KSP Residual norm 8.968748876265e-10
> >>>>>> KSP Object: 2048 MPI processes
> >>>>>> type: gmres
> >>>>>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> >>>>>> Orthogonalization with no iterative refinement
> >>>>>> GMRES: happy breakdown tolerance 1e-30
> >>>>>> maximum iterations=200, initial guess is zero
> >>>>>> tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
> >>>>>> left preconditioning
> >>>>>> using PRECONDITIONED norm type for convergence test
> >>>>>> PC Object: 2048 MPI processes
> >>>>>> type: ml
> >>>>>> MG: type is MULTIPLICATIVE, levels=3 cycles=v
> >>>>>>   Cycles per PCApply=1
> >>>>>>   Using Galerkin computed coarse grid matrices
> >>>>>> Coarse grid solver -- level -------------------------------
> >>>>>> KSP Object:    (mg_coarse_)     2048 MPI processes
> >>>>>>   type: preonly
> >>>>>>   maximum iterations=1, initial guess is zero
> >>>>>>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
> >>>>>>   left preconditioning
> >>>>>>   using NONE norm type for convergence test
> >>>>>> PC Object:    (mg_coarse_)     2048 MPI processes
> >>>>>>   type: redundant
> >>>>>>     Redundant preconditioner: First (color=0) of 2048 PCs follows
> >>>>>>   KSP Object:      (mg_coarse_redundant_)       1 MPI processes
> >>>>>>     type: preonly
> >>>>>>     maximum iterations=10000, initial guess is zero
> >>>>>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
> >>>>>>     left preconditioning
> >>>>>>     using NONE norm type for convergence test
> >>>>>>   PC Object:      (mg_coarse_redundant_)       1 MPI processes
> >>>>>>     type: lu
> >>>>>>       LU: out-of-place factorization
> >>>>>>       tolerance for zero pivot 2.22045e-14
> >>>>>>       using diagonal shift on blocks to prevent zero pivot
> >>>>>>       matrix ordering: nd
> >>>>>>       factor fill ratio given 5, needed 4.38504
> >>>>>>         Factored matrix follows:
> >>>>>>           Matrix Object:               1 MPI processes
> >>>>>>             type: seqaij
> >>>>>>             rows=2055, cols=2055
> >>>>>>             package used to perform factorization: petsc
> >>>>>>             total: nonzeros=2476747, allocated nonzeros=2476747
> >>>>>>             total number of mallocs used during MatSetValues calls =0
> >>>>>>               using I-node routines: found 1638 nodes, limit used is 5
> >>>>>>     linear system matrix = precond matrix:
> >>>>>>     Matrix Object:         1 MPI processes
> >>>>>>       type: seqaij
> >>>>>>       rows=2055, cols=2055
> >>>>>>       total: nonzeros=564817, allocated nonzeros=1093260
> >>>>>>       total number of mallocs used during MatSetValues calls =0
> >>>>>>         not using I-node routines
> >>>>>>   linear system matrix = precond matrix:
> >>>>>>   Matrix Object:       2048 MPI processes
> >>>>>>     type: mpiaij
> >>>>>>     rows=2055, cols=2055
> >>>>>>     total: nonzeros=564817, allocated nonzeros=564817
> >>>>>>     total number of mallocs used during MatSetValues calls =0
> >>>>>>       not using I-node (on process 0) routines
> >>>>>> Down solver (pre-smoother) on level 1
> >>>>>> -------------------------------
> >>>>>> KSP Object:    (mg_levels_1_)     2048 MPI processes
> >>>>>>   type: richardson
> >>>>>>     Richardson: damping factor=1
> >>>>>>   maximum iterations=2
> >>>>>>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
> >>>>>>   left preconditioning
> >>>>>>   using nonzero initial guess
> >>>>>>   using NONE norm type for convergence test
> >>>>>> PC Object:    (mg_levels_1_)     2048 MPI processes
> >>>>>>   type: sor
> >>>>>>     SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
> >>>>>>   linear system matrix = precond matrix:
> >>>>>>   Matrix Object:       2048 MPI processes
> >>>>>>     type: mpiaij
> >>>>>>     rows=30194, cols=30194
> >>>>>>     total: nonzeros=3368414, allocated nonzeros=3368414
> >>>>>>     total number of mallocs used during MatSetValues calls =0
> >>>>>>       not using I-node (on process 0) routines
> >>>>>> Up solver (post-smoother) same as down solver (pre-smoother)
> >>>>>> Down solver (pre-smoother) on level 2
> >>>>>> -------------------------------
> >>>>>> KSP Object:    (mg_levels_2_)     2048 MPI processes
> >>>>>>   type: richardson
> >>>>>>     Richardson: damping factor=1
> >>>>>>   maximum iterations=2
> >>>>>>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
> >>>>>>   left preconditioning
> >>>>>>   using nonzero initial guess
> >>>>>>   using NONE norm type for convergence test
> >>>>>> PC Object:    (mg_levels_2_)     2048 MPI processes
> >>>>>>   type: sor
> >>>>>>     SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
> >>>>>>   linear system matrix = precond matrix:
> >>>>>>   Matrix Object:       2048 MPI processes
> >>>>>>     type: mpiaij
> >>>>>>     rows=531441, cols=531441
> >>>>>>     total: nonzeros=12476324, allocated nonzeros=12476324
> >>>>>>     total number of mallocs used during MatSetValues calls =0
> >>>>>>       not using I-node (on process 0) routines
> >>>>>> Up solver (post-smoother) same as down solver (pre-smoother)
> >>>>>> linear system matrix = precond matrix:
> >>>>>> Matrix Object:   2048 MPI processes
> >>>>>> type: mpiaij
> >>>>>> rows=531441, cols=531441
> >>>>>> total: nonzeros=12476324, allocated nonzeros=12476324
> >>>>>> total number of mallocs used during MatSetValues calls =0
> >>>>>>   not using I-node (on process 0) routines
> >>>>>> --- system solved with PETSc (in 2.407844e+02 seconds)
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>> <log-GAMG><log-ML><log-BoomerAMG>
> >>>
> >>
> >
> >
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
