[petsc-dev] [petsc-users] Problem with AMG packages

Pierre Jolivet jolivet at ann.jussieu.fr
Wed Oct 9 17:26:06 CDT 2013


>
> On Oct 9, 2013, at 11:32 AM, Matthew Knepley <knepley at gmail.com> wrote:
>
>> On Wed, Oct 9, 2013 at 8:34 AM, Pierre Jolivet <jolivet at ann.jussieu.fr>
>> wrote:
>> Mark and Barry,
>> You will find attached the logs for BoomerAMG (better, but still slow
>> imho), ML (still lost), GAMG (better; I took Jed's advice and recompiled
>> petsc-maint but forgot to relink my app, so please discard the time spent
>> in MatView again) and GASM (best, really? for a Poisson equation?).
>>
>> I'll try bigger matrices (that is likely the real problem now, at least
>> for GAMG), but if you still see something fishy that I might need to
>> adjust in the parameters, please tell me.
>>
>> I strongly suspect something is wrong with the formulation. There are
>> plenty of examples of this same problem in PETSc that work fine. SNES
>> ex12, although it uses an unstructured grid, can do 3D Poisson, and you
>> can run GAMG on it to show good linear scaling. KSP ex56 is 3D
>> elasticity and is what Mark uses to benchmark GAMG.
>
> GAMG is coarsening super fast.  This is not a big deal at this stage, but I
> would add '-pc_gamg_square_graph false'.
>
> Also, the flop rates are terrible (2 Mflop/s/core).  Is this a 3D cube
> that is partitioned lexicographically?

(referring to your previous email about the kind of discretization) I'm
using P_2 finite elements on tetrahedra. The mesh is so small that it might as
well be partitioned lexicographically (but in this case it was done by
Metis).

Now that GAMG and BoomerAMG are at least no longer stuck on small
matrices, I'll try to go bigger and hopefully won't have to bother you
again. Thanks a lot for all your pointers.
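
For reference, here is the GAMG option set suggested in this thread,
collected into one command-line fragment for the bigger runs (a sketch
only; it contains nothing beyond the flags already quoted in this thread):

   -pc_type gamg -pc_gamg_agg_nsmooths 1 -pc_gamg_square_graph false \
       -pc_gamg_verbose 2 -mg_levels_ksp_type richardson \
       -mg_levels_pc_type sor -log_summary -pc_mg_log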

Pierre

PS: about src/ksp/ksp/examples/tutorials/ex56.c, Dirichelet (line 6)
should read Dirichlet.

>>
>>   Thanks,
>>
>>      Matt
>>
>> Also, the first results I got for elasticity (before going back to plain
>> scalar diffusion) were at least as bad. Do you have any tips for such
>> problems besides setting the correct block size and MatNearNullSpace and
>> using parameters similar to the ones you just gave me, or the ones that
>> can be found here:
>> http://lists.mcs.anl.gov/pipermail/petsc-users/2012-April/012790.html ?
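(For the elasticity runs, the setup I have in mind is roughly the
following sketch; A is the assembled matrix and coords a Vec of interlaced
nodal coordinates, both placeholder names, while the calls themselves are
the standard PETSc routines:)

   MatNullSpace nearnull;
   /* 3 displacement components per node in 3D */
   ierr = MatSetBlockSize(A, 3);CHKERRQ(ierr);
   /* build the rigid-body modes from the coordinates and attach them as
      the near null space used by the AMG aggregation */
   ierr = MatNullSpaceCreateRigidBody(coords, &nearnull);CHKERRQ(ierr);
   ierr = MatSetNearNullSpace(A, nearnull);CHKERRQ(ierr);
   ierr = MatNullSpaceDestroy(&nearnull);CHKERRQ(ierr);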
>>
>> Thanks for your help,
>> Pierre
>>
>> >
>> > On Oct 8, 2013, at 8:18 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>> >
>> >>
>> >> MatView                6 1.0 3.4042e+01269.1 0.00e+00 0.0 0.0e+00
>> >> 0.0e+00 4.0e+00 18  0  0  0  0  25  0  0  0  0     0
>> >>
>> >>   Something is seriously wrong with the default MatView (or PCView)
>> >> for PC GAMG?  It is printing way too much for the default view and
>> >> thus totally hosing the timings.  The default PCView() is supposed to
>> >> be very lightweight (not do excessive communication) and provide very
>> >> high-level information.
>> >>
>> >
>> > Oh, I think the problem is that GAMG sets the coarse grid solver
>> > explicitly as block Jacobi with LU locally.  GAMG ensures that all
>> > equations are on one PE for the coarsest grid.  ML uses redundant.  You
>> > should be able to use redundant in GAMG, it is just not the default.
>> > This is not tested.  So I'm guessing the problem is that block Jacobi
>> > is noisy.
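(For the record, a hedged sketch of how one might switch the GAMG coarse
solver to redundant from the command line, using the option prefixes that
show up in the ML view further down; the exact spelling should be
double-checked against -help:)

   -mg_coarse_ksp_type preonly -mg_coarse_pc_type redundant \
       -mg_coarse_redundant_pc_type lu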
>> >
>> >>
>> >>   Barry
>> >>
>> >>
>> >> On Oct 8, 2013, at 6:50 PM, "Mark F. Adams" <mfadams at lbl.gov> wrote:
>> >>
>> >>> Something is going terribly wrong with the setup in hypre and ML.
>> >>> hypre's default parameters are not set up well for 3D.  I use:
>> >>>
>> >>> -pc_hypre_boomeramg_no_CF
>> >>> -pc_hypre_boomeramg_agg_nl 1
>> >>> -pc_hypre_boomeramg_coarsen_type HMIS
>> >>> -pc_hypre_boomeramg_interp_type ext+i
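(A hedged aside: together with selecting the preconditioner itself, the
complete fragment would look like the following; only the flags listed
above plus the standard PC selection options are assumed:)

   -pc_type hypre -pc_hypre_type boomeramg \
       -pc_hypre_boomeramg_no_CF -pc_hypre_boomeramg_agg_nl 1 \
       -pc_hypre_boomeramg_coarsen_type HMIS \
       -pc_hypre_boomeramg_interp_type ext+i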
>> >>>
>> >>> I'm not sure what is going wrong with ML's setup.
>> >>>
>> >>> GAMG is converging terribly.  Is this just a simple 7-point
>> >>> Laplacian?  It looks like the eigenvalue estimate is low on the
>> >>> finest grid, which messes up the smoother.  Try running with these
>> >>> parameters and send the output:
>> >>>
>> >>> -pc_gamg_agg_nsmooths 1
>> >>> -pc_gamg_verbose 2
>> >>> -mg_levels_ksp_type richardson
>> >>> -mg_levels_pc_type sor
>> >>>
>> >>> Mark
>> >>>
>> >>> On Oct 8, 2013, at 5:46 PM, Pierre Jolivet <jolivet at ann.jussieu.fr>
>> >>> wrote:
>> >>>
>> >>>> Please find the logs for BoomerAMG, ML and GAMG attached. The setup
>> >>>> for GAMG doesn't look so bad compared to the other packages, so I'm
>> >>>> wondering what is going on with those?
>> >>>>
>> >>>>>
>> >>>>> We need the output from running with -log_summary -pc_mg_log
>> >>>>>
>> >>>>> Also, you can run with PETSc's own AMG, called GAMG (run with
>> >>>>> -pc_type gamg).  This will give the most useful information about
>> >>>>> where it is spending the time.
>> >>>>>
>> >>>>>
>> >>>>> Barry
>> >>>>>
>> >>>>>
>> >>>>> On Oct 8, 2013, at 4:11 PM, Pierre Jolivet
>> <jolivet at ann.jussieu.fr>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> Dear all,
>> >>>>>> I'm trying to compare linear solvers for a simple Poisson
>> >>>>>> equation in 3D.  I thought that MG was the way to go, but looking
>> >>>>>> at my log, the performance looks abysmal (I know that the
>> >>>>>> matrices are way too small, but if I go bigger, it just never
>> >>>>>> performs a single iteration...).  Even though this is neither the
>> >>>>>> BoomerAMG nor the ML mailing list, could you please tell me
>> >>>>>> whether PETSc sets some default flags that make the setup for
>> >>>>>> those solvers so slow for this simple problem?  The performance
>> >>>>>> of (G)ASM is, in comparison, much better.
>> >>>>>>
>> >>>>>> Thanks in advance for your help.
>> >>>>>>
>> >>>>>> PS: first the BoomerAMG log, then ML (much more verbose, sorry).
>> >>>>>>
>> >>>>>> 0 KSP Residual norm 1.599647112604e+00
>> >>>>>> 1 KSP Residual norm 5.450838232404e-02
>> >>>>>> 2 KSP Residual norm 3.549673478318e-03
>> >>>>>> 3 KSP Residual norm 2.901826808841e-04
>> >>>>>> 4 KSP Residual norm 2.574235778729e-05
>> >>>>>> 5 KSP Residual norm 2.253410171682e-06
>> >>>>>> 6 KSP Residual norm 1.871067784877e-07
>> >>>>>> 7 KSP Residual norm 1.681162800670e-08
>> >>>>>> 8 KSP Residual norm 2.120841512414e-09
>> >>>>>> KSP Object: 2048 MPI processes
>> >>>>>> type: gmres
>> >>>>>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
>> >>>>>> Orthogonalization with no iterative refinement
>> >>>>>> GMRES: happy breakdown tolerance 1e-30
>> >>>>>> maximum iterations=200, initial guess is zero
>> >>>>>> tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
>> >>>>>> left preconditioning
>> >>>>>> using PRECONDITIONED norm type for convergence test
>> >>>>>> PC Object: 2048 MPI processes
>> >>>>>> type: hypre
>> >>>>>> HYPRE BoomerAMG preconditioning
>> >>>>>> HYPRE BoomerAMG: Cycle type V
>> >>>>>> HYPRE BoomerAMG: Maximum number of levels 25
>> >>>>>> HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>> >>>>>> HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>> >>>>>> HYPRE BoomerAMG: Threshold for strong coupling 0.25
>> >>>>>> HYPRE BoomerAMG: Interpolation truncation factor 0
>> >>>>>> HYPRE BoomerAMG: Interpolation: max elements per row 0
>> >>>>>> HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>> >>>>>> HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>> >>>>>> HYPRE BoomerAMG: Maximum row sums 0.9
>> >>>>>> HYPRE BoomerAMG: Sweeps down         1
>> >>>>>> HYPRE BoomerAMG: Sweeps up           1
>> >>>>>> HYPRE BoomerAMG: Sweeps on coarse    1
>> >>>>>> HYPRE BoomerAMG: Relax down          symmetric-SOR/Jacobi
>> >>>>>> HYPRE BoomerAMG: Relax up            symmetric-SOR/Jacobi
>> >>>>>> HYPRE BoomerAMG: Relax on coarse     Gaussian-elimination
>> >>>>>> HYPRE BoomerAMG: Relax weight  (all)      1
>> >>>>>> HYPRE BoomerAMG: Outer relax weight (all) 1
>> >>>>>> HYPRE BoomerAMG: Using CF-relaxation
>> >>>>>> HYPRE BoomerAMG: Measure type        local
>> >>>>>> HYPRE BoomerAMG: Coarsen type        Falgout
>> >>>>>> HYPRE BoomerAMG: Interpolation type  classical
>> >>>>>> linear system matrix = precond matrix:
>> >>>>>> Matrix Object:   2048 MPI processes
>> >>>>>> type: mpiaij
>> >>>>>> rows=4173281, cols=4173281
>> >>>>>> total: nonzeros=102576661, allocated nonzeros=102576661
>> >>>>>> total number of mallocs used during MatSetValues calls =0
>> >>>>>>   not using I-node (on process 0) routines
>> >>>>>> --- system solved with PETSc (in 1.005199e+02 seconds)
>> >>>>>>
>> >>>>>> 0 KSP Residual norm 2.368804472986e-01
>> >>>>>> 1 KSP Residual norm 5.676430019132e-02
>> >>>>>> 2 KSP Residual norm 1.898005876002e-02
>> >>>>>> 3 KSP Residual norm 6.193922902926e-03
>> >>>>>> 4 KSP Residual norm 2.008448794493e-03
>> >>>>>> 5 KSP Residual norm 6.390465670228e-04
>> >>>>>> 6 KSP Residual norm 2.157709394389e-04
>> >>>>>> 7 KSP Residual norm 7.295973819979e-05
>> >>>>>> 8 KSP Residual norm 2.358343271482e-05
>> >>>>>> 9 KSP Residual norm 7.489696222066e-06
>> >>>>>> 10 KSP Residual norm 2.390946857593e-06
>> >>>>>> 11 KSP Residual norm 8.068086385140e-07
>> >>>>>> 12 KSP Residual norm 2.706607789749e-07
>> >>>>>> 13 KSP Residual norm 8.636910863376e-08
>> >>>>>> 14 KSP Residual norm 2.761981175852e-08
>> >>>>>> 15 KSP Residual norm 8.755459874369e-09
>> >>>>>> 16 KSP Residual norm 2.708848598341e-09
>> >>>>>> 17 KSP Residual norm 8.968748876265e-10
>> >>>>>> KSP Object: 2048 MPI processes
>> >>>>>> type: gmres
>> >>>>>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
>> >>>>>> Orthogonalization with no iterative refinement
>> >>>>>> GMRES: happy breakdown tolerance 1e-30
>> >>>>>> maximum iterations=200, initial guess is zero
>> >>>>>> tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
>> >>>>>> left preconditioning
>> >>>>>> using PRECONDITIONED norm type for convergence test
>> >>>>>> PC Object: 2048 MPI processes
>> >>>>>> type: ml
>> >>>>>> MG: type is MULTIPLICATIVE, levels=3 cycles=v
>> >>>>>>   Cycles per PCApply=1
>> >>>>>>   Using Galerkin computed coarse grid matrices
>> >>>>>> Coarse grid solver -- level -------------------------------
>> >>>>>> KSP Object:    (mg_coarse_)     2048 MPI processes
>> >>>>>>   type: preonly
>> >>>>>>   maximum iterations=1, initial guess is zero
>> >>>>>>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>> >>>>>>   left preconditioning
>> >>>>>>   using NONE norm type for convergence test
>> >>>>>> PC Object:    (mg_coarse_)     2048 MPI processes
>> >>>>>>   type: redundant
>> >>>>>>     Redundant preconditioner: First (color=0) of 2048 PCs follows
>> >>>>>>   KSP Object:      (mg_coarse_redundant_)       1 MPI processes
>> >>>>>>     type: preonly
>> >>>>>>     maximum iterations=10000, initial guess is zero
>> >>>>>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>> >>>>>>     left preconditioning
>> >>>>>>     using NONE norm type for convergence test
>> >>>>>>   PC Object:      (mg_coarse_redundant_)       1 MPI processes
>> >>>>>>     type: lu
>> >>>>>>       LU: out-of-place factorization
>> >>>>>>       tolerance for zero pivot 2.22045e-14
>> >>>>>>       using diagonal shift on blocks to prevent zero pivot
>> >>>>>>       matrix ordering: nd
>> >>>>>>       factor fill ratio given 5, needed 4.38504
>> >>>>>>         Factored matrix follows:
>> >>>>>>           Matrix Object:               1 MPI processes
>> >>>>>>             type: seqaij
>> >>>>>>             rows=2055, cols=2055
>> >>>>>>             package used to perform factorization: petsc
>> >>>>>>             total: nonzeros=2476747, allocated nonzeros=2476747
>> >>>>>>             total number of mallocs used during MatSetValues
>> calls
>> >>>>>> =0
>> >>>>>>               using I-node routines: found 1638 nodes, limit used
>> is
>> >>>>>> 5
>> >>>>>>     linear system matrix = precond matrix:
>> >>>>>>     Matrix Object:         1 MPI processes
>> >>>>>>       type: seqaij
>> >>>>>>       rows=2055, cols=2055
>> >>>>>>       total: nonzeros=564817, allocated nonzeros=1093260
>> >>>>>>       total number of mallocs used during MatSetValues calls =0
>> >>>>>>         not using I-node routines
>> >>>>>>   linear system matrix = precond matrix:
>> >>>>>>   Matrix Object:       2048 MPI processes
>> >>>>>>     type: mpiaij
>> >>>>>>     rows=2055, cols=2055
>> >>>>>>     total: nonzeros=564817, allocated nonzeros=564817
>> >>>>>>     total number of mallocs used during MatSetValues calls =0
>> >>>>>>       not using I-node (on process 0) routines
>> >>>>>> Down solver (pre-smoother) on level 1
>> >>>>>> -------------------------------
>> >>>>>> KSP Object:    (mg_levels_1_)     2048 MPI processes
>> >>>>>>   type: richardson
>> >>>>>>     Richardson: damping factor=1
>> >>>>>>   maximum iterations=2
>> >>>>>>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>> >>>>>>   left preconditioning
>> >>>>>>   using nonzero initial guess
>> >>>>>>   using NONE norm type for convergence test
>> >>>>>> PC Object:    (mg_levels_1_)     2048 MPI processes
>> >>>>>>   type: sor
>> >>>>>>     SOR: type = local_symmetric, iterations = 1, local iterations
>> =
>> >>>>>> 1,
>> >>>>>> omega = 1
>> >>>>>>   linear system matrix = precond matrix:
>> >>>>>>   Matrix Object:       2048 MPI processes
>> >>>>>>     type: mpiaij
>> >>>>>>     rows=30194, cols=30194
>> >>>>>>     total: nonzeros=3368414, allocated nonzeros=3368414
>> >>>>>>     total number of mallocs used during MatSetValues calls =0
>> >>>>>>       not using I-node (on process 0) routines
>> >>>>>> Up solver (post-smoother) same as down solver (pre-smoother)
>> >>>>>> Down solver (pre-smoother) on level 2
>> >>>>>> -------------------------------
>> >>>>>> KSP Object:    (mg_levels_2_)     2048 MPI processes
>> >>>>>>   type: richardson
>> >>>>>>     Richardson: damping factor=1
>> >>>>>>   maximum iterations=2
>> >>>>>>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>> >>>>>>   left preconditioning
>> >>>>>>   using nonzero initial guess
>> >>>>>>   using NONE norm type for convergence test
>> >>>>>> PC Object:    (mg_levels_2_)     2048 MPI processes
>> >>>>>>   type: sor
>> >>>>>>     SOR: type = local_symmetric, iterations = 1, local iterations
>> =
>> >>>>>> 1,
>> >>>>>> omega = 1
>> >>>>>>   linear system matrix = precond matrix:
>> >>>>>>   Matrix Object:       2048 MPI processes
>> >>>>>>     type: mpiaij
>> >>>>>>     rows=531441, cols=531441
>> >>>>>>     total: nonzeros=12476324, allocated nonzeros=12476324
>> >>>>>>     total number of mallocs used during MatSetValues calls =0
>> >>>>>>       not using I-node (on process 0) routines
>> >>>>>> Up solver (post-smoother) same as down solver (pre-smoother)
>> >>>>>> linear system matrix = precond matrix:
>> >>>>>> Matrix Object:   2048 MPI processes
>> >>>>>> type: mpiaij
>> >>>>>> rows=531441, cols=531441
>> >>>>>> total: nonzeros=12476324, allocated nonzeros=12476324
>> >>>>>> total number of mallocs used during MatSetValues calls =0
>> >>>>>>   not using I-node (on process 0) routines
>> >>>>>> --- system solved with PETSc (in 2.407844e+02 seconds)
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>> <log-GAMG><log-ML><log-BoomerAMG>
>> >>>
>> >>
>> >
>> >
>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which
>> their experiments lead.
>> -- Norbert Wiener
>
>



