<html><head><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><br><div><div>On Oct 9, 2013, at 11:32 AM, Matthew Knepley <<a href="mailto:knepley@gmail.com">knepley@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div dir="ltr">On Wed, Oct 9, 2013 at 8:34 AM, Pierre Jolivet <span dir="ltr"><<a href="mailto:jolivet@ann.jussieu.fr" target="_blank">jolivet@ann.jussieu.fr</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Mark and Barry,<br>

You will find attached the logs for BoomerAMG (better, but still slow<br>
IMHO), ML (still lost), GAMG (better; I took Jed's advice and recompiled<br>
petsc-maint but forgot to relink my app, so please disregard the time spent<br>
in MatView again) and GASM (best, really? For a Poisson equation?).<br>

<br>

I'll try bigger matrices (that is likely the real problem now, at least<br>

for GAMG), but if you still see something fishy that I might need to<br>

adjust in the parameters, please tell me.<br></blockquote><div><br></div><div>I strongly suspect something wrong with the formulation. There are plenty of</div><div>examples of this same thing in PETSc that work fine. SNES ex12, although it</div>

<div>is an unstructured grid, can do 3D Poisson and you can run GAMG on it to</div><div>show good linear scaling. KSP ex56 is 3D elasticity and what Mark uses to</div><div>benchmark GAMG.</div></div></div></div></blockquote><div><br></div><div>GAMG is coarsening super fast.  This is not a big deal at this stage, but I would add '-pc_gamg_square_graph false'.</div><div><br></div><div>Also, the flop rates are terrible (2 Mflop/s/core).  Is this a 3D cube that is partitioned lexicographically?</div><br><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>  Thanks,</div>

<div><br></div><div>     Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Also, the first results I got for elasticity (before going back to plain<br>

scalar diffusion) were at least as bad. Do you have any tips for such<br>

problems besides setting the correct BlockSize and MatNearNullSpace and<br>

using parameters similar to the ones you just gave me or the ones that can<br>

be found here<br>

<a href="http://lists.mcs.anl.gov/pipermail/petsc-users/2012-April/012790.html" target="_blank">http://lists.mcs.anl.gov/pipermail/petsc-users/2012-April/012790.html</a>?<br>

<br>

Thanks for your help,<br>

Pierre<br>

<br>
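Regarding the elasticity question above: for 3D linear elasticity, the near null space that smoothed-aggregation AMG such as GAMG relies on is spanned by the six rigid-body modes (three translations and three infinitesimal rotations). PETSc's MatNullSpaceCreateRigidBody() builds them from a coordinate vector, and MatSetNearNullSpace() attaches them to the matrix. The standalone C sketch below only illustrates what those vectors contain; it assumes three interleaved displacement dofs per node (block size 3) and interleaved x, y, z coordinates, does not orthonormalize the modes, and uses a hypothetical function name.<br>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fill the six rigid-body modes of 3D linear
 * elasticity for nnodes nodes.
 *   coords: x0,y0,z0, x1,y1,z1, ...            (length 3*nnodes)
 *   modes:  six vectors stored back to back, each of length 3*nnodes,
 *           with the ux,uy,uz dofs of a node interleaved (block size 3).
 * Modes 0-2 are unit translations in x, y, z; modes 3-5 are the
 * infinitesimal rotations cross(e_x, r), cross(e_y, r), cross(e_z, r).
 * The modes are NOT orthonormalized here. */
void rigid_body_modes(size_t nnodes, const double *coords, double *modes)
{
    assert(coords != NULL && modes != NULL);
    const size_t len = 3 * nnodes;
    for (size_t i = 0; i < 6 * len; ++i) modes[i] = 0.0;

    for (size_t k = 0; k < nnodes; ++k) {
        const double x = coords[3 * k], y = coords[3 * k + 1], z = coords[3 * k + 2];

        /* translations: unit displacement of every node in one direction */
        for (size_t d = 0; d < 3; ++d) modes[d * len + 3 * k + d] = 1.0;

        double *u = modes + 3 * len + 3 * k;  /* rotation about x: (0, -z, y) */
        u[1] = -z; u[2] = y;
        u = modes + 4 * len + 3 * k;          /* rotation about y: (z, 0, -x) */
        u[0] = z; u[2] = -x;
        u = modes + 5 * len + 3 * k;          /* rotation about z: (-y, x, 0) */
        u[0] = -y; u[1] = x;
    }
}
```

For example, with two nodes at (1, 2, 3) and (4, 5, 6), the z-rotation mode at the first node is (-2, 1, 0) and the x-rotation mode at the second node is (0, -6, 5).<br>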

><br>

> On Oct 8, 2013, at 8:18 PM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>> wrote:<br>

><br>

>><br>

>> MatView                6 1.0 3.4042e+01269.1 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 18  0  0  0  0  25  0  0  0  0     0<br>

>><br>

>>   Something is seriously wrong with the default matview (or pcview) for<br>

>> PC GAMG?  It is printing way too much for the default view and thus<br>

>> totally hosing the timings.  The default PCView() is supposed to be<br>

>> very lightweight (not doing excessive communication) and provide very<br>

>> high-level information.<br>

>><br>
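About the MatView line Barry quotes: in -log_summary output the maximum time and the max/min time ratio (slowest over fastest process) are adjacent columns, and when the ratio has three digits before the decimal point nothing separates them, so "3.4042e+01269.1" reads as 34.042 s with a ratio of 269.1. The 18 and 25 that follow are MatView's percentage of total time, globally and within the stage. The C sketch below splits such a fused line by reading the time with a fixed 10-character width; the column order it skips over is read off this one line, and the helper name is hypothetical.<br>

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: extract the maximum time, the max/min time ratio
 * and the global %T from one event line of a -log_summary table, e.g.
 *   MatView  6 1.0 3.4042e+01269.1 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 18 ...
 * Assumed column order: name, count, count ratio, time, time ratio, flops,
 * flops ratio, messages, avg. length, reductions, %T. The time is assumed
 * to be printed as d.dddde+XX (10 characters), so "%10lf" stops right where
 * a fused ratio begins. Returns 1 if all three values were read, else 0. */
int parse_event_line(const char *line, double *tmax, double *ratio, double *pct_total)
{
    assert(line != NULL && tmax != NULL && ratio != NULL && pct_total != NULL);
    int n = sscanf(line, "%*s %*d %*f %10lf %lf %*e %*f %*e %*e %*e %lf",
                   tmax, ratio, pct_total);
    return n == 3;
}
```

Without the field width, "%lf" would consume "3.4042e+01269" as a single number with a huge exponent, which is why the width is fixed here.<br>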

><br>

> Oh, I think the problem is that GAMG sets the coarse grid solver<br>

> explicitly as block Jacobi with local LU.  GAMG ensures that all<br>

> equations are on one PE for the coarsest grid.  ML uses redundant.  You<br>

> should be able to use redundant in GAMG; it is just not the default.  This<br>

> is not tested.  So I'm guessing the problem is that block Jacobi is noisy.<br>

><br>

>><br>

>>   Barry<br>

>><br>

>><br>

>> On Oct 8, 2013, at 6:50 PM, "Mark F. Adams" <<a href="mailto:mfadams@lbl.gov">mfadams@lbl.gov</a>> wrote:<br>

>><br>

>>> Something is going terribly wrong with the setup in hypre and ML.<br>

>>> hypre's default parameters are not set up well for 3D.  I use:<br>

>>><br>

>>> -pc_hypre_boomeramg_no_CF<br>

>>> -pc_hypre_boomeramg_agg_nl 1<br>

>>> -pc_hypre_boomeramg_coarsen_type HMIS<br>

>>> -pc_hypre_boomeramg_interp_type ext+i<br>

>>><br>

>>> I'm not sure what is going wrong with ML's setup.<br>

>>><br>

>>> GAMG is converging terribly.  Is this just a simple 7-point Laplacian?<br>

>>> It looks like the eigen estimate is low on the finest grid, which<br>

>>> messes up the smoother.  Try running with these parameters and send the<br>

>>> output:<br>

>>><br>

>>> -pc_gamg_agg_nsmooths 1<br>

>>> -pc_gamg_verbose 2<br>

>>> -mg_levels_ksp_type richardson<br>

>>> -mg_levels_pc_type sor<br>

>>><br>
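Mark's remark that a low eigenvalue estimate "messes up the smoother" can be made concrete for a Chebyshev smoother, which is built from an estimate b of the largest eigenvalue of the (preconditioned) operator and damps error components only on an assumed interval [a, b]. The C sketch below evaluates the standard k-step Chebyshev error factor p(lambda) = T_k((b + a - 2 lambda)/(b - a)) / T_k((b + a)/(b - a)). Inside [a, b] its magnitude is below 1, but for eigenvalues above an underestimated b it can exceed 1, so those components are amplified rather than damped. The interval a = 0.1 b used in the example is an illustrative assumption, not a value taken from PETSc or from the logs; the Richardson/SOR smoother Mark suggests does not rely on such an estimate for smoothing.<br>

```c
#include <assert.h>

/* Chebyshev polynomial of the first kind T_k(x) for any real x,
 * via the recurrence T_{j+1} = 2 x T_j - T_{j-1}. */
static double cheb_T(int k, double x)
{
    double tm1 = 1.0, t = x;  /* T_0, T_1 */
    if (k == 0) return 1.0;
    for (int j = 1; j < k; ++j) {
        double tp1 = 2.0 * x * t - tm1;
        tm1 = t;
        t = tp1;
    }
    return t;
}

/* Error factor of a k-step Chebyshev smoother designed for the eigenvalue
 * interval [a, b] (0 < a < b), applied to an error component whose
 * eigenvalue is lambda. A magnitude below 1 means the component is damped;
 * above 1 means it is amplified. */
double cheb_error_factor(int k, double a, double b, double lambda)
{
    assert(k >= 1 && 0.0 < a && a < b);
    const double sum = b + a, width = b - a;
    return cheb_T(k, (sum - 2.0 * lambda) / width) / cheb_T(k, sum / width);
}
```

With k = 2 and [a, b] = [0.1, 1], a component with lambda = 0.5 is multiplied by about -0.49, while one with lambda = 1.5 (an estimate 1.5 times too low) is multiplied by about 3.98.<br>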

>>> Mark<br>

>>><br>
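Mark's question whether this is "just a simple 7-point Laplacian" can be checked against the matrix sizes in the logs. On an n x n x n grid where every row keeps its full stencil, a 7-point operator has 7n^3 - 6n^2 nonzeros: n^3 diagonal entries plus two entries for each of the 3n^2(n - 1) grid edges. The BoomerAMG log reports rows=4173281, which is 161^3, so a 7-point operator on that grid would have 29,057,441 nonzeros, whereas the log reports 102,576,661 (about 24.6 per row); that suggests a wider stencil or higher-order discretization. The brute-force counter below is a minimal sketch for checking the formula; the function names and the full-stencil-on-every-row assumption belong to this sketch only.<br>

```c
#include <assert.h>

/* Count the nonzeros of a 7-point Laplacian on an n x n x n grid by brute
 * force: each point couples to itself and to every existing neighbor in
 * +/-x, +/-y, +/-z. Assumes every row keeps its full stencil (Dirichlet
 * rows are not eliminated). */
long long count_7pt_nonzeros(long long n)
{
    assert(n >= 1);
    long long nnz = 0;
    for (long long k = 0; k < n; ++k)
        for (long long j = 0; j < n; ++j)
            for (long long i = 0; i < n; ++i) {
                nnz += 1;                      /* diagonal */
                nnz += (i > 0) + (i < n - 1);  /* x neighbors */
                nnz += (j > 0) + (j < n - 1);  /* y neighbors */
                nnz += (k > 0) + (k < n - 1);  /* z neighbors */
            }
    return nnz;
}

/* Closed form of the same count: n^3 diagonal entries plus two entries
 * per grid edge, with 3 n^2 (n - 1) edges in total. */
long long formula_7pt_nonzeros(long long n)
{
    return 7 * n * n * n - 6 * n * n;
}
```

For n = 3 both functions give 135.<br>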

>>> On Oct 8, 2013, at 5:46 PM, Pierre Jolivet <<a href="mailto:jolivet@ann.jussieu.fr">jolivet@ann.jussieu.fr</a>> wrote:<br>

>>><br>

>>>> Please find the logs for BoomerAMG, ML and GAMG attached. The setup<br>

>>>> for GAMG doesn't look so bad compared to the other packages, so I'm<br>

>>>> wondering what is going on with those?<br>

>>>><br>

>>>>><br>

>>>>> We need the output from running with -log_summary -pc_mg_log<br>

>>>>><br>

>>>>> Also, you can run with PETSc's AMG, called GAMG (run with -pc_type gamg).<br>

>>>>> This will give the most useful information about where it is spending<br>

>>>>> the time.<br>

>>>>><br>

>>>>><br>

>>>>> Barry<br>

>>>>><br>

>>>>><br>

>>>>> On Oct 8, 2013, at 4:11 PM, Pierre Jolivet <<a href="mailto:jolivet@ann.jussieu.fr">jolivet@ann.jussieu.fr</a>> wrote:<br>

>>>>><br>

>>>>>> Dear all,<br>

>>>>>> I'm trying to compare linear solvers for a simple Poisson equation in 3D.<br>

>>>>>> I thought that MG was the way to go, but looking at my log, the<br>

>>>>>> performance looks abysmal (I know that the matrices are way too small, but<br>

>>>>>> if I go bigger, it just never performs a single iteration...). Even though<br>

>>>>>> this is neither the BoomerAMG nor the ML mailing list, could you please<br>

>>>>>> tell me if PETSc sets some default flags that make the setup for those<br>

>>>>>> solvers so slow for this simple problem? In comparison, the performance<br>

>>>>>> of (G)ASM is much better.<br>

>>>>>><br>

>>>>>> Thanks in advance for your help.<br>

>>>>>><br>

>>>>>> PS: first the BoomerAMG log, then ML (much more verbose, sorry).<br>

>>>>>><br>

>>>>>> 0 KSP Residual norm 1.599647112604e+00<br>

>>>>>> 1 KSP Residual norm 5.450838232404e-02<br>

>>>>>> 2 KSP Residual norm 3.549673478318e-03<br>

>>>>>> 3 KSP Residual norm 2.901826808841e-04<br>

>>>>>> 4 KSP Residual norm 2.574235778729e-05<br>

>>>>>> 5 KSP Residual norm 2.253410171682e-06<br>

>>>>>> 6 KSP Residual norm 1.871067784877e-07<br>

>>>>>> 7 KSP Residual norm 1.681162800670e-08<br>

>>>>>> 8 KSP Residual norm 2.120841512414e-09<br>

>>>>>> KSP Object: 2048 MPI processes<br>

>>>>>> type: gmres<br>

>>>>>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement<br>

>>>>>> GMRES: happy breakdown tolerance 1e-30<br>

>>>>>> maximum iterations=200, initial guess is zero<br>

>>>>>> tolerances:  relative=1e-08, absolute=1e-50, divergence=10000<br>

>>>>>> left preconditioning<br>

>>>>>> using PRECONDITIONED norm type for convergence test<br>

>>>>>> PC Object: 2048 MPI processes<br>

>>>>>> type: hypre<br>

>>>>>> HYPRE BoomerAMG preconditioning<br>

>>>>>> HYPRE BoomerAMG: Cycle type V<br>

>>>>>> HYPRE BoomerAMG: Maximum number of levels 25<br>

>>>>>> HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1<br>

>>>>>> HYPRE BoomerAMG: Convergence tolerance PER hypre call 0<br>

>>>>>> HYPRE BoomerAMG: Threshold for strong coupling 0.25<br>

>>>>>> HYPRE BoomerAMG: Interpolation truncation factor 0<br>

>>>>>> HYPRE BoomerAMG: Interpolation: max elements per row 0<br>

>>>>>> HYPRE BoomerAMG: Number of levels of aggressive coarsening 0<br>

>>>>>> HYPRE BoomerAMG: Number of paths for aggressive coarsening 1<br>

>>>>>> HYPRE BoomerAMG: Maximum row sums 0.9<br>

>>>>>> HYPRE BoomerAMG: Sweeps down         1<br>

>>>>>> HYPRE BoomerAMG: Sweeps up           1<br>

>>>>>> HYPRE BoomerAMG: Sweeps on coarse    1<br>

>>>>>> HYPRE BoomerAMG: Relax down          symmetric-SOR/Jacobi<br>

>>>>>> HYPRE BoomerAMG: Relax up            symmetric-SOR/Jacobi<br>

>>>>>> HYPRE BoomerAMG: Relax on coarse     Gaussian-elimination<br>

>>>>>> HYPRE BoomerAMG: Relax weight  (all)      1<br>

>>>>>> HYPRE BoomerAMG: Outer relax weight (all) 1<br>

>>>>>> HYPRE BoomerAMG: Using CF-relaxation<br>

>>>>>> HYPRE BoomerAMG: Measure type        local<br>

>>>>>> HYPRE BoomerAMG: Coarsen type        Falgout<br>

>>>>>> HYPRE BoomerAMG: Interpolation type  classical<br>

>>>>>> linear system matrix = precond matrix:<br>

>>>>>> Matrix Object:   2048 MPI processes<br>

>>>>>> type: mpiaij<br>

>>>>>> rows=4173281, cols=4173281<br>

>>>>>> total: nonzeros=102576661, allocated nonzeros=102576661<br>

>>>>>> total number of mallocs used during MatSetValues calls =0<br>

>>>>>>   not using I-node (on process 0) routines<br>

>>>>>> --- system solved with PETSc (in 1.005199e+02 seconds)<br>

>>>>>><br>

>>>>>> 0 KSP Residual norm 2.368804472986e-01<br>

>>>>>> 1 KSP Residual norm 5.676430019132e-02<br>

>>>>>> 2 KSP Residual norm 1.898005876002e-02<br>

>>>>>> 3 KSP Residual norm 6.193922902926e-03<br>

>>>>>> 4 KSP Residual norm 2.008448794493e-03<br>

>>>>>> 5 KSP Residual norm 6.390465670228e-04<br>

>>>>>> 6 KSP Residual norm 2.157709394389e-04<br>

>>>>>> 7 KSP Residual norm 7.295973819979e-05<br>

>>>>>> 8 KSP Residual norm 2.358343271482e-05<br>

>>>>>> 9 KSP Residual norm 7.489696222066e-06<br>

>>>>>> 10 KSP Residual norm 2.390946857593e-06<br>

>>>>>> 11 KSP Residual norm 8.068086385140e-07<br>

>>>>>> 12 KSP Residual norm 2.706607789749e-07<br>

>>>>>> 13 KSP Residual norm 8.636910863376e-08<br>

>>>>>> 14 KSP Residual norm 2.761981175852e-08<br>

>>>>>> 15 KSP Residual norm 8.755459874369e-09<br>

>>>>>> 16 KSP Residual norm 2.708848598341e-09<br>

>>>>>> 17 KSP Residual norm 8.968748876265e-10<br>

>>>>>> KSP Object: 2048 MPI processes<br>

>>>>>> type: gmres<br>

>>>>>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement<br>

>>>>>> GMRES: happy breakdown tolerance 1e-30<br>

>>>>>> maximum iterations=200, initial guess is zero<br>

>>>>>> tolerances:  relative=1e-08, absolute=1e-50, divergence=10000<br>

>>>>>> left preconditioning<br>

>>>>>> using PRECONDITIONED norm type for convergence test<br>

>>>>>> PC Object: 2048 MPI processes<br>

>>>>>> type: ml<br>

>>>>>> MG: type is MULTIPLICATIVE, levels=3 cycles=v<br>

>>>>>>   Cycles per PCApply=1<br>

>>>>>>   Using Galerkin computed coarse grid matrices<br>

>>>>>> Coarse grid solver -- level -------------------------------<br>

>>>>>> KSP Object:    (mg_coarse_)     2048 MPI processes<br>

>>>>>>   type: preonly<br>

>>>>>>   maximum iterations=1, initial guess is zero<br>

>>>>>>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000<br>

>>>>>>   left preconditioning<br>

>>>>>>   using NONE norm type for convergence test<br>

>>>>>> PC Object:    (mg_coarse_)     2048 MPI processes<br>

>>>>>>   type: redundant<br>

>>>>>>     Redundant preconditioner: First (color=0) of 2048 PCs follows<br>

>>>>>>   KSP Object:      (mg_coarse_redundant_)       1 MPI processes<br>

>>>>>>     type: preonly<br>

>>>>>>     maximum iterations=10000, initial guess is zero<br>

>>>>>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000<br>

>>>>>>     left preconditioning<br>

>>>>>>     using NONE norm type for convergence test<br>

>>>>>>   PC Object:      (mg_coarse_redundant_)       1 MPI processes<br>

>>>>>>     type: lu<br>

>>>>>>       LU: out-of-place factorization<br>

>>>>>>       tolerance for zero pivot 2.22045e-14<br>

>>>>>>       using diagonal shift on blocks to prevent zero pivot<br>

>>>>>>       matrix ordering: nd<br>

>>>>>>       factor fill ratio given 5, needed 4.38504<br>

>>>>>>         Factored matrix follows:<br>

>>>>>>           Matrix Object:               1 MPI processes<br>

>>>>>>             type: seqaij<br>

>>>>>>             rows=2055, cols=2055<br>

>>>>>>             package used to perform factorization: petsc<br>

>>>>>>             total: nonzeros=2476747, allocated nonzeros=2476747<br>

>>>>>>             total number of mallocs used during MatSetValues calls =0<br>

>>>>>>               using I-node routines: found 1638 nodes, limit used is 5<br>

>>>>>>     linear system matrix = precond matrix:<br>

>>>>>>     Matrix Object:         1 MPI processes<br>

>>>>>>       type: seqaij<br>

>>>>>>       rows=2055, cols=2055<br>

>>>>>>       total: nonzeros=564817, allocated nonzeros=1093260<br>

>>>>>>       total number of mallocs used during MatSetValues calls =0<br>

>>>>>>         not using I-node routines<br>

>>>>>>   linear system matrix = precond matrix:<br>

>>>>>>   Matrix Object:       2048 MPI processes<br>

>>>>>>     type: mpiaij<br>

>>>>>>     rows=2055, cols=2055<br>

>>>>>>     total: nonzeros=564817, allocated nonzeros=564817<br>

>>>>>>     total number of mallocs used during MatSetValues calls =0<br>

>>>>>>       not using I-node (on process 0) routines<br>

>>>>>> Down solver (pre-smoother) on level 1<br>

>>>>>> -------------------------------<br>

>>>>>> KSP Object:    (mg_levels_1_)     2048 MPI processes<br>

>>>>>>   type: richardson<br>

>>>>>>     Richardson: damping factor=1<br>

>>>>>>   maximum iterations=2<br>

>>>>>>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000<br>

>>>>>>   left preconditioning<br>

>>>>>>   using nonzero initial guess<br>

>>>>>>   using NONE norm type for convergence test<br>

>>>>>> PC Object:    (mg_levels_1_)     2048 MPI processes<br>

>>>>>>   type: sor<br>

>>>>>>     SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1<br>

>>>>>>   linear system matrix = precond matrix:<br>

>>>>>>   Matrix Object:       2048 MPI processes<br>

>>>>>>     type: mpiaij<br>

>>>>>>     rows=30194, cols=30194<br>

>>>>>>     total: nonzeros=3368414, allocated nonzeros=3368414<br>

>>>>>>     total number of mallocs used during MatSetValues calls =0<br>

>>>>>>       not using I-node (on process 0) routines<br>

>>>>>> Up solver (post-smoother) same as down solver (pre-smoother)<br>

>>>>>> Down solver (pre-smoother) on level 2<br>

>>>>>> -------------------------------<br>

>>>>>> KSP Object:    (mg_levels_2_)     2048 MPI processes<br>

>>>>>>   type: richardson<br>

>>>>>>     Richardson: damping factor=1<br>

>>>>>>   maximum iterations=2<br>

>>>>>>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000<br>

>>>>>>   left preconditioning<br>

>>>>>>   using nonzero initial guess<br>

>>>>>>   using NONE norm type for convergence test<br>

>>>>>> PC Object:    (mg_levels_2_)     2048 MPI processes<br>

>>>>>>   type: sor<br>

>>>>>>     SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1<br>

>>>>>>   linear system matrix = precond matrix:<br>

>>>>>>   Matrix Object:       2048 MPI processes<br>

>>>>>>     type: mpiaij<br>

>>>>>>     rows=531441, cols=531441<br>

>>>>>>     total: nonzeros=12476324, allocated nonzeros=12476324<br>

>>>>>>     total number of mallocs used during MatSetValues calls =0<br>

>>>>>>       not using I-node (on process 0) routines<br>

>>>>>> Up solver (post-smoother) same as down solver (pre-smoother)<br>

>>>>>> linear system matrix = precond matrix:<br>

>>>>>> Matrix Object:   2048 MPI processes<br>

>>>>>> type: mpiaij<br>

>>>>>> rows=531441, cols=531441<br>

>>>>>> total: nonzeros=12476324, allocated nonzeros=12476324<br>

>>>>>> total number of mallocs used during MatSetValues calls =0<br>

>>>>>>   not using I-node (on process 0) routines<br>

>>>>>> --- system solved with PETSc (in 2.407844e+02 seconds)<br>

>>>>>><br>
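The residual histories above allow comparing how much each preconditioner reduces the residual per iteration, separately from the wall-clock times. A common summary is the average reduction factor rho = (r_k / r_0)^(1/k). These are preconditioned residual norms (as the logs state), and the two runs are on different matrix sizes (4173281 rows for BoomerAMG, 531441 for ML), so the numbers are only a rough comparison. The C sketch below computes rho; the function name is hypothetical, and the k-th root is taken with Newton's method only to keep the sketch free of libm, where pow(rk / r0, 1.0 / k) would be the idiomatic equivalent.<br>

```c
#include <assert.h>

/* y^m for a non-negative integer m, by repeated multiplication. */
static double ipow(double y, int m)
{
    double p = 1.0;
    for (int i = 0; i < m; ++i) p *= y;
    return p;
}

/* Average residual reduction per iteration, rho = (rk / r0)^(1/k).
 * The k-th root of x = rk / r0 is found with Newton's method on
 * f(y) = y^k - x, started at or above the root so that the iterates
 * decrease monotonically; iteration stops once they no longer decrease. */
double avg_reduction_factor(double r0, double rk, int k)
{
    assert(r0 > 0.0 && rk > 0.0 && k >= 1);
    const double x = rk / r0;
    double y = x > 1.0 ? x : 1.0;  /* y >= x^(1/k) */
    for (int it = 0; it < 500; ++it) {
        double next = ((k - 1) * y + x / ipow(y, k - 1)) / k;
        if (next >= y) break;      /* converged to rounding */
        y = next;
    }
    return y;
}
```

From the printed norms this gives about 0.078 for the BoomerAMG run (8 iterations) and about 0.32 for the ML run (17 iterations).<br>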

>>>>>><br>
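Barry's question about lexicographic partitioning is about surface-to-volume ratio. The BoomerAMG log reports rows=4173281 = 161^3 on 2048 processes, about 2038 rows per process, which is fewer than the 161^2 = 25921 points of one grid plane. Assuming "lexicographic" means contiguous blocks of the natural ordering (x fastest, then y, then z), each process would own part of a single plane and need ghost values from the planes above and below. The C sketch below counts, for a 7-point stencil, the distinct off-process neighbor points of one process under such a contiguous split and under a box decomposition. The grid sizes, process layouts and names are hypothetical illustrations; nothing in the thread says how the matrix was actually partitioned.<br>

```c
#include <assert.h>
#include <stdlib.h>

/* Partition of an n^3 grid among nprocs processes (hypothetical type). */
typedef struct {
    int n;                  /* grid points per direction */
    int nprocs;             /* total number of processes */
    int q;                  /* boxes per direction for BOX (nprocs == q^3) */
    enum { LEX, BOX } kind; /* contiguous natural-order blocks, or boxes */
} Partition;

static int owner(const Partition *P, int i, int j, int k)
{
    if (P->kind == LEX) {
        long idx = i + (long)P->n * (j + (long)P->n * k);
        long chunk = (long)P->n * P->n * P->n / P->nprocs; /* assumes exact division */
        return (int)(idx / chunk);
    }
    int b = P->n / P->q;                                   /* assumes q divides n */
    return i / b + P->q * (j / b + P->q * (k / b));
}

/* Number of distinct off-process grid points (ghosts) that process `rank`
 * needs to apply a 7-point stencil to the rows it owns. */
long ghost_count(const Partition *P, int rank)
{
    const int n = P->n;
    const long N = (long)n * n * n;
    char *is_ghost = calloc((size_t)N, 1);
    assert(is_ghost != NULL);
    const int di[6] = {1, -1, 0, 0, 0, 0};
    const int dj[6] = {0, 0, 1, -1, 0, 0};
    const int dk[6] = {0, 0, 0, 0, 1, -1};
    long count = 0;
    for (int k = 0; k < n; ++k)
        for (int j = 0; j < n; ++j)
            for (int i = 0; i < n; ++i) {
                if (owner(P, i, j, k) != rank) continue;
                for (int s = 0; s < 6; ++s) {
                    int a = i + di[s], b = j + dj[s], c = k + dk[s];
                    if (a < 0 || a >= n || b < 0 || b >= n || c < 0 || c >= n) continue;
                    if (owner(P, a, b, c) == rank) continue;
                    long idx = a + (long)n * (b + (long)n * c);
                    if (!is_ghost[idx]) { is_ghost[idx] = 1; ++count; }
                }
            }
    free(is_ghost);
    return count;
}
```

With n = 16 and 64 processes, interior process 21 needs 160 ghost points under the contiguous split but 96 under 4 x 4 x 4 boxes, for the same 64 owned points. When a process owns less than one plane, as in the 161^3 / 2048 case, the contiguous split needs roughly two ghost points per owned point (the points directly above and below).<br>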

>>>>><br>

>>>>><br>

>>>> &lt;log-GAMG&gt;&lt;log-ML&gt;&lt;log-BoomerAMG&gt;<br>

>>><br>

>><br>

><br>

></blockquote></div><br><br clear="all"><div><br></div>-- <br>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>

-- Norbert Wiener

</div></div>

</blockquote></div><br></body></html>