<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Apr 12, 2017 at 7:04 PM, Kong, Fande <span dir="ltr"><<a href="mailto:fande.kong@inl.gov" target="_blank">fande.kong@inl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span class="">On Sun, Apr 9, 2017 at 6:04 AM, Mark Adams <span dir="ltr"><<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>></span> wrote:<br><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote">You seem to have two levels here and 3M eqs on the fine grid and 37 on<br>

the coarse grid. I don't understand that.<br>

<br>

You are also calling the AMG setup a lot, but not spending much time<br>

in it. Try running with -info and grep on "GAMG".<br></blockquote><div><br></div></span><div>I got the following output:<br><br>[0] PCSetUp_GAMG(): level 0) N=3020875, n data rows=1, n data cols=1, nnz/row (ave)=71, np=384<br>[0] PCGAMGFilterGraph():      100.% nnz after filtering, with threshold 0., 73.6364 nnz ave. (N=3020875)<br>[0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square<br>[0] PCGAMGProlongator_AGG(): New grid 18162 nodes<br>[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.978702e+00 min=2.559747e-02 PC=jacobi<br>[0] PCGAMGCreateLevel_GAMG(): Aggregate processors noop: new_size=384, neq(loc)=40<br>[0] PCSetUp_GAMG(): 1) N=18162, n data cols=1, nnz/row (ave)=94, 384 active pes<br>[0] PCSetUp_GAMG(): 2 levels, grid complexity = 1.00795<br>[0] PCSetUp_GAMG(): level 0) N=3020875, n data rows=1, n data cols=1, nnz/row (ave)=71, np=384<br>[0] PCGAMGFilterGraph():      100.% nnz after filtering, with threshold 0., 73.6364 nnz ave. (N=3020875)<br>[0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square<br>[0] PCGAMGProlongator_AGG(): New grid 18145 nodes<br>[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.978584e+00 min=2.557887e-02 PC=jacobi<br>[0] PCGAMGCreateLevel_GAMG(): Aggregate processors noop: new_size=384, neq(loc)=37<br>[0] PCSetUp_GAMG(): 1) N=18145, n data cols=1, nnz/row (ave)=94, 384 active pes<br></div></div></div></div></blockquote><div><br></div><div>You are still doing two levels. Just use the parameters that I told you and you should see that 1) this coarsest (last) grid has "1 active pes" and 2) the overall solve time and overall convergence rate is much better.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>[0] PCSetUp_GAMG(): 2 levels, grid complexity = 1.00792<br>        GAMG specific options<br>PCGAMGGraph_AGG       40 1.0 8.0759e+00 1.0 3.56e+07 2.3 1.6e+06 1.9e+04 7.6e+02  2  0  2  4  2   2  0  2  4  2  1170<br>PCGAMGCoarse_AGG      40 1.0 7.1698e+01 1.0 4.05e+09 2.3 4.0e+06 5.1e+04 1.2e+03 18 37  5 27  3  18 37  5 27  3 14632<br>PCGAMGProl_AGG        40 1.0 9.2650e-01 1.2 0.00e+00 0.0 9.8e+05 2.9e+03 9.6e+02  0  0  1  0  2   0  0  1  0  2     0<br>PCGAMGPOpt_AGG        40 1.0 2.4484e+00 1.0 4.72e+08 2.3 3.1e+06 2.3e+03 1.9e+03  1  4  4  1  4   1  4  4  1  4 51328<br>GAMG: createProl      40 1.0 8.3786e+01 1.0 4.56e+09 2.3 9.6e+06 2.5e+04 4.8e+03 21 42 12 32 10  21 42 12 32 10 14134<br>GAMG: partLevel       40 1.0 6.7755e+00 1.1 2.59e+08 2.3 2.9e+06 2.5e+03 1.5e+03  2  2  4  1  3   2  2  4  1  3  9431<br></div><div><div class="h5"><div><br><br><br></div><div><br><br><br> </div><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote">

<div class="m_-7331534077537607382gmail-HOEnZb"><div class="m_-7331534077537607382gmail-h5"><br>

<br>

On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande <<a href="mailto:fande.kong@inl.gov" target="_blank">fande.kong@inl.gov</a>> wrote:<br>

> Thanks, Barry.<br>

><br>

> It works.<br>

><br>

> GAMG is three times better than ASM in terms of the number of linear<br>

> iterations, but it is five times slower than ASM. Any suggestions to improve<br>

> the performance of GAMG? Log files are attached.<br>

><br>

> Fande,<br>

><br>

> On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>> wrote:<br>

>><br>

>><br>

>> > On Apr 6, 2017, at 9:39 AM, Kong, Fande <<a href="mailto:fande.kong@inl.gov" target="_blank">fande.kong@inl.gov</a>> wrote:<br>

>> ><br>

>> > Thanks, Mark and Barry,<br>

>> ><br>

>> > It works pretty wells in terms of the number of linear iterations (using<br>

>> > "-pc_gamg_sym_graph true"), but it is horrible in the compute time. I am<br>

>> > using the two-level method via "-pc_mg_levels 2". The reason why the compute<br>

>> > time is larger than other preconditioning options is that a matrix free<br>

>> > method is used in the fine level and in my particular problem the function<br>

>> > evaluation is expensive.<br>

>> ><br>

>> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton,<br>

>> > but I do not think I want to make the preconditioning part matrix-free.  Do<br>

>> > you guys know how to turn off the matrix-free method for GAMG?<br>

>><br>

>>    -pc_use_amat false<br>

>><br>

>> ><br>

>> > Here is the detailed solver:<br>

>> ><br>

>> > SNES Object: 384 MPI processes<br>

>> >   type: newtonls<br>

>> >   maximum iterations=200, maximum function evaluations=10000<br>

>> >   tolerances: relative=1e-08, absolute=1e-08, solution=1e-50<br>

>> >   total number of linear solver iterations=20<br>

>> >   total number of function evaluations=166<br>

>> >   norm schedule ALWAYS<br>

>> >   SNESLineSearch Object:   384 MPI processes<br>

>> >     type: bt<br>

>> >       interpolation: cubic<br>

>> >       alpha=1.000000e-04<br>

>> >     maxstep=1.000000e+08, minlambda=1.000000e-12<br>

>> >     tolerances: relative=1.000000e-08, absolute=1.000000e-15,<br>

>> > lambda=1.000000e-08<br>

>> >     maximum iterations=40<br>

>> >   KSP Object:   384 MPI processes<br>

>> >     type: gmres<br>

>> >       GMRES: restart=100, using Classical (unmodified) Gram-Schmidt<br>

>> > Orthogonalization with no iterative refinement<br>

>> >       GMRES: happy breakdown tolerance 1e-30<br>

>> >     maximum iterations=100, initial guess is zero<br>

>> >     tolerances:  relative=0.001, absolute=1e-50, divergence=10000.<br>

>> >     right preconditioning<br>

>> >     using UNPRECONDITIONED norm type for convergence test<br>

>> >   PC Object:   384 MPI processes<br>

>> >     type: gamg<br>

>> >       MG: type is MULTIPLICATIVE, levels=2 cycles=v<br>

>> >         Cycles per PCApply=1<br>

>> >         Using Galerkin computed coarse grid matrices<br>

>> >         GAMG specific options<br>

>> >           Threshold for dropping small values from graph 0.<br>

>> >           AGG specific options<br>

>> >             Symmetric graph true<br>

>> >     Coarse grid solver -- level ------------------------------<wbr>-<br>

>> >       KSP Object:      (mg_coarse_)       384 MPI processes<br>

>> >         type: preonly<br>

>> >         maximum iterations=10000, initial guess is zero<br>

>> >         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.<br>

>> >         left preconditioning<br>

>> >         using NONE norm type for convergence test<br>

>> >       PC Object:      (mg_coarse_)       384 MPI processes<br>

>> >         type: bjacobi<br>

>> >           block Jacobi: number of blocks = 384<br>

>> >           Local solve is same for all blocks, in the following KSP and<br>

>> > PC objects:<br>

>> >         KSP Object:        (mg_coarse_sub_)         1 MPI processes<br>

>> >           type: preonly<br>

>> >           maximum iterations=1, initial guess is zero<br>

>> >           tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.<br>

>> >           left preconditioning<br>

>> >           using NONE norm type for convergence test<br>

>> >         PC Object:        (mg_coarse_sub_)         1 MPI processes<br>

>> >           type: lu<br>

>> >             LU: out-of-place factorization<br>

>> >             tolerance for zero pivot 2.22045e-14<br>

>> >             using diagonal shift on blocks to prevent zero pivot<br>

>> > [INBLOCKS]<br>

>> >             matrix ordering: nd<br>

>> >             factor fill ratio given 5., needed 1.31367<br>

>> >               Factored matrix follows:<br>

>> >                 Mat Object:                 1 MPI processes<br>

>> >                   type: seqaij<br>

>> >                   rows=37, cols=37<br>

>> >                   package used to perform factorization: petsc<br>

>> >                   total: nonzeros=913, allocated nonzeros=913<br>

>> >                   total number of mallocs used during MatSetValues calls<br>

>> > =0<br>

>> >                     not using I-node routines<br>

>> >           linear system matrix = precond matrix:<br>

>> >           Mat Object:           1 MPI processes<br>

>> >             type: seqaij<br>

>> >             rows=37, cols=37<br>

>> >             total: nonzeros=695, allocated nonzeros=695<br>

>> >             total number of mallocs used during MatSetValues calls =0<br>

>> >               not using I-node routines<br>

>> >         linear system matrix = precond matrix:<br>

>> >         Mat Object:         384 MPI processes<br>

>> >           type: mpiaij<br>

>> >           rows=18145, cols=18145<br>

>> >           total: nonzeros=1709115, allocated nonzeros=1709115<br>

>> >           total number of mallocs used during MatSetValues calls =0<br>

>> >             not using I-node (on process 0) routines<br>

>> >     Down solver (pre-smoother) on level 1<br>

>> > ------------------------------<wbr>-<br>

>> >       KSP Object:      (mg_levels_1_)       384 MPI processes<br>

>> >         type: chebyshev<br>

>> >           Chebyshev: eigenvalue estimates:  min = 0.133339, max =<br>

>> > 1.46673<br>

>> >           Chebyshev: eigenvalues estimated using gmres with translations<br>

>> > [0. 0.1; 0. 1.1]<br>

>> >           KSP Object:          (mg_levels_1_esteig_)           384 MPI<br>

>> > processes<br>

>> >             type: gmres<br>

>> >               GMRES: restart=30, using Classical (unmodified)<br>

>> > Gram-Schmidt Orthogonalization with no iterative refinement<br>

>> >               GMRES: happy breakdown tolerance 1e-30<br>

>> >             maximum iterations=10, initial guess is zero<br>

>> >             tolerances:  relative=1e-12, absolute=1e-50,<br>

>> > divergence=10000.<br>

>> >             left preconditioning<br>

>> >             using PRECONDITIONED norm type for convergence test<br>

>> >         maximum iterations=2<br>

>> >         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.<br>

>> >         left preconditioning<br>

>> >         using nonzero initial guess<br>

>> >         using NONE norm type for convergence test<br>

>> >       PC Object:      (mg_levels_1_)       384 MPI processes<br>

>> >         type: sor<br>

>> >           SOR: type = local_symmetric, iterations = 1, local iterations<br>

>> > = 1, omega = 1.<br>

>> >         linear system matrix followed by preconditioner matrix:<br>

>> >         Mat Object:         384 MPI processes<br>

>> >           type: mffd<br>

>> >           rows=3020875, cols=3020875<br>

>> >             Matrix-free approximation:<br>

>> >               err=1.49012e-08 (relative error in function evaluation)<br>

>> >               Using wp compute h routine<br>

>> >                   Does not compute normU<br>

>> >         Mat Object:        ()         384 MPI processes<br>

>> >           type: mpiaij<br>

>> >           rows=3020875, cols=3020875<br>

>> >           total: nonzeros=215671710, allocated nonzeros=241731750<br>

>> >           total number of mallocs used during MatSetValues calls =0<br>

>> >             not using I-node (on process 0) routines<br>

>> >     Up solver (post-smoother) same as down solver (pre-smoother)<br>

>> >     linear system matrix followed by preconditioner matrix:<br>

>> >     Mat Object:     384 MPI processes<br>

>> >       type: mffd<br>

>> >       rows=3020875, cols=3020875<br>

>> >         Matrix-free approximation:<br>

>> >           err=1.49012e-08 (relative error in function evaluation)<br>

>> >           Using wp compute h routine<br>

>> >               Does not compute normU<br>

>> >     Mat Object:    ()     384 MPI processes<br>

>> >       type: mpiaij<br>

>> >       rows=3020875, cols=3020875<br>

>> >       total: nonzeros=215671710, allocated nonzeros=241731750<br>

>> >       total number of mallocs used during MatSetValues calls =0<br>

>> >         not using I-node (on process 0) routines<br>

>> ><br>

>> ><br>

>> > Fande,<br>

>> ><br>

>> > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:<br>

>> > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>> wrote:<br>

>> > ><br>

>> > >> Does this mean that GAMG works for the symmetrical matrix only?<br>

>> > ><br>

>> > >   No, it means that for non symmetric nonzero structure you need the<br>

>> > > extra flag. So use the extra flag. The reason we don't always use the flag<br>

>> > > is because it adds extra cost and isn't needed if the matrix already has a<br>

>> > > symmetric nonzero structure.<br>

>> ><br>

>> > BTW, if you have symmetric non-zero structure you can just set<br>

>> > -pc_gamg_threshold -1.0', note the "or" in the message.<br>

>> ><br>

>> > If you want to mess with the threshold then you need to use the<br>

>> > symmetrized flag.<br>

>> ><br>

>><br>

><br>

</div></div></blockquote></div></div></div><br></div></div>

</blockquote></div><br></div></div>