<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Apr 12, 2017 at 7:04 PM, Kong, Fande <span dir="ltr"><<a href="mailto:fande.kong@inl.gov" target="_blank">fande.kong@inl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span class="">On Sun, Apr 9, 2017 at 6:04 AM, Mark Adams <span dir="ltr"><<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>></span> wrote:<br><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote">You seem to have two levels here and 3M eqs on the fine grid and 37 on<br>
the coarse grid. I don't understand that.<br>
<br>
You are also calling the AMG setup a lot, but not spending much time<br>
in it. Try running with -info and grep on "GAMG".<br></blockquote><div><br></div></span><div>I got the following output:<br><br>[0] PCSetUp_GAMG(): level 0) N=3020875, n data rows=1, n data cols=1, nnz/row (ave)=71, np=384<br>[0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold 0., 73.6364 nnz ave. (N=3020875)<br>[0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square<br>[0] PCGAMGProlongator_AGG(): New grid 18162 nodes<br>[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.978702e+00 min=2.559747e-02 PC=jacobi<br>[0] PCGAMGCreateLevel_GAMG(): Aggregate processors noop: new_size=384, neq(loc)=40<br>[0] PCSetUp_GAMG(): 1) N=18162, n data cols=1, nnz/row (ave)=94, 384 active pes<br>[0] PCSetUp_GAMG(): 2 levels, grid complexity = 1.00795<br>[0] PCSetUp_GAMG(): level 0) N=3020875, n data rows=1, n data cols=1, nnz/row (ave)=71, np=384<br>[0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold 0., 73.6364 nnz ave. (N=3020875)<br>[0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square<br>[0] PCGAMGProlongator_AGG(): New grid 18145 nodes<br>[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.978584e+00 min=2.557887e-02 PC=jacobi<br>[0] PCGAMGCreateLevel_GAMG(): Aggregate processors noop: new_size=384, neq(loc)=37<br>[0] PCSetUp_GAMG(): 1) N=18145, n data cols=1, nnz/row (ave)=94, 384 active pes<br></div></div></div></div></blockquote><div><br></div><div>You are still doing two levels. 
Just use the parameters that I told you and you should see that 1) the coarsest (last) grid has "1 active pes" and 2) the overall solve time and overall convergence rate are much better.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>[0] PCSetUp_GAMG(): 2 levels, grid complexity = 1.00792<br> GAMG specific options<br>PCGAMGGraph_AGG 40 1.0 8.0759e+00 1.0 3.56e+07 2.3 1.6e+06 1.9e+04 7.6e+02 2 0 2 4 2 2 0 2 4 2 1170<br>PCGAMGCoarse_AGG 40 1.0 7.1698e+01 1.0 4.05e+09 2.3 4.0e+06 5.1e+04 1.2e+03 18 37 5 27 3 18 37 5 27 3 14632<br>PCGAMGProl_AGG 40 1.0 9.2650e-01 1.2 0.00e+00 0.0 9.8e+05 2.9e+03 9.6e+02 0 0 1 0 2 0 0 1 0 2 0<br>PCGAMGPOpt_AGG 40 1.0 2.4484e+00 1.0 4.72e+08 2.3 3.1e+06 2.3e+03 1.9e+03 1 4 4 1 4 1 4 4 1 4 51328<br>GAMG: createProl 40 1.0 8.3786e+01 1.0 4.56e+09 2.3 9.6e+06 2.5e+04 4.8e+03 21 42 12 32 10 21 42 12 32 10 14134<br>GAMG: partLevel 40 1.0 6.7755e+00 1.1 2.59e+08 2.3 2.9e+06 2.5e+03 1.5e+03 2 2 4 1 3 2 2 4 1 3 9431<br></div><div><div class="h5"><div><br><br><br></div><div><br><br><br> </div><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote">
<div class="m_-7331534077537607382gmail-HOEnZb"><div class="m_-7331534077537607382gmail-h5"><br>
<br>
On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande <<a href="mailto:fande.kong@inl.gov" target="_blank">fande.kong@inl.gov</a>> wrote:<br>
> Thanks, Barry.<br>
><br>
> It works.<br>
><br>
> GAMG is three times better than ASM in terms of the number of linear<br>
> iterations, but it is five times slower than ASM. Any suggestions to improve<br>
> the performance of GAMG? Log files are attached.<br>
><br>
> Fande,<br>
><br>
> On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>> wrote:<br>
>><br>
>><br>
>> > On Apr 6, 2017, at 9:39 AM, Kong, Fande <<a href="mailto:fande.kong@inl.gov" target="_blank">fande.kong@inl.gov</a>> wrote:<br>
>> ><br>
>> > Thanks, Mark and Barry,<br>
>> ><br>
>> > It works pretty well in terms of the number of linear iterations (using<br>
>> > "-pc_gamg_sym_graph true"), but the compute time is terrible. I am<br>
>> > using the two-level method via "-pc_mg_levels 2". The compute time is<br>
>> > larger than with other preconditioning options because a matrix-free<br>
>> > method is used on the fine level, and in my particular problem the function<br>
>> > evaluation is expensive.<br>
>> ><br>
>> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton,<br>
>> > but I do not think I want to make the preconditioning part matrix-free. Do<br>
>> > you guys know how to turn off the matrix-free method for GAMG?<br>
>><br>
>> -pc_use_amat false<br>
>><br>
>> ><br>
>> > Here is the detailed solver:<br>
>> ><br>
>> > SNES Object: 384 MPI processes<br>
>> > type: newtonls<br>
>> > maximum iterations=200, maximum function evaluations=10000<br>
>> > tolerances: relative=1e-08, absolute=1e-08, solution=1e-50<br>
>> > total number of linear solver iterations=20<br>
>> > total number of function evaluations=166<br>
>> > norm schedule ALWAYS<br>
>> > SNESLineSearch Object: 384 MPI processes<br>
>> > type: bt<br>
>> > interpolation: cubic<br>
>> > alpha=1.000000e-04<br>
>> > maxstep=1.000000e+08, minlambda=1.000000e-12<br>
>> > tolerances: relative=1.000000e-08, absolute=1.000000e-15,<br>
>> > lambda=1.000000e-08<br>
>> > maximum iterations=40<br>
>> > KSP Object: 384 MPI processes<br>
>> > type: gmres<br>
>> > GMRES: restart=100, using Classical (unmodified) Gram-Schmidt<br>
>> > Orthogonalization with no iterative refinement<br>
>> > GMRES: happy breakdown tolerance 1e-30<br>
>> > maximum iterations=100, initial guess is zero<br>
>> > tolerances: relative=0.001, absolute=1e-50, divergence=10000.<br>
>> > right preconditioning<br>
>> > using UNPRECONDITIONED norm type for convergence test<br>
>> > PC Object: 384 MPI processes<br>
>> > type: gamg<br>
>> > MG: type is MULTIPLICATIVE, levels=2 cycles=v<br>
>> > Cycles per PCApply=1<br>
>> > Using Galerkin computed coarse grid matrices<br>
>> > GAMG specific options<br>
>> > Threshold for dropping small values from graph 0.<br>
>> > AGG specific options<br>
>> > Symmetric graph true<br>
>> > Coarse grid solver -- level -------------------------------<br>
>> > KSP Object: (mg_coarse_) 384 MPI processes<br>
>> > type: preonly<br>
>> > maximum iterations=10000, initial guess is zero<br>
>> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.<br>
>> > left preconditioning<br>
>> > using NONE norm type for convergence test<br>
>> > PC Object: (mg_coarse_) 384 MPI processes<br>
>> > type: bjacobi<br>
>> > block Jacobi: number of blocks = 384<br>
>> > Local solve is same for all blocks, in the following KSP and<br>
>> > PC objects:<br>
>> > KSP Object: (mg_coarse_sub_) 1 MPI processes<br>
>> > type: preonly<br>
>> > maximum iterations=1, initial guess is zero<br>
>> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.<br>
>> > left preconditioning<br>
>> > using NONE norm type for convergence test<br>
>> > PC Object: (mg_coarse_sub_) 1 MPI processes<br>
>> > type: lu<br>
>> > LU: out-of-place factorization<br>
>> > tolerance for zero pivot 2.22045e-14<br>
>> > using diagonal shift on blocks to prevent zero pivot<br>
>> > [INBLOCKS]<br>
>> > matrix ordering: nd<br>
>> > factor fill ratio given 5., needed 1.31367<br>
>> > Factored matrix follows:<br>
>> > Mat Object: 1 MPI processes<br>
>> > type: seqaij<br>
>> > rows=37, cols=37<br>
>> > package used to perform factorization: petsc<br>
>> > total: nonzeros=913, allocated nonzeros=913<br>
>> > total number of mallocs used during MatSetValues calls<br>
>> > =0<br>
>> > not using I-node routines<br>
>> > linear system matrix = precond matrix:<br>
>> > Mat Object: 1 MPI processes<br>
>> > type: seqaij<br>
>> > rows=37, cols=37<br>
>> > total: nonzeros=695, allocated nonzeros=695<br>
>> > total number of mallocs used during MatSetValues calls =0<br>
>> > not using I-node routines<br>
>> > linear system matrix = precond matrix:<br>
>> > Mat Object: 384 MPI processes<br>
>> > type: mpiaij<br>
>> > rows=18145, cols=18145<br>
>> > total: nonzeros=1709115, allocated nonzeros=1709115<br>
>> > total number of mallocs used during MatSetValues calls =0<br>
>> > not using I-node (on process 0) routines<br>
>> > Down solver (pre-smoother) on level 1<br>
>> > -------------------------------<br>
>> > KSP Object: (mg_levels_1_) 384 MPI processes<br>
>> > type: chebyshev<br>
>> > Chebyshev: eigenvalue estimates: min = 0.133339, max =<br>
>> > 1.46673<br>
>> > Chebyshev: eigenvalues estimated using gmres with translations<br>
>> > [0. 0.1; 0. 1.1]<br>
>> > KSP Object: (mg_levels_1_esteig_) 384 MPI<br>
>> > processes<br>
>> > type: gmres<br>
>> > GMRES: restart=30, using Classical (unmodified)<br>
>> > Gram-Schmidt Orthogonalization with no iterative refinement<br>
>> > GMRES: happy breakdown tolerance 1e-30<br>
>> > maximum iterations=10, initial guess is zero<br>
>> > tolerances: relative=1e-12, absolute=1e-50,<br>
>> > divergence=10000.<br>
>> > left preconditioning<br>
>> > using PRECONDITIONED norm type for convergence test<br>
>> > maximum iterations=2<br>
>> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.<br>
>> > left preconditioning<br>
>> > using nonzero initial guess<br>
>> > using NONE norm type for convergence test<br>
>> > PC Object: (mg_levels_1_) 384 MPI processes<br>
>> > type: sor<br>
>> > SOR: type = local_symmetric, iterations = 1, local iterations<br>
>> > = 1, omega = 1.<br>
>> > linear system matrix followed by preconditioner matrix:<br>
>> > Mat Object: 384 MPI processes<br>
>> > type: mffd<br>
>> > rows=3020875, cols=3020875<br>
>> > Matrix-free approximation:<br>
>> > err=1.49012e-08 (relative error in function evaluation)<br>
>> > Using wp compute h routine<br>
>> > Does not compute normU<br>
>> > Mat Object: () 384 MPI processes<br>
>> > type: mpiaij<br>
>> > rows=3020875, cols=3020875<br>
>> > total: nonzeros=215671710, allocated nonzeros=241731750<br>
>> > total number of mallocs used during MatSetValues calls =0<br>
>> > not using I-node (on process 0) routines<br>
>> > Up solver (post-smoother) same as down solver (pre-smoother)<br>
>> > linear system matrix followed by preconditioner matrix:<br>
>> > Mat Object: 384 MPI processes<br>
>> > type: mffd<br>
>> > rows=3020875, cols=3020875<br>
>> > Matrix-free approximation:<br>
>> > err=1.49012e-08 (relative error in function evaluation)<br>
>> > Using wp compute h routine<br>
>> > Does not compute normU<br>
>> > Mat Object: () 384 MPI processes<br>
>> > type: mpiaij<br>
>> > rows=3020875, cols=3020875<br>
>> > total: nonzeros=215671710, allocated nonzeros=241731750<br>
>> > total number of mallocs used during MatSetValues calls =0<br>
>> > not using I-node (on process 0) routines<br>
>> ><br>
>> ><br>
>> > Fande,<br>
>> ><br>
>> > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:<br>
>> > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>> wrote:<br>
>> > ><br>
>> > >> Does this mean that GAMG works for the symmetrical matrix only?<br>
>> > ><br>
>> > > No, it means that for non symmetric nonzero structure you need the<br>
>> > > extra flag. So use the extra flag. The reason we don't always use the flag<br>
>> > > is because it adds extra cost and isn't needed if the matrix already has a<br>
>> > > symmetric nonzero structure.<br>
>> ><br>
>> > BTW, if you have symmetric nonzero structure you can just set<br>
>> > '-pc_gamg_threshold -1.0'; note the "or" in the message.<br>
>> ><br>
>> > If you want to mess with the threshold then you need to use the<br>
>> > symmetrized flag.<br>
>> ><br>
>><br>
><br>
</div></div></blockquote></div></div></div><br></div></div>
</blockquote></div><br></div></div>
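<div>As a rough cross-check of the numbers quoted in this thread, the sketch below pulls the per-level sizes out of the "PCSetUp_GAMG()" lines of the -info output and recomputes the reported grid complexity. It assumes that figure is the total number of nonzeros over all levels divided by the fine-level nonzeros; that assumption is not stated in the thread, but with the nonzero counts from the -ksp_view output it reproduces 1.00792. The helper names level_sizes and complexity are hypothetical and not part of PETSc.</div>

```python
import re

# Two lines copied from the -info output quoted in this thread (second setup).
INFO_LINES = [
    "[0] PCSetUp_GAMG(): level 0) N=3020875, n data rows=1, n data cols=1, nnz/row (ave)=71, np=384",
    "[0] PCSetUp_GAMG(): 1) N=18145, n data cols=1, nnz/row (ave)=94, 384 active pes",
]

# Nonzero counts copied from the -ksp_view output (fine level first).
NNZ_PER_LEVEL = [215671710, 1709115]

NUM_PROCESSES = 384  # "384 MPI processes" in the solver view


def level_sizes(lines):
    """Return the global size N of each level, fine to coarse (hypothetical helper)."""
    sizes = []
    for line in lines:
        match = re.search(r"PCSetUp_GAMG\(\): (?:level )?(\d+)\) N=(\d+)", line)
        if match:
            sizes.append(int(match.group(2)))
    return sizes


def complexity(nnz_per_level):
    """Assumed definition: total nonzeros over all levels / fine-level nonzeros."""
    return sum(nnz_per_level) / nnz_per_level[0]


sizes = level_sizes(INFO_LINES)
print("level sizes:", sizes)
print("coarsening ratio: %.1f" % (sizes[0] / sizes[-1]))
print("average coarse equations per process: %.1f" % (sizes[-1] / NUM_PROCESSES))
print("complexity: %.5f" % complexity(NNZ_PER_LEVEL))
```

<div>With only two levels, the coarsest grid still has 18145 equations spread over all 384 processes, which is consistent with the "384 active pes" line rather than the "1 active pes" expected once more levels are allowed.</div>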