<div dir="ltr"><div>Hi Mark,<br><br></div>Thanks for your reply.<br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Apr 12, 2017 at 9:16 AM, Mark Adams <span dir="ltr"><<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">The problem comes from setting the number of MG levels (<span style="color:rgb(0,0,0);font-family:"courier new",courier,monospace,arial,sans-serif;font-size:14px;white-space:pre-wrap">-pc_mg_levels 2</span>). Not your fault, it looks like the GAMG logic is faulty, in your version at least.</div></blockquote><div><br></div><div>What I want is that GAMG coarsens the fine matrix once and then stops doing anything.  I did not see any benefits to have more levels if the number of processors is small. <br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>  <br><div>GAMG will force the coarsest grid to one processor by default, in newer versions. You can override the default with:<div><br></div><div>-pc_gamg_use_parallel_coarse_<wbr>grid_solver</div><div><br></div><div>Your coarse grid solver is ASM with these 37 equation per process and 512 processes. That is bad. </div></div></div></div></blockquote><div><br></div><div>Why this is bad? The subdomain problem is too small? <br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div>Note, you could run this on one process to see the proper convergence rate.  </div></div></div></div></blockquote><div><br></div><div>Convergence rate for which part? coarse solver, subdomain solver?<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div>You can fix this with parameters:<br><div><br></div><div><div>>   -pc_gamg_process_eq_limit <50>: Limit (goal) on number of equations per process on coarse grids (PCGAMGSetProcEqLim)</div><div>>   -pc_gamg_coarse_eq_limit <50>: Limit on number of equations for the coarse grid (PCGAMGSetCoarseEqLim)</div></div><div><br></div></div></div></div><div>If you really want two levels then set something like -pc_gamg_coarse_eq_limit <span style="font-size:12.8px">18145 (or higher) </span>-pc_gamg_coarse_eq_<wbr>limit <span style="font-size:12.8px">18145 (or higher). </span></div></div></blockquote><div><br><br></div><div>May have something like: make the coarse problem 1/8 large as the original problem? Otherwise, this number is just problem dependent. <br></div><div><br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><span style="font-size:12.8px">You can run with -info and grep on GAMG and you will meta-data for each level. you should see "npe=1" for the coarsest, last, grid. Or use a parallel direct solver.</span></div></div></blockquote><div><br></div><div>I will try. <br> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">Note, you should not see much degradation as you increase the number of levels. 18145 eqs on a 3D problem will probably be noticeable. I generally aim for about 3000.</span></div></div></blockquote><div><br></div><div>It should be fine as long as the coarse problem is solved by a parallel solver. <br><br><br></div><div>Fande,<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><br></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Apr 10, 2017 at 12:17 PM, Kong, Fande <span dir="ltr"><<a href="mailto:fande.kong@inl.gov" target="_blank">fande.kong@inl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span>On Sun, Apr 9, 2017 at 6:04 AM, Mark Adams <span dir="ltr"><<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>></span> wrote:<br><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote">You seem to have two levels here and 3M eqs on the fine grid and 37 on<br>

the coarse grid.</blockquote><div><br></div></span><div>37 is on the sub domain.<br><br></div><div> rows=18145, cols=18145 on the entire coarse grid. <br><br></div><div><div class="m_-6187618188187603354h5"><div> <br></div><div><br> </div><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote"> I don't understand that.<br>

<br>

You are also calling the AMG setup a lot, but not spending much time<br>

in it. Try running with -info and grep on "GAMG".<br>

<div class="m_-6187618188187603354m_6078227717222367195gmail-HOEnZb"><div class="m_-6187618188187603354m_6078227717222367195gmail-h5"><br>

<br>

On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande <<a href="mailto:fande.kong@inl.gov" target="_blank">fande.kong@inl.gov</a>> wrote:<br>

> Thanks, Barry.<br>

><br>

> It works.<br>

><br>

> GAMG is three times better than ASM in terms of the number of linear<br>

> iterations, but it is five times slower than ASM. Any suggestions to improve<br>

> the performance of GAMG? Log files are attached.<br>

><br>

> Fande,<br>

><br>

> On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>> wrote:<br>

>><br>

>><br>

>> > On Apr 6, 2017, at 9:39 AM, Kong, Fande <<a href="mailto:fande.kong@inl.gov" target="_blank">fande.kong@inl.gov</a>> wrote:<br>

>> ><br>

>> > Thanks, Mark and Barry,<br>

>> ><br>

>> > It works pretty wells in terms of the number of linear iterations (using<br>

>> > "-pc_gamg_sym_graph true"), but it is horrible in the compute time. I am<br>

>> > using the two-level method via "-pc_mg_levels 2". The reason why the compute<br>

>> > time is larger than other preconditioning options is that a matrix free<br>

>> > method is used in the fine level and in my particular problem the function<br>

>> > evaluation is expensive.<br>

>> ><br>

>> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton,<br>

>> > but I do not think I want to make the preconditioning part matrix-free.  Do<br>

>> > you guys know how to turn off the matrix-free method for GAMG?<br>

>><br>

>>    -pc_use_amat false<br>

>><br>

>> ><br>

>> > Here is the detailed solver:<br>

>> ><br>

>> > SNES Object: 384 MPI processes<br>

>> >   type: newtonls<br>

>> >   maximum iterations=200, maximum function evaluations=10000<br>

>> >   tolerances: relative=1e-08, absolute=1e-08, solution=1e-50<br>

>> >   total number of linear solver iterations=20<br>

>> >   total number of function evaluations=166<br>

>> >   norm schedule ALWAYS<br>

>> >   SNESLineSearch Object:   384 MPI processes<br>

>> >     type: bt<br>

>> >       interpolation: cubic<br>

>> >       alpha=1.000000e-04<br>

>> >     maxstep=1.000000e+08, minlambda=1.000000e-12<br>

>> >     tolerances: relative=1.000000e-08, absolute=1.000000e-15,<br>

>> > lambda=1.000000e-08<br>

>> >     maximum iterations=40<br>

>> >   KSP Object:   384 MPI processes<br>

>> >     type: gmres<br>

>> >       GMRES: restart=100, using Classical (unmodified) Gram-Schmidt<br>

>> > Orthogonalization with no iterative refinement<br>

>> >       GMRES: happy breakdown tolerance 1e-30<br>

>> >     maximum iterations=100, initial guess is zero<br>

>> >     tolerances:  relative=0.001, absolute=1e-50, divergence=10000.<br>

>> >     right preconditioning<br>

>> >     using UNPRECONDITIONED norm type for convergence test<br>

>> >   PC Object:   384 MPI processes<br>

>> >     type: gamg<br>

>> >       MG: type is MULTIPLICATIVE, levels=2 cycles=v<br>

>> >         Cycles per PCApply=1<br>

>> >         Using Galerkin computed coarse grid matrices<br>

>> >         GAMG specific options<br>

>> >           Threshold for dropping small values from graph 0.<br>

>> >           AGG specific options<br>

>> >             Symmetric graph true<br>

>> >     Coarse grid solver -- level ------------------------------<wbr>-<br>

>> >       KSP Object:      (mg_coarse_)       384 MPI processes<br>

>> >         type: preonly<br>

>> >         maximum iterations=10000, initial guess is zero<br>

>> >         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.<br>

>> >         left preconditioning<br>

>> >         using NONE norm type for convergence test<br>

>> >       PC Object:      (mg_coarse_)       384 MPI processes<br>

>> >         type: bjacobi<br>

>> >           block Jacobi: number of blocks = 384<br>

>> >           Local solve is same for all blocks, in the following KSP and<br>

>> > PC objects:<br>

>> >         KSP Object:        (mg_coarse_sub_)         1 MPI processes<br>

>> >           type: preonly<br>

>> >           maximum iterations=1, initial guess is zero<br>

>> >           tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.<br>

>> >           left preconditioning<br>

>> >           using NONE norm type for convergence test<br>

>> >         PC Object:        (mg_coarse_sub_)         1 MPI processes<br>

>> >           type: lu<br>

>> >             LU: out-of-place factorization<br>

>> >             tolerance for zero pivot 2.22045e-14<br>

>> >             using diagonal shift on blocks to prevent zero pivot<br>

>> > [INBLOCKS]<br>

>> >             matrix ordering: nd<br>

>> >             factor fill ratio given 5., needed 1.31367<br>

>> >               Factored matrix follows:<br>

>> >                 Mat Object:                 1 MPI processes<br>

>> >                   type: seqaij<br>

>> >                   rows=37, cols=37<br>

>> >                   package used to perform factorization: petsc<br>

>> >                   total: nonzeros=913, allocated nonzeros=913<br>

>> >                   total number of mallocs used during MatSetValues calls<br>

>> > =0<br>

>> >                     not using I-node routines<br>

>> >           linear system matrix = precond matrix:<br>

>> >           Mat Object:           1 MPI processes<br>

>> >             type: seqaij<br>

>> >             rows=37, cols=37<br>

>> >             total: nonzeros=695, allocated nonzeros=695<br>

>> >             total number of mallocs used during MatSetValues calls =0<br>

>> >               not using I-node routines<br>

>> >         linear system matrix = precond matrix:<br>

>> >         Mat Object:         384 MPI processes<br>

>> >           type: mpiaij<br>

>> >           rows=18145, cols=18145<br>

>> >           total: nonzeros=1709115, allocated nonzeros=1709115<br>

>> >           total number of mallocs used during MatSetValues calls =0<br>

>> >             not using I-node (on process 0) routines<br>

>> >     Down solver (pre-smoother) on level 1<br>

>> > ------------------------------<wbr>-<br>

>> >       KSP Object:      (mg_levels_1_)       384 MPI processes<br>

>> >         type: chebyshev<br>

>> >           Chebyshev: eigenvalue estimates:  min = 0.133339, max =<br>

>> > 1.46673<br>

>> >           Chebyshev: eigenvalues estimated using gmres with translations<br>

>> > [0. 0.1; 0. 1.1]<br>

>> >           KSP Object:          (mg_levels_1_esteig_)           384 MPI<br>

>> > processes<br>

>> >             type: gmres<br>

>> >               GMRES: restart=30, using Classical (unmodified)<br>

>> > Gram-Schmidt Orthogonalization with no iterative refinement<br>

>> >               GMRES: happy breakdown tolerance 1e-30<br>

>> >             maximum iterations=10, initial guess is zero<br>

>> >             tolerances:  relative=1e-12, absolute=1e-50,<br>

>> > divergence=10000.<br>

>> >             left preconditioning<br>

>> >             using PRECONDITIONED norm type for convergence test<br>

>> >         maximum iterations=2<br>

>> >         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.<br>

>> >         left preconditioning<br>

>> >         using nonzero initial guess<br>

>> >         using NONE norm type for convergence test<br>

>> >       PC Object:      (mg_levels_1_)       384 MPI processes<br>

>> >         type: sor<br>

>> >           SOR: type = local_symmetric, iterations = 1, local iterations<br>

>> > = 1, omega = 1.<br>

>> >         linear system matrix followed by preconditioner matrix:<br>

>> >         Mat Object:         384 MPI processes<br>

>> >           type: mffd<br>

>> >           rows=3020875, cols=3020875<br>

>> >             Matrix-free approximation:<br>

>> >               err=1.49012e-08 (relative error in function evaluation)<br>

>> >               Using wp compute h routine<br>

>> >                   Does not compute normU<br>

>> >         Mat Object:        ()         384 MPI processes<br>

>> >           type: mpiaij<br>

>> >           rows=3020875, cols=3020875<br>

>> >           total: nonzeros=215671710, allocated nonzeros=241731750<br>

>> >           total number of mallocs used during MatSetValues calls =0<br>

>> >             not using I-node (on process 0) routines<br>

>> >     Up solver (post-smoother) same as down solver (pre-smoother)<br>

>> >     linear system matrix followed by preconditioner matrix:<br>

>> >     Mat Object:     384 MPI processes<br>

>> >       type: mffd<br>

>> >       rows=3020875, cols=3020875<br>

>> >         Matrix-free approximation:<br>

>> >           err=1.49012e-08 (relative error in function evaluation)<br>

>> >           Using wp compute h routine<br>

>> >               Does not compute normU<br>

>> >     Mat Object:    ()     384 MPI processes<br>

>> >       type: mpiaij<br>

>> >       rows=3020875, cols=3020875<br>

>> >       total: nonzeros=215671710, allocated nonzeros=241731750<br>

>> >       total number of mallocs used during MatSetValues calls =0<br>

>> >         not using I-node (on process 0) routines<br>

>> ><br>

>> ><br>

>> > Fande,<br>

>> ><br>

>> > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:<br>

>> > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>> wrote:<br>

>> > ><br>

>> > >> Does this mean that GAMG works for the symmetrical matrix only?<br>

>> > ><br>

>> > >   No, it means that for non symmetric nonzero structure you need the<br>

>> > > extra flag. So use the extra flag. The reason we don't always use the flag<br>

>> > > is because it adds extra cost and isn't needed if the matrix already has a<br>

>> > > symmetric nonzero structure.<br>

>> ><br>

>> > BTW, if you have symmetric non-zero structure you can just set<br>

>> > -pc_gamg_threshold -1.0', note the "or" in the message.<br>

>> ><br>

>> > If you want to mess with the threshold then you need to use the<br>

>> > symmetrized flag.<br>

>> ><br>

>><br>

><br>

</div></div></blockquote></div></div></div><br></div></div>

</blockquote></div><br></div>

</div></div></blockquote></div><br></div></div>