<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Apr 12, 2017 at 1:31 PM, Kong, Fande <span dir="ltr"><<a href="mailto:fande.kong@inl.gov" target="_blank">fande.kong@inl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Hi Mark,<br><br></div>Thanks for your reply.<br><div class="gmail_extra"><br><div class="gmail_quote"><span class="">On Wed, Apr 12, 2017 at 9:16 AM, Mark Adams <span dir="ltr"><<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">The problem comes from setting the number of MG levels (<span style="color:rgb(0,0,0);font-family:"courier new",courier,monospace,arial,sans-serif;font-size:14px;white-space:pre-wrap">-pc_mg_levels 2</span>). Not your fault, it looks like the GAMG logic is faulty, in your version at least.</div></blockquote><div><br></div></span><div>What I want is that GAMG coarsens the fine matrix once and then stops doing anything.  I did not see any benefits to have more levels if the number of processors is small. <br></div></div></div></div></blockquote><div><br></div><div>The number of levels is a math issue and has nothing to do with parallelism. If you do just one level your coarse grid is very large and expensive to solve, so you want to keep coarsening. There is rarely a need to use -pc_mg_levels</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>  <br><div>GAMG will force the coarsest grid to one processor by default, in newer versions. You can override the default with:<div><br></div><div>-pc_gamg_use_parallel_coarse_g<wbr>rid_solver</div><div><br></div><div>Your coarse grid solver is ASM with these 37 equation per process and 512 processes. That is bad. </div></div></div></div></blockquote><div><br></div></span><div>Why this is bad? The subdomain problem is too small? <br></div></div></div></div></blockquote><div><br></div><div>Because ASM with 512 blocks is a weak solver. You want the coarse grid to be solved exactly. </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div>Note, you could run this on one process to see the proper convergence rate.  </div></div></div></div></blockquote><div><br></div></span><div>Convergence rate for which part? coarse solver, subdomain solver?<br></div></div></div></div></blockquote><div><br></div><div>The overall converge rate.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div>You can fix this with parameters:<br><div><br></div><div><div>>   -pc_gamg_process_eq_limit <50>: Limit (goal) on number of equations per process on coarse grids (PCGAMGSetProcEqLim)</div><div>>   -pc_gamg_coarse_eq_limit <50>: Limit on number of equations for the coarse grid (PCGAMGSetCoarseEqLim)</div></div><div><br></div></div></div></div><div>If you really want two levels then set something like -pc_gamg_coarse_eq_limit <span style="font-size:12.8px">18145 (or higher) </span>-pc_gamg_coarse_eq_lim<wbr>it <span style="font-size:12.8px">18145 (or higher). </span></div></div></blockquote><div><br><br></div></span><div>May have something like: make the coarse problem 1/8 large as the original problem? Otherwise, this number is just problem dependent. <br></div></div></div></div></blockquote><div><br></div><div>GAMG will stop automatically so that you do not need problem dependant parameters.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><span class=""><div><br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><span style="font-size:12.8px">You can run with -info and grep on GAMG and you will meta-data for each level. you should see "npe=1" for the coarsest, last, grid. Or use a parallel direct solver.</span></div></div></blockquote><div><br></div></span><div>I will try. <br> <br></div><span class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">Note, you should not see much degradation as you increase the number of levels. 18145 eqs on a 3D problem will probably be noticeable. I generally aim for about 3000.</span></div></div></blockquote><div><br></div></span><div>It should be fine as long as the coarse problem is solved by a parallel solver. </div></div></div></div></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><span class="HOEnZb"><font color="#888888"><br><br></font></span></div><span class="HOEnZb"><font color="#888888"><div>Fande,<br></div></font></span><div><div class="h5"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><br></div></div><div class="m_2527651518883693442HOEnZb"><div class="m_2527651518883693442h5"><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Apr 10, 2017 at 12:17 PM, Kong, Fande <span dir="ltr"><<a href="mailto:fande.kong@inl.gov" target="_blank">fande.kong@inl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span>On Sun, Apr 9, 2017 at 6:04 AM, Mark Adams <span dir="ltr"><<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>></span> wrote:<br><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote">You seem to have two levels here and 3M eqs on the fine grid and 37 on<br>

the coarse grid.</blockquote><div><br></div></span><div>37 is on the sub domain.<br><br></div><div> rows=18145, cols=18145 on the entire coarse grid. <br><br></div><div><div class="m_2527651518883693442m_-6187618188187603354h5"><div> <br></div><div><br> </div><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote"> I don't understand that.<br>

<br>

You are also calling the AMG setup a lot, but not spending much time<br>

in it. Try running with -info and grep on "GAMG".<br>

<div class="m_2527651518883693442m_-6187618188187603354m_6078227717222367195gmail-HOEnZb"><div class="m_2527651518883693442m_-6187618188187603354m_6078227717222367195gmail-h5"><br>

<br>

On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande <<a href="mailto:fande.kong@inl.gov" target="_blank">fande.kong@inl.gov</a>> wrote:<br>

> Thanks, Barry.<br>

><br>

> It works.<br>

><br>

> GAMG is three times better than ASM in terms of the number of linear<br>

> iterations, but it is five times slower than ASM. Any suggestions to improve<br>

> the performance of GAMG? Log files are attached.<br>

><br>

> Fande,<br>

><br>

> On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>> wrote:<br>

>><br>

>><br>

>> > On Apr 6, 2017, at 9:39 AM, Kong, Fande <<a href="mailto:fande.kong@inl.gov" target="_blank">fande.kong@inl.gov</a>> wrote:<br>

>> ><br>

>> > Thanks, Mark and Barry,<br>

>> ><br>

>> > It works pretty wells in terms of the number of linear iterations (using<br>

>> > "-pc_gamg_sym_graph true"), but it is horrible in the compute time. I am<br>

>> > using the two-level method via "-pc_mg_levels 2". The reason why the compute<br>

>> > time is larger than other preconditioning options is that a matrix free<br>

>> > method is used in the fine level and in my particular problem the function<br>

>> > evaluation is expensive.<br>

>> ><br>

>> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton,<br>

>> > but I do not think I want to make the preconditioning part matrix-free.  Do<br>

>> > you guys know how to turn off the matrix-free method for GAMG?<br>

>><br>

>>    -pc_use_amat false<br>

>><br>

>> ><br>

>> > Here is the detailed solver:<br>

>> ><br>

>> > SNES Object: 384 MPI processes<br>

>> >   type: newtonls<br>

>> >   maximum iterations=200, maximum function evaluations=10000<br>

>> >   tolerances: relative=1e-08, absolute=1e-08, solution=1e-50<br>

>> >   total number of linear solver iterations=20<br>

>> >   total number of function evaluations=166<br>

>> >   norm schedule ALWAYS<br>

>> >   SNESLineSearch Object:   384 MPI processes<br>

>> >     type: bt<br>

>> >       interpolation: cubic<br>

>> >       alpha=1.000000e-04<br>

>> >     maxstep=1.000000e+08, minlambda=1.000000e-12<br>

>> >     tolerances: relative=1.000000e-08, absolute=1.000000e-15,<br>

>> > lambda=1.000000e-08<br>

>> >     maximum iterations=40<br>

>> >   KSP Object:   384 MPI processes<br>

>> >     type: gmres<br>

>> >       GMRES: restart=100, using Classical (unmodified) Gram-Schmidt<br>

>> > Orthogonalization with no iterative refinement<br>

>> >       GMRES: happy breakdown tolerance 1e-30<br>

>> >     maximum iterations=100, initial guess is zero<br>

>> >     tolerances:  relative=0.001, absolute=1e-50, divergence=10000.<br>

>> >     right preconditioning<br>

>> >     using UNPRECONDITIONED norm type for convergence test<br>

>> >   PC Object:   384 MPI processes<br>

>> >     type: gamg<br>

>> >       MG: type is MULTIPLICATIVE, levels=2 cycles=v<br>

>> >         Cycles per PCApply=1<br>

>> >         Using Galerkin computed coarse grid matrices<br>

>> >         GAMG specific options<br>

>> >           Threshold for dropping small values from graph 0.<br>

>> >           AGG specific options<br>

>> >             Symmetric graph true<br>

>> >     Coarse grid solver -- level ------------------------------<wbr>-<br>

>> >       KSP Object:      (mg_coarse_)       384 MPI processes<br>

>> >         type: preonly<br>

>> >         maximum iterations=10000, initial guess is zero<br>

>> >         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.<br>

>> >         left preconditioning<br>

>> >         using NONE norm type for convergence test<br>

>> >       PC Object:      (mg_coarse_)       384 MPI processes<br>

>> >         type: bjacobi<br>

>> >           block Jacobi: number of blocks = 384<br>

>> >           Local solve is same for all blocks, in the following KSP and<br>

>> > PC objects:<br>

>> >         KSP Object:        (mg_coarse_sub_)         1 MPI processes<br>

>> >           type: preonly<br>

>> >           maximum iterations=1, initial guess is zero<br>

>> >           tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.<br>

>> >           left preconditioning<br>

>> >           using NONE norm type for convergence test<br>

>> >         PC Object:        (mg_coarse_sub_)         1 MPI processes<br>

>> >           type: lu<br>

>> >             LU: out-of-place factorization<br>

>> >             tolerance for zero pivot 2.22045e-14<br>

>> >             using diagonal shift on blocks to prevent zero pivot<br>

>> > [INBLOCKS]<br>

>> >             matrix ordering: nd<br>

>> >             factor fill ratio given 5., needed 1.31367<br>

>> >               Factored matrix follows:<br>

>> >                 Mat Object:                 1 MPI processes<br>

>> >                   type: seqaij<br>

>> >                   rows=37, cols=37<br>

>> >                   package used to perform factorization: petsc<br>

>> >                   total: nonzeros=913, allocated nonzeros=913<br>

>> >                   total number of mallocs used during MatSetValues calls<br>

>> > =0<br>

>> >                     not using I-node routines<br>

>> >           linear system matrix = precond matrix:<br>

>> >           Mat Object:           1 MPI processes<br>

>> >             type: seqaij<br>

>> >             rows=37, cols=37<br>

>> >             total: nonzeros=695, allocated nonzeros=695<br>

>> >             total number of mallocs used during MatSetValues calls =0<br>

>> >               not using I-node routines<br>

>> >         linear system matrix = precond matrix:<br>

>> >         Mat Object:         384 MPI processes<br>

>> >           type: mpiaij<br>

>> >           rows=18145, cols=18145<br>

>> >           total: nonzeros=1709115, allocated nonzeros=1709115<br>

>> >           total number of mallocs used during MatSetValues calls =0<br>

>> >             not using I-node (on process 0) routines<br>

>> >     Down solver (pre-smoother) on level 1<br>

>> > ------------------------------<wbr>-<br>

>> >       KSP Object:      (mg_levels_1_)       384 MPI processes<br>

>> >         type: chebyshev<br>

>> >           Chebyshev: eigenvalue estimates:  min = 0.133339, max =<br>

>> > 1.46673<br>

>> >           Chebyshev: eigenvalues estimated using gmres with translations<br>

>> > [0. 0.1; 0. 1.1]<br>

>> >           KSP Object:          (mg_levels_1_esteig_)           384 MPI<br>

>> > processes<br>

>> >             type: gmres<br>

>> >               GMRES: restart=30, using Classical (unmodified)<br>

>> > Gram-Schmidt Orthogonalization with no iterative refinement<br>

>> >               GMRES: happy breakdown tolerance 1e-30<br>

>> >             maximum iterations=10, initial guess is zero<br>

>> >             tolerances:  relative=1e-12, absolute=1e-50,<br>

>> > divergence=10000.<br>

>> >             left preconditioning<br>

>> >             using PRECONDITIONED norm type for convergence test<br>

>> >         maximum iterations=2<br>

>> >         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.<br>

>> >         left preconditioning<br>

>> >         using nonzero initial guess<br>

>> >         using NONE norm type for convergence test<br>

>> >       PC Object:      (mg_levels_1_)       384 MPI processes<br>

>> >         type: sor<br>

>> >           SOR: type = local_symmetric, iterations = 1, local iterations<br>

>> > = 1, omega = 1.<br>

>> >         linear system matrix followed by preconditioner matrix:<br>

>> >         Mat Object:         384 MPI processes<br>

>> >           type: mffd<br>

>> >           rows=3020875, cols=3020875<br>

>> >             Matrix-free approximation:<br>

>> >               err=1.49012e-08 (relative error in function evaluation)<br>

>> >               Using wp compute h routine<br>

>> >                   Does not compute normU<br>

>> >         Mat Object:        ()         384 MPI processes<br>

>> >           type: mpiaij<br>

>> >           rows=3020875, cols=3020875<br>

>> >           total: nonzeros=215671710, allocated nonzeros=241731750<br>

>> >           total number of mallocs used during MatSetValues calls =0<br>

>> >             not using I-node (on process 0) routines<br>

>> >     Up solver (post-smoother) same as down solver (pre-smoother)<br>

>> >     linear system matrix followed by preconditioner matrix:<br>

>> >     Mat Object:     384 MPI processes<br>

>> >       type: mffd<br>

>> >       rows=3020875, cols=3020875<br>

>> >         Matrix-free approximation:<br>

>> >           err=1.49012e-08 (relative error in function evaluation)<br>

>> >           Using wp compute h routine<br>

>> >               Does not compute normU<br>

>> >     Mat Object:    ()     384 MPI processes<br>

>> >       type: mpiaij<br>

>> >       rows=3020875, cols=3020875<br>

>> >       total: nonzeros=215671710, allocated nonzeros=241731750<br>

>> >       total number of mallocs used during MatSetValues calls =0<br>

>> >         not using I-node (on process 0) routines<br>

>> ><br>

>> ><br>

>> > Fande,<br>

>> ><br>

>> > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:<br>

>> > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>> wrote:<br>

>> > ><br>

>> > >> Does this mean that GAMG works for the symmetrical matrix only?<br>

>> > ><br>

>> > >   No, it means that for non symmetric nonzero structure you need the<br>

>> > > extra flag. So use the extra flag. The reason we don't always use the flag<br>

>> > > is because it adds extra cost and isn't needed if the matrix already has a<br>

>> > > symmetric nonzero structure.<br>

>> ><br>

>> > BTW, if you have symmetric non-zero structure you can just set<br>

>> > -pc_gamg_threshold -1.0', note the "or" in the message.<br>

>> ><br>

>> > If you want to mess with the threshold then you need to use the<br>

>> > symmetrized flag.<br>

>> ><br>

>><br>

><br>

</div></div></blockquote></div></div></div><br></div></div>

</blockquote></div><br></div>

</div></div></blockquote></div></div></div><br></div></div>

</blockquote></div><br></div></div>