<div dir="ltr"><div>Hi Mark,<br><br></div>Thanks for your reply.<br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Apr 12, 2017 at 9:16 AM, Mark Adams <span dir="ltr"><<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">The problem comes from setting the number of MG levels (<span style="color:rgb(0,0,0);font-family:"courier new",courier,monospace,arial,sans-serif;font-size:14px;white-space:pre-wrap">-pc_mg_levels 2</span>). Not your fault, it looks like the GAMG logic is faulty, in your version at least.</div></blockquote><div><br></div><div>What I want is that GAMG coarsens the fine matrix once and then stops doing anything. I did not see any benefits to have more levels if the number of processors is small. <br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div> <br><div>GAMG will force the coarsest grid to one processor by default, in newer versions. You can override the default with:<div><br></div><div>-pc_gamg_use_parallel_coarse_<wbr>grid_solver</div><div><br></div><div>Your coarse grid solver is ASM with these 37 equation per process and 512 processes. That is bad. </div></div></div></div></blockquote><div><br></div><div>Why this is bad? The subdomain problem is too small? <br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div>Note, you could run this on one process to see the proper convergence rate. </div></div></div></div></blockquote><div><br></div><div>Convergence rate for which part? coarse solver, subdomain solver?<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div>You can fix this with parameters:<br><div><br></div><div><div>> -pc_gamg_process_eq_limit <50>: Limit (goal) on number of equations per process on coarse grids (PCGAMGSetProcEqLim)</div><div>> -pc_gamg_coarse_eq_limit <50>: Limit on number of equations for the coarse grid (PCGAMGSetCoarseEqLim)</div></div><div><br></div></div></div></div><div>If you really want two levels then set something like -pc_gamg_coarse_eq_limit <span style="font-size:12.8px">18145 (or higher) </span>-pc_gamg_coarse_eq_<wbr>limit <span style="font-size:12.8px">18145 (or higher). </span></div></div></blockquote><div><br><br></div><div>May have something like: make the coarse problem 1/8 large as the original problem? Otherwise, this number is just problem dependent. <br></div><div><br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><span style="font-size:12.8px">You can run with -info and grep on GAMG and you will meta-data for each level. you should see "npe=1" for the coarsest, last, grid. Or use a parallel direct solver.</span></div></div></blockquote><div><br></div><div>I will try. <br> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">Note, you should not see much degradation as you increase the number of levels. 18145 eqs on a 3D problem will probably be noticeable. I generally aim for about 3000.</span></div></div></blockquote><div><br></div><div>It should be fine as long as the coarse problem is solved by a parallel solver. <br><br><br></div><div>Fande,<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><br></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Apr 10, 2017 at 12:17 PM, Kong, Fande <span dir="ltr"><<a href="mailto:fande.kong@inl.gov" target="_blank">fande.kong@inl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span>On Sun, Apr 9, 2017 at 6:04 AM, Mark Adams <span dir="ltr"><<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>></span> wrote:<br><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote">You seem to have two levels here and 3M eqs on the fine grid and 37 on<br>
the coarse grid.</blockquote><div><br></div></span><div>37 is on the sub domain.<br><br></div><div> rows=18145, cols=18145 on the entire coarse grid. <br><br></div><div><div class="m_-6187618188187603354h5"><div> <br></div><div><br> </div><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote"> I don't understand that.<br>
<br>
You are also calling the AMG setup a lot, but not spending much time<br>
in it. Try running with -info and grep on "GAMG".<br>
<div class="m_-6187618188187603354m_6078227717222367195gmail-HOEnZb"><div class="m_-6187618188187603354m_6078227717222367195gmail-h5"><br>
<br>
On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande <<a href="mailto:fande.kong@inl.gov" target="_blank">fande.kong@inl.gov</a>> wrote:<br>
> Thanks, Barry.<br>
><br>
> It works.<br>
><br>
> GAMG is three times better than ASM in terms of the number of linear<br>
> iterations, but it is five times slower than ASM. Any suggestions to improve<br>
> the performance of GAMG? Log files are attached.<br>
><br>
> Fande,<br>
><br>
> On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>> wrote:<br>
>><br>
>><br>
>> > On Apr 6, 2017, at 9:39 AM, Kong, Fande <<a href="mailto:fande.kong@inl.gov" target="_blank">fande.kong@inl.gov</a>> wrote:<br>
>> ><br>
>> > Thanks, Mark and Barry,<br>
>> ><br>
>> > It works pretty wells in terms of the number of linear iterations (using<br>
>> > "-pc_gamg_sym_graph true"), but it is horrible in the compute time. I am<br>
>> > using the two-level method via "-pc_mg_levels 2". The reason why the compute<br>
>> > time is larger than other preconditioning options is that a matrix free<br>
>> > method is used in the fine level and in my particular problem the function<br>
>> > evaluation is expensive.<br>
>> ><br>
>> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton,<br>
>> > but I do not think I want to make the preconditioning part matrix-free. Do<br>
>> > you guys know how to turn off the matrix-free method for GAMG?<br>
>><br>
>> -pc_use_amat false<br>
>><br>
>> ><br>
>> > Here is the detailed solver:<br>
>> ><br>
>> > SNES Object: 384 MPI processes<br>
>> > type: newtonls<br>
>> > maximum iterations=200, maximum function evaluations=10000<br>
>> > tolerances: relative=1e-08, absolute=1e-08, solution=1e-50<br>
>> > total number of linear solver iterations=20<br>
>> > total number of function evaluations=166<br>
>> > norm schedule ALWAYS<br>
>> > SNESLineSearch Object: 384 MPI processes<br>
>> > type: bt<br>
>> > interpolation: cubic<br>
>> > alpha=1.000000e-04<br>
>> > maxstep=1.000000e+08, minlambda=1.000000e-12<br>
>> > tolerances: relative=1.000000e-08, absolute=1.000000e-15,<br>
>> > lambda=1.000000e-08<br>
>> > maximum iterations=40<br>
>> > KSP Object: 384 MPI processes<br>
>> > type: gmres<br>
>> > GMRES: restart=100, using Classical (unmodified) Gram-Schmidt<br>
>> > Orthogonalization with no iterative refinement<br>
>> > GMRES: happy breakdown tolerance 1e-30<br>
>> > maximum iterations=100, initial guess is zero<br>
>> > tolerances: relative=0.001, absolute=1e-50, divergence=10000.<br>
>> > right preconditioning<br>
>> > using UNPRECONDITIONED norm type for convergence test<br>
>> > PC Object: 384 MPI processes<br>
>> > type: gamg<br>
>> > MG: type is MULTIPLICATIVE, levels=2 cycles=v<br>
>> > Cycles per PCApply=1<br>
>> > Using Galerkin computed coarse grid matrices<br>
>> > GAMG specific options<br>
>> > Threshold for dropping small values from graph 0.<br>
>> > AGG specific options<br>
>> > Symmetric graph true<br>
>> > Coarse grid solver -- level ------------------------------<wbr>-<br>
>> > KSP Object: (mg_coarse_) 384 MPI processes<br>
>> > type: preonly<br>
>> > maximum iterations=10000, initial guess is zero<br>
>> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.<br>
>> > left preconditioning<br>
>> > using NONE norm type for convergence test<br>
>> > PC Object: (mg_coarse_) 384 MPI processes<br>
>> > type: bjacobi<br>
>> > block Jacobi: number of blocks = 384<br>
>> > Local solve is same for all blocks, in the following KSP and<br>
>> > PC objects:<br>
>> > KSP Object: (mg_coarse_sub_) 1 MPI processes<br>
>> > type: preonly<br>
>> > maximum iterations=1, initial guess is zero<br>
>> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.<br>
>> > left preconditioning<br>
>> > using NONE norm type for convergence test<br>
>> > PC Object: (mg_coarse_sub_) 1 MPI processes<br>
>> > type: lu<br>
>> > LU: out-of-place factorization<br>
>> > tolerance for zero pivot 2.22045e-14<br>
>> > using diagonal shift on blocks to prevent zero pivot<br>
>> > [INBLOCKS]<br>
>> > matrix ordering: nd<br>
>> > factor fill ratio given 5., needed 1.31367<br>
>> > Factored matrix follows:<br>
>> > Mat Object: 1 MPI processes<br>
>> > type: seqaij<br>
>> > rows=37, cols=37<br>
>> > package used to perform factorization: petsc<br>
>> > total: nonzeros=913, allocated nonzeros=913<br>
>> > total number of mallocs used during MatSetValues calls<br>
>> > =0<br>
>> > not using I-node routines<br>
>> > linear system matrix = precond matrix:<br>
>> > Mat Object: 1 MPI processes<br>
>> > type: seqaij<br>
>> > rows=37, cols=37<br>
>> > total: nonzeros=695, allocated nonzeros=695<br>
>> > total number of mallocs used during MatSetValues calls =0<br>
>> > not using I-node routines<br>
>> > linear system matrix = precond matrix:<br>
>> > Mat Object: 384 MPI processes<br>
>> > type: mpiaij<br>
>> > rows=18145, cols=18145<br>
>> > total: nonzeros=1709115, allocated nonzeros=1709115<br>
>> > total number of mallocs used during MatSetValues calls =0<br>
>> > not using I-node (on process 0) routines<br>
>> > Down solver (pre-smoother) on level 1<br>
>> > ------------------------------<wbr>-<br>
>> > KSP Object: (mg_levels_1_) 384 MPI processes<br>
>> > type: chebyshev<br>
>> > Chebyshev: eigenvalue estimates: min = 0.133339, max =<br>
>> > 1.46673<br>
>> > Chebyshev: eigenvalues estimated using gmres with translations<br>
>> > [0. 0.1; 0. 1.1]<br>
>> > KSP Object: (mg_levels_1_esteig_) 384 MPI<br>
>> > processes<br>
>> > type: gmres<br>
>> > GMRES: restart=30, using Classical (unmodified)<br>
>> > Gram-Schmidt Orthogonalization with no iterative refinement<br>
>> > GMRES: happy breakdown tolerance 1e-30<br>
>> > maximum iterations=10, initial guess is zero<br>
>> > tolerances: relative=1e-12, absolute=1e-50,<br>
>> > divergence=10000.<br>
>> > left preconditioning<br>
>> > using PRECONDITIONED norm type for convergence test<br>
>> > maximum iterations=2<br>
>> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.<br>
>> > left preconditioning<br>
>> > using nonzero initial guess<br>
>> > using NONE norm type for convergence test<br>
>> > PC Object: (mg_levels_1_) 384 MPI processes<br>
>> > type: sor<br>
>> > SOR: type = local_symmetric, iterations = 1, local iterations<br>
>> > = 1, omega = 1.<br>
>> > linear system matrix followed by preconditioner matrix:<br>
>> > Mat Object: 384 MPI processes<br>
>> > type: mffd<br>
>> > rows=3020875, cols=3020875<br>
>> > Matrix-free approximation:<br>
>> > err=1.49012e-08 (relative error in function evaluation)<br>
>> > Using wp compute h routine<br>
>> > Does not compute normU<br>
>> > Mat Object: () 384 MPI processes<br>
>> > type: mpiaij<br>
>> > rows=3020875, cols=3020875<br>
>> > total: nonzeros=215671710, allocated nonzeros=241731750<br>
>> > total number of mallocs used during MatSetValues calls =0<br>
>> > not using I-node (on process 0) routines<br>
>> > Up solver (post-smoother) same as down solver (pre-smoother)<br>
>> > linear system matrix followed by preconditioner matrix:<br>
>> > Mat Object: 384 MPI processes<br>
>> > type: mffd<br>
>> > rows=3020875, cols=3020875<br>
>> > Matrix-free approximation:<br>
>> > err=1.49012e-08 (relative error in function evaluation)<br>
>> > Using wp compute h routine<br>
>> > Does not compute normU<br>
>> > Mat Object: () 384 MPI processes<br>
>> > type: mpiaij<br>
>> > rows=3020875, cols=3020875<br>
>> > total: nonzeros=215671710, allocated nonzeros=241731750<br>
>> > total number of mallocs used during MatSetValues calls =0<br>
>> > not using I-node (on process 0) routines<br>
>> ><br>
>> ><br>
>> > Fande,<br>
>> ><br>
>> > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:<br>
>> > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>> wrote:<br>
>> > ><br>
>> > >> Does this mean that GAMG works for the symmetrical matrix only?<br>
>> > ><br>
>> > > No, it means that for non symmetric nonzero structure you need the<br>
>> > > extra flag. So use the extra flag. The reason we don't always use the flag<br>
>> > > is because it adds extra cost and isn't needed if the matrix already has a<br>
>> > > symmetric nonzero structure.<br>
>> ><br>
>> > BTW, if you have symmetric non-zero structure you can just set<br>
>> > -pc_gamg_threshold -1.0', note the "or" in the message.<br>
>> ><br>
>> > If you want to mess with the threshold then you need to use the<br>
>> > symmetrized flag.<br>
>> ><br>
>><br>
><br>
</div></div></blockquote></div></div></div><br></div></div>
</blockquote></div><br></div>
</div></div></blockquote></div><br></div></div>