<div dir="ltr"><div><div><div><div>To speed up the smoother you could:<br></div><div>1) do less smoothing, but this might incur needing to do more KSP iterations<br></div><div>2) run on more cores<br></div><div>3) if you are happy with cheby/jacobi, you could write a matrix free implementation for your discretisation which gets used on the fine level. Depending on your discretisation, this might improve the speed and scalability.<br>

As I said, the prefix for the solver is shown in the output of -ksp_view, e.g.

  PC Object: (mg_coarse_) 32 MPI processes
    type: redundant

The string in brackets, mg_coarse_, is the solver prefix. So if you want to change the coarse grid solver, do this:

  -mg_coarse_ksp_type XXXX
  -mg_coarse_pc_type YYYY
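
For example (one possible choice, untested on your problem), you could replace the redundant LU with a parallel iterative coarse solve:

  -mg_coarse_ksp_type gmres
  -mg_coarse_pc_type bjacobi

How accurately the coarse problem needs to be solved, and therefore which combination is fastest, is something you will have to experiment with.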

Cheers,
  Dave


On 6 November 2013 19:35, Alan Z. Wei <zhenglun.wei@gmail.com> wrote:
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div>Thanks Dave,<br>
I further simulated the problem with -pc_mg_log and output
these files in the attachments. <br>
I found that the smoothing process of the last level always
consumes the most time, i.e. 'MGSmooth Level 5' in out-level5 and
"MGSmooth Level 2' in out-level2. However, as I tested several
other -mg_levels_pc_type such as 'bjacobi', 'asm' etc. The default
one, which is 'jacobi', actually works the best. Therefore, I
decide to keep using it. However, do you have any suggestions to
speed up this smoothing process other than changing
-mg_levels_pc_type?<br>
Also, as you suggested to change -mg_levels_ksp_type, it does
not influence much if replacing 'chebyshev' by 'cg'. However, this
part never change while I modify '-mg_levels_ksp_type':<br>
  PC Object: (mg_coarse_) 32 MPI processes
    type: redundant
      Redundant preconditioner: First (color=0) of 32 PCs follows
    KSP Object: (mg_coarse_redundant_) 1 MPI processes
      type: preonly
      maximum iterations=10000, initial guess is zero
      tolerances: relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_coarse_redundant_) 1 MPI processes
      type: lu
        LU: out-of-place factorization

As you mentioned, the redundant LU coarse grid solver is the primary cause of the large memory request in the 2-level case. How could I change the coarse grid solver to reduce the memory requirement or speed up the solve?

thanks again,
Alan

</div><div><div class="h5">
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div>
<div>
<div>
<div>
Hey Alan,

1/ One difference in the memory footprint is likely coming from your coarse grid solver, which is redundant LU. The 2-level case has a coarse grid problem with 70785 unknowns, whilst the 5-level case has a coarse grid problem with 225 unknowns.
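(As a sanity check on those numbers: 65*33*33 = 70785 and 9*5*5 = 225, i.e. the coarse DMDA grids of the two runs. And since your -ksp_view output shows "First (color=0) of 32 PCs", every rank factors the entire 70785-unknown coarse matrix in the -da_refine 2 case, which is presumably where most of the extra memory goes.)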

2/ The solve time difference will be affected by your coarse grid size. Add the command line argument
  -pc_mg_log
to profile the setup time spent on the coarse grid and all other levels.
See
http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCMG.html

3/ You can change the smoother on all levels by using the command line argument with the appropriate prefix, e.g.
  -mg_levels_ksp_type cg
Note the prefix is displayed in the result of -ksp_view.
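If you ever want a different smoother on one particular level, the numbered prefixes shown in -ksp_view (mg_levels_1_, mg_levels_2_, ...) should work too, e.g. something like
  -mg_levels_1_ksp_type richardson
though I have not tried that here.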

Also, your mesh size can be altered at run time using arguments like
  -da_grid_x 5
You shouldn't have to modify the source code each time.
See
http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DM/DMDACreate3d.html
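For example, because your DMDACreate3d calls pass negative sizes (the sign just marks them as overridable from the options database), something like the following should reproduce your out-level5 grid without editing the source (untested; keep the remaining monitoring options from your original command):
  mpiexec -np 32 ./ex45 -da_grid_x 9 -da_grid_y 5 -da_grid_z 5 -da_refine 5 -pc_type mg -pc_mg_galerkin -ksp_type cg -ksp_rtol 1.0e-7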

Cheers,
Dave


On 6 November 2013 04:21, Alan <zhenglun.wei@gmail.com> wrote:
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Dear all,<br>
I hope you're having a nice day.<br>
Recently, I came across a problem on using MG as
preconditioner.<br>
Basically, to achieve the same finest mesh with pc_type =
mg, the memory<br>
usage for -da_refine 2 is much more than that for -da_refine
5. To my<br>
limited knowledge, more refinement should consume more
memory, which is<br>
contradict to the behavior of pc_type = mg in PETsc.<br>
Here, I provide two output files. They are all from<br>
/src/ksp/ksp/example/tutorial/ex45.c with 32 processes.<br>
The execute file for out-level2 is<br>
mpiexec -np 32 ./ex45 -pc_type mg -ksp_type cg -da_refine 2<br>
-pc_mg_galerkin -ksp_rtol 1.0e-7 -mg_levels_pc_type jacobi<br>
-mg_levels_ksp_type chebyshev -dm_view -log_summary
-pc_mg_log<br>
-pc_mg_monitor -ksp_view -ksp_monitor > out &<br>
and in ex45.c, KSPCreate is changed as:<br>
ierr =<br>
DMDACreate3d(PETSC_COMM_WORLD,DMDA_BOUNDARY_NONE,DMDA_BOUNDARY_NONE,DMDA_BOUNDARY_NONE,DMDA_STENCIL_STAR,-65,-33,-33,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,1,1,0,0,0,&da);CHKERRQ(ierr);<br>
On the other hand, the command for out-level5 is
  mpiexec -np 32 ./ex45 -pc_type mg -ksp_type cg -da_refine 5 -pc_mg_galerkin -ksp_rtol 1.0e-7 -mg_levels_pc_type jacobi -mg_levels_ksp_type chebyshev -dm_view -log_summary -pc_mg_log -pc_mg_monitor -ksp_view -ksp_monitor > out &
and in ex45.c the DMDACreate3d call is changed to:
  ierr = DMDACreate3d(PETSC_COMM_WORLD,DMDA_BOUNDARY_NONE,DMDA_BOUNDARY_NONE,DMDA_BOUNDARY_NONE,DMDA_STENCIL_STAR,-9,-5,-5,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,1,1,0,0,0,&da);CHKERRQ(ierr);
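(For reference: with a non-periodic DMDA, each -da_refine step takes n points per direction to 2n-1, so -da_refine 2 takes the 65x33x33 grid to 257x129x129 via 129x65x65, while -da_refine 5 takes 9x5x5 to the same 257x129x129.)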
In summary, the final finest mesh obtained in both cases is 257*129*129, as documented in both files. However, out-level2 reports a Matrix memory usage of 822871308, while out-level5 reports only 36052932.
Furthermore, although both files show the KSP solver taking 5 iterations, the wall time elapsed for out-level2 is around 150 s, while out-level5 takes only about 4.7 s.
Finally, a minor question: in both files, under 'Down solver (pre-smoother) on level 1' and 'Down solver (pre-smoother) on level 2', the type of "KSP Object: (mg_levels_1_est_)" and "KSP Object: (mg_levels_2_est_)" is 'gmres'. Since I'm using a uniformly Cartesian mesh, would it help to speed up the solver if this 'gmres' were replaced by 'cg'? If so, which PETSc option changes the type of that KSP object?

sincerely appreciate,
Alan
