<div dir="ltr"><div><div><div><div>To speed up the smoother you could:<br></div><div>1) do less smoothing, but this might incur needing to do more KSP iterations<br></div><div>2) run on more cores<br></div><div>3) if you are happy with cheby/jacobi, you could write a matrix free implementation for your discretisation which gets used on the fine level. Depending on your discretisation, this might improve the speed and scalability.<br>

As I said, the prefix for the solver is shown in the output of -ksp_view, e.g.

  PC Object: (mg_coarse_) 32 MPI processes
    type: redundant

The string in brackets, mg_coarse_, is the solver prefix. So if you want to change the coarse grid solver, do this:

  -mg_coarse_ksp_type XXXX
  -mg_coarse_pc_type YYYY
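
For example (one possible choice, untested on your problem), you could replace the redundant LU with a parallel iterative coarse solve:

  -mg_coarse_ksp_type gmres
  -mg_coarse_pc_type bjacobi

How accurately the coarse problem needs to be solved, and therefore which combination is fastest, is something you will have to experiment with.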

Cheers,
  Dave


On 6 November 2013 19:35, Alan Z. Wei <zhenglun.wei@gmail.com> wrote:
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div>Thanks Dave,<br>
I further simulated the problem with -pc_mg_log and output
these files in the attachments. <br>
I found that the smoothing process of the last level always
consumes the most time, i.e. 'MGSmooth Level 5' in out-level5 and
"MGSmooth Level 2' in out-level2. However, as I tested several
other -mg_levels_pc_type such as 'bjacobi', 'asm' etc. The default
one, which is 'jacobi', actually works the best. Therefore, I
decide to keep using it. However, do you have any suggestions to
speed up this smoothing process other than changing
-mg_levels_pc_type?<br>
Also, as you suggested to change -mg_levels_ksp_type, it does
not influence much if replacing 'chebyshev' by 'cg'. However, this
part never change while I modify '-mg_levels_ksp_type':<br>
  PC Object: (mg_coarse_) 32 MPI processes
    type: redundant
      Redundant preconditioner: First (color=0) of 32 PCs follows
    KSP Object: (mg_coarse_redundant_) 1 MPI processes
      type: preonly
      maximum iterations=10000, initial guess is zero
      tolerances: relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_coarse_redundant_) 1 MPI processes
      type: lu
        LU: out-of-place factorization

As you mentioned, the redundant LU coarse grid solver is the primary cause of the large memory request in the 2-level case. How could I change the coarse grid solver to reduce the memory requirement or speed up the solve?

thanks again,
Alan

</div><div><div class="h5">
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div>
<div>
<div>
<div>
Hey Alan,

1/ One difference in the memory footprint is likely coming from your coarse grid solver, which is redundant LU. The 2-level case has a coarse grid problem with 70785 unknowns, whilst the 5-level case has a coarse grid problem with 225 unknowns.
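(As a sanity check on those numbers: 65*33*33 = 70785 and 9*5*5 = 225, i.e. the coarse DMDA grids of the two runs. And since your -ksp_view output shows "First (color=0) of 32 PCs", every rank factors the entire 70785-unknown coarse matrix in the -da_refine 2 case, which is presumably where most of the extra memory goes.)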

2/ The solve time difference will be affected by your coarse grid size. Add the command line argument
  -pc_mg_log
to profile the setup time spent on the coarse grid and all other levels.
See
http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCMG.html

3/ You can change the smoother on all levels by using the command line argument with the appropriate prefix, e.g.
  -mg_levels_ksp_type cg
Note the prefix is displayed in the result of -ksp_view.
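If you ever want a different smoother on one particular level, the numbered prefixes shown in -ksp_view (mg_levels_1_, mg_levels_2_, ...) should work too, e.g. something like
  -mg_levels_1_ksp_type richardson
though I have not tried that here.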

Also, your mesh size can be altered at run time using arguments like
  -da_grid_x 5
You shouldn't have to modify the source code each time.
See
http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DM/DMDACreate3d.html
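For example, because your DMDACreate3d calls pass negative sizes (the sign just marks them as overridable from the options database), something like the following should reproduce your out-level5 grid without editing the source (untested; keep the remaining monitoring options from your original command):
  mpiexec -np 32 ./ex45 -da_grid_x 9 -da_grid_y 5 -da_grid_z 5 -da_refine 5 -pc_type mg -pc_mg_galerkin -ksp_type cg -ksp_rtol 1.0e-7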

Cheers,
Dave


On 6 November 2013 04:21, Alan <zhenglun.wei@gmail.com> wrote:
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Dear all,<br>
I hope you're having a nice day.<br>
Recently, I came across a problem on using MG as
preconditioner.<br>
Basically, to achieve the same finest mesh with pc_type =
mg, the memory<br>
usage for -da_refine 2 is much more than that for -da_refine
5. To my<br>
limited knowledge, more refinement should consume more
memory, which is<br>
contradict to the behavior of pc_type = mg in PETsc.<br>
Here, I provide two output files. They are all from<br>
/src/ksp/ksp/example/tutorial/ex45.c with 32 processes.<br>
The execute file for out-level2 is<br>
mpiexec -np 32 ./ex45 -pc_type mg -ksp_type cg -da_refine 2<br>
-pc_mg_galerkin -ksp_rtol 1.0e-7 -mg_levels_pc_type jacobi<br>
-mg_levels_ksp_type chebyshev -dm_view -log_summary
-pc_mg_log<br>
-pc_mg_monitor -ksp_view -ksp_monitor > out &<br>
and in ex45.c, KSPCreate is changed as:<br>
ierr =<br>
DMDACreate3d(PETSC_COMM_WORLD,DMDA_BOUNDARY_NONE,DMDA_BOUNDARY_NONE,DMDA_BOUNDARY_NONE,DMDA_STENCIL_STAR,-65,-33,-33,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,1,1,0,0,0,&da);CHKERRQ(ierr);<br>
On the other hand, the command for out-level5 is
  mpiexec -np 32 ./ex45 -pc_type mg -ksp_type cg -da_refine 5 -pc_mg_galerkin -ksp_rtol 1.0e-7 -mg_levels_pc_type jacobi -mg_levels_ksp_type chebyshev -dm_view -log_summary -pc_mg_log -pc_mg_monitor -ksp_view -ksp_monitor > out &
and in ex45.c the DMDACreate3d call is changed to:
  ierr = DMDACreate3d(PETSC_COMM_WORLD,DMDA_BOUNDARY_NONE,DMDA_BOUNDARY_NONE,DMDA_BOUNDARY_NONE,DMDA_STENCIL_STAR,-9,-5,-5,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,1,1,0,0,0,&da);CHKERRQ(ierr);
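(For reference: with a non-periodic DMDA, each -da_refine step takes n points per direction to 2n-1, so -da_refine 2 takes the 65x33x33 grid to 257x129x129 via 129x65x65, while -da_refine 5 takes 9x5x5 to the same 257x129x129.)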
In summary, the final finest mesh obtained in both cases is 257*129*129, as documented in both files. However, out-level2 reports a Matrix memory usage of 822871308, while out-level5 reports only 36052932.
Furthermore, although both files show the KSP solver taking 5 iterations, the wall time elapsed for out-level2 is around 150 s, while out-level5 takes only about 4.7 s.
Finally, a minor question: in both files, under 'Down solver (pre-smoother) on level 1' and 'Down solver (pre-smoother) on level 2', the type of "KSP Object: (mg_levels_1_est_)" and "KSP Object: (mg_levels_2_est_)" is 'gmres'. Since I'm using a uniformly Cartesian mesh, would it help to speed up the solver if this 'gmres' were replaced by 'cg'? If so, which PETSc option changes the type of that KSP object?

sincerely appreciate,
Alan
