On Wed, Jul 29, 2009 at 4:17 PM, BAYRAKTAR Harun <span dir="ltr">&lt;<a href="mailto:Harun.BAYRAKTAR@3ds.com">Harun.BAYRAKTAR@3ds.com</a>&gt;</span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<div link="blue" vlink="purple" lang="EN-US">


<div>


<p><span style="font-size: 11pt; color: rgb(31, 73, 125);">Matt,</span></p>


<p><span style="font-size: 11pt; color: rgb(31, 73, 125);">It comes from the pressure Poisson equation for incompressible

Navier-Stokes, so it is elliptic. Also, on 1 CPU I am able to solve it with a

reasonable iteration count (i.e., 43 to reach a 1e-5 relative tolerance on the true residual norm). It

is the parallel runs that really concern me.</span></p></div></div></blockquote><div>Actually, it was the 43 that really concerned me. In my experience, an MG that is doing what it is supposed<br>to on Poisson takes &lt; 10 iterations. However, if your grid is pretty distorted, maybe it can get this bad.<br>
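<p>As a rough illustration of that expectation, here is a minimal sketch. It is not the ML/PETSc setup discussed in this thread: it is a plain geometric V-cycle for the 1D Poisson problem on a uniform grid, used as a standalone solver with damped Jacobi smoothing. All function names, the grid size, the right-hand side, and the choice of 2 pre- and 2 post-smoothing sweeps are assumptions made for the example.</p>

<pre>
```python
import math

def apply_laplacian(u, h):
    """Apply the 1D operator -d2/dx2 (zero Dirichlet ends) to interior values u."""
    p = [0.0] + list(u) + [0.0]  # pad with the boundary zeros
    return [(2.0 * p[i + 1] - p[i] - p[i + 2]) / (h * h) for i in range(len(u))]

def residual(u, f, h):
    return [fi - ai for fi, ai in zip(f, apply_laplacian(u, h))]

def damped_jacobi(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi smoothing; the matrix diagonal is 2/h^2."""
    for _ in range(sweeps):
        r = residual(u, f, h)
        u = [ui + omega * 0.5 * h * h * ri for ui, ri in zip(u, r)]
    return u

def restrict(r):
    """Full weighting; coarse point j sits at fine index 2j+1."""
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
            for j in range((len(r) - 1) // 2)]

def prolong(ec, nf):
    """Linear interpolation of a coarse correction onto nf fine points."""
    ef = [0.0] * nf
    for j, e in enumerate(ec):
        ef[2 * j] += 0.5 * e
        ef[2 * j + 1] += e
        ef[2 * j + 2] += 0.5 * e
    return ef

def v_cycle(u, f, h, sweeps=2):
    if len(u) == 1:
        return [0.5 * h * h * f[0]]  # exact solve on the coarsest grid
    u = damped_jacobi(u, f, h, sweeps)
    rc = restrict(residual(u, f, h))
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h, sweeps)
    u = [ui + ei for ui, ei in zip(u, prolong(ec, len(u)))]
    return damped_jacobi(u, f, h, sweeps)

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def count_v_cycles(n=255, rtol=1e-5, max_cycles=50):
    """Count V-cycles needed to cut the residual norm by rtol (n = 2**k - 1)."""
    h = 1.0 / (n + 1)
    f = [1.0 + math.sin(3.0 * math.pi * (i + 1) * h) for i in range(n)]
    u = [0.0] * n
    r0 = norm(residual(u, f, h))
    cycles = 0
    while norm(residual(u, f, h)) > rtol * r0 and cycles != max_cycles:
        u = v_cycle(u, f, h)
        cycles += 1
    return cycles

if __name__ == "__main__":
    print(count_v_cycles())
```
</pre>

<p>On this idealized problem the cycle count should stay in the single digits regardless of grid size. An unstructured, distorted grid with an algebraic coarsening is a much harder setting, so this is only a reference point, not a prediction for the system above.</p>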

<br>  Matt<br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div link="blue" vlink="purple" lang="EN-US"><div><p><span style="font-size: 11pt; color: rgb(31, 73, 125);"> <br>

</span></p>


<p><span style="font-size: 11pt; color: rgb(31, 73, 125);">Thanks,</span></p>


<p><span style="font-size: 11pt; color: rgb(31, 73, 125);">Harun</span></p>




<div style="border-style: solid none none; border-color: rgb(181, 196, 223) -moz-use-text-color -moz-use-text-color; border-width: 1pt medium medium; padding: 3pt 0in 0in;">


<p><b><span style="font-size: 10pt;">From:</span></b><span style="font-size: 10pt;">

<a href="mailto:petsc-users-bounces@mcs.anl.gov" target="_blank">petsc-users-bounces@mcs.anl.gov</a> [mailto:<a href="mailto:petsc-users-bounces@mcs.anl.gov" target="_blank">petsc-users-bounces@mcs.anl.gov</a>] <b>On

Behalf Of </b>Matthew Knepley<br>

<b>Sent:</b> Wednesday, July 29, 2009 5:00 PM<br>

<b>To:</b> PETSc users list<br>

<b>Subject:</b> Re: Smoother settings for AMG</span></p>


</div>


<p> </p>


<p>On Wed, Jul 29, 2009 at 3:54 PM, BAYRAKTAR Harun &lt;<a href="mailto:Harun.BAYRAKTAR@3ds.com" target="_blank">Harun.BAYRAKTAR@3ds.com</a>&gt; wrote:</p>


<div>


<blockquote style="border-style: none none none solid; border-color: -moz-use-text-color -moz-use-text-color -moz-use-text-color rgb(204, 204, 204); border-width: medium medium medium 1pt; padding: 0in 0in 0in 6pt; margin-left: 4.8pt; margin-right: 0in;">


<p>Hi,<br>

<br>

I am trying to solve a system of equations and am having difficulty<br>

picking the right smoothers for AMG (using ML as pc_type) in PETSc for<br>

parallel execution. First, here is what happens in terms of CG (ksp_type)<br>

iteration counts (both columns use block Jacobi):</p>


</blockquote>


<div>


<p><br>

Are you sure you have an elliptic system? These iteration counts are extremely<br>

high.<br>

<br>

  Matt<br>

 </p>


</div>


<blockquote style="border-style: none none none solid; border-color: -moz-use-text-color -moz-use-text-color -moz-use-text-color rgb(204, 204, 204); border-width: medium medium medium 1pt; padding: 0in 0in 0in 6pt; margin-left: 4.8pt; margin-right: 0in;">


<p style="margin-bottom: 12pt;"><br>

cpus | AMG w/ ICC(0) x1 | AMG w/ SOR x4<br>

------------------------------------------------------<br>

1 | 43 | 243<br>

4 | 699 | 379<br>

<br>

x1 or x4 means 1 or 4 applications of the smoother at each AMG<br>

level (the full ksp_view output for the 4-CPU runs is below). The main<br>

observation is that on 1 CPU, AMG w/ ICC(0) is the clear winner, but it falls<br>

apart in parallel (43 to 699 iterations, roughly 16X). SOR, on the other hand, sees about a 1.5X increase in<br>

iteration count (243 to 379), which is to be expected given the quality of coarsening<br>

ML delivers in parallel.<br>

<br>

I would like to find a way (if possible) to keep the number of<br>

iterations in parallel within 1-2X of the 1-CPU iteration count for the<br>

AMG w/ ICC case. Is there a way to achieve this?<br>

<br>

Thanks,<br>

Harun<br>

<br>

%%%%%%%%%%%%%%%%%%%%%%%%%<br>

AMG w/ ICC(0) x1 ksp_view<br>

%%%%%%%%%%%%%%%%%%%%%%%%%<br>

KSP Object:<br>

 type: cg<br>

 maximum iterations=10000<br>

 tolerances:  relative=1e-05, absolute=1e-50, divergence=10000<br>

 left preconditioning<br>

PC Object:<br>

 type: ml<br>

   MG: type is MULTIPLICATIVE, levels=3 cycles=v, pre-smooths=1, post-smooths=1<br>

 Coarse gride solver -- level 0 -------------------------------<br>

   KSP Object:(mg_coarse_)<br>

     type: preonly<br>

     maximum iterations=1, initial guess is zero<br>

     tolerances:  relative=1e-05, absolute=1e-50,

divergence=10000<br>

     left preconditioning<br>

   PC Object:(mg_coarse_)<br>

     type: redundant<br>

       Redundant preconditioner: First (color=0) of 4 PCs

follows<br>

     KSP Object:(mg_coarse_redundant_)<br>

       type: preonly<br>

       maximum iterations=10000, initial guess is zero<br>

       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000<br>

       left preconditioning<br>

     PC Object:(mg_coarse_redundant_)<br>

       type: lu<br>

         LU: out-of-place factorization<br>

           matrix ordering: nd<br>

         LU: tolerance for zero pivot 1e-12<br>

         LU: factor fill ratio needed 2.17227<br>

              Factored matrix follows<br>

             Matrix Object:<br>

               type=seqaij, rows=283,

cols=283<br>

               total: nonzeros=21651,

allocated nonzeros=21651<br>

                 using I-node routines: found 186 nodes, limit used is 5<br>

       linear system matrix = precond matrix:<br>

       Matrix Object:<br>

         type=seqaij, rows=283, cols=283<br>

         total: nonzeros=9967, allocated

nonzeros=14150<br>

           not using I-node routines<br>

     linear system matrix = precond matrix:<br>

     Matrix Object:<br>

       type=mpiaij, rows=283, cols=283<br>

       total: nonzeros=9967, allocated nonzeros=9967<br>

         not using I-node (on process 0) routines<br>

 Down solver (pre-smoother) on level 1 -------------------------------<br>

   KSP Object:(mg_levels_1_)<br>

     type: richardson<br>

       Richardson: damping factor=0.9<br>

     maximum iterations=1, initial guess is zero<br>

     tolerances:  relative=1e-05, absolute=1e-50,

divergence=10000<br>

     left preconditioning<br>

   PC Object:(mg_levels_1_)<br>

     type: bjacobi<br>

       block Jacobi: number of blocks = 4<br>

       Local solve is same for all blocks, in the following KSP and PC objects:<br>

     KSP Object:(mg_levels_1_sub_)<br>

       type: preonly<br>

       maximum iterations=10000, initial guess is zero<br>

       tolerances:  relative=1e-05, absolute=1e-50,

divergence=10000<br>

       left preconditioning<br>

     PC Object:(mg_levels_1_sub_)<br>

       type: icc<br>

         ICC: 0 levels of fill<br>

         ICC: factor fill ratio allocated 1<br>

         ICC: using Manteuffel shift<br>

         ICC: factor fill ratio needed 0.514899<br>

              Factored matrix follows<br>

             Matrix Object:<br>

               type=seqsbaij,

rows=2813, cols=2813<br>

               total: nonzeros=48609,

allocated nonzeros=48609<br>

                   block size

is 1<br>

       linear system matrix = precond matrix:<br>

       Matrix Object:<br>

         type=seqaij, rows=2813, cols=2813<br>

         total: nonzeros=94405, allocated

nonzeros=94405<br>

           not using I-node routines<br>

     linear system matrix = precond matrix:<br>

     Matrix Object:<br>

       type=mpiaij, rows=10654, cols=10654<br>

       total: nonzeros=376634, allocated nonzeros=376634<br>

         not using I-node (on process 0) routines<br>

 Up solver (post-smoother) on level 1 -------------------------------<br>

   KSP Object:(mg_levels_1_)<br>

     type: richardson<br>

       Richardson: damping factor=0.9<br>

     maximum iterations=1<br>

     tolerances:  relative=1e-05, absolute=1e-50,

divergence=10000<br>

     left preconditioning<br>

   PC Object:(mg_levels_1_)<br>

     type: bjacobi<br>

       block Jacobi: number of blocks = 4<br>

       Local solve is same for all blocks, in the following KSP and PC objects:<br>

     KSP Object:(mg_levels_1_sub_)<br>

       type: preonly<br>

       maximum iterations=10000, initial guess is zero<br>

       tolerances:  relative=1e-05, absolute=1e-50,

divergence=10000<br>

       left preconditioning<br>

     PC Object:(mg_levels_1_sub_)<br>

       type: icc<br>

         ICC: 0 levels of fill<br>

         ICC: factor fill ratio allocated 1<br>

         ICC: using Manteuffel shift<br>

         ICC: factor fill ratio needed 0.514899<br>

              Factored matrix follows<br>

             Matrix Object:<br>

               type=seqsbaij,

rows=2813, cols=2813<br>

               total: nonzeros=48609,

allocated nonzeros=48609<br>

                   block size

is 1<br>

       linear system matrix = precond matrix:<br>

       Matrix Object:<br>

         type=seqaij, rows=2813, cols=2813<br>

         total: nonzeros=94405, allocated

nonzeros=94405<br>

           not using I-node routines<br>

     linear system matrix = precond matrix:<br>

     Matrix Object:<br>

       type=mpiaij, rows=10654, cols=10654<br>

       total: nonzeros=376634, allocated nonzeros=376634<br>

         not using I-node (on process 0) routines<br>

 Down solver (pre-smoother) on level 2 -------------------------------<br>

   KSP Object:(mg_levels_2_)<br>

     type: richardson<br>

       Richardson: damping factor=0.9<br>

     maximum iterations=1, initial guess is zero<br>

     tolerances:  relative=1e-05, absolute=1e-50,

divergence=10000<br>

     left preconditioning<br>

   PC Object:(mg_levels_2_)<br>

     type: bjacobi<br>

       block Jacobi: number of blocks = 4<br>

       Local solve is same for all blocks, in the following KSP and PC objects:<br>

     KSP Object:(mg_levels_2_sub_)<br>

       type: preonly<br>

       maximum iterations=10000, initial guess is zero<br>

       tolerances:  relative=1e-05, absolute=1e-50,

divergence=10000<br>

       left preconditioning<br>

     PC Object:(mg_levels_2_sub_)<br>

       type: icc<br>

         ICC: 0 levels of fill<br>

         ICC: factor fill ratio allocated 1<br>

         ICC: using Manteuffel shift<br>

         ICC: factor fill ratio needed 0.519045<br>

              Factored matrix follows<br>

             Matrix Object:<br>

               type=seqsbaij,

rows=101164, cols=101164<br>

               total: nonzeros=1378558,

allocated nonzeros=1378558<br>

                   block size

is 1<br>

       linear system matrix = precond matrix:<br>

       Matrix Object:<br>

         type=seqaij, rows=101164, cols=101164<br>

         total: nonzeros=2655952, allocated

nonzeros=5159364<br>

           not using I-node routines<br>

     linear system matrix = precond matrix:<br>

     Matrix Object:<br>

       type=mpiaij, rows=411866, cols=411866<br>

       total: nonzeros=10941434, allocated

nonzeros=42010332<br>

         not using I-node (on process 0) routines<br>

 Up solver (post-smoother) on level 2 -------------------------------<br>

   KSP Object:(mg_levels_2_)<br>

     type: richardson<br>

       Richardson: damping factor=0.9<br>

     maximum iterations=1<br>

     tolerances:  relative=1e-05, absolute=1e-50,

divergence=10000<br>

     left preconditioning<br>

   PC Object:(mg_levels_2_)<br>

     type: bjacobi<br>

       block Jacobi: number of blocks = 4<br>

       Local solve is same for all blocks, in the following KSP and PC objects:<br>

     KSP Object:(mg_levels_2_sub_)<br>

       type: preonly<br>

       maximum iterations=10000, initial guess is zero<br>

       tolerances:  relative=1e-05, absolute=1e-50,

divergence=10000<br>

       left preconditioning<br>

     PC Object:(mg_levels_2_sub_)<br>

       type: icc<br>

         ICC: 0 levels of fill<br>

         ICC: factor fill ratio allocated 1<br>

         ICC: using Manteuffel shift<br>

         ICC: factor fill ratio needed 0.519045<br>

              Factored matrix follows<br>

             Matrix Object:<br>

               type=seqsbaij,

rows=101164, cols=101164<br>

               total: nonzeros=1378558,

allocated nonzeros=1378558<br>

                   block size

is 1<br>

       linear system matrix = precond matrix:<br>

       Matrix Object:<br>

         type=seqaij, rows=101164, cols=101164<br>

         total: nonzeros=2655952, allocated

nonzeros=5159364<br>

           not using I-node routines<br>

     linear system matrix = precond matrix:<br>

     Matrix Object:<br>

       type=mpiaij, rows=411866, cols=411866<br>

       total: nonzeros=10941434, allocated

nonzeros=42010332<br>

         not using I-node (on process 0) routines<br>

 linear system matrix = precond matrix:<br>

 Matrix Object:<br>

   type=mpiaij, rows=411866, cols=411866<br>

   total: nonzeros=10941434, allocated nonzeros=42010332<br>

     not using I-node (on process 0) routines<br>

<br>

%%%%%%%%%%%%%%%%%%%%%%<br>

AMG w/ SOR x4 ksp_view<br>

%%%%%%%%%%%%%%%%%%%%%%<br>

<br>

KSP Object:<br>

 type: cg<br>

 maximum iterations=10000<br>

 tolerances:  relative=1e-05, absolute=1e-50, divergence=10000<br>

 left preconditioning<br>

PC Object:<br>

 type: ml<br>

   MG: type is MULTIPLICATIVE, levels=3 cycles=v, pre-smooths=1, post-smooths=1<br>

 Coarse gride solver -- level 0 -------------------------------<br>

   KSP Object:(mg_coarse_)<br>

     type: preonly<br>

     maximum iterations=1, initial guess is zero<br>

     tolerances:  relative=1e-05, absolute=1e-50,

divergence=10000<br>

     left preconditioning<br>

   PC Object:(mg_coarse_)<br>

     type: redundant<br>

       Redundant preconditioner: First (color=0) of 4 PCs

follows<br>

     KSP Object:(mg_coarse_redundant_)<br>

       type: preonly<br>

       maximum iterations=10000, initial guess is zero<br>

       tolerances:  relative=1e-05, absolute=1e-50,

divergence=10000<br>

       left preconditioning<br>

     PC Object:(mg_coarse_redundant_)<br>

       type: lu<br>

         LU: out-of-place factorization<br>

           matrix ordering: nd<br>

         LU: tolerance for zero pivot 1e-12<br>

         LU: factor fill ratio needed 2.17227<br>

              Factored matrix follows<br>

             Matrix Object:<br>

               type=seqaij, rows=283,

cols=283<br>

               total: nonzeros=21651,

allocated nonzeros=21651<br>

                 using I-node routines: found 186 nodes, limit used is 5<br>

       linear system matrix = precond matrix:<br>

       Matrix Object:<br>

         type=seqaij, rows=283, cols=283<br>

         total: nonzeros=9967, allocated nonzeros=14150<br>

           not using I-node routines<br>

     linear system matrix = precond matrix:<br>

     Matrix Object:<br>

       type=mpiaij, rows=283, cols=283<br>

       total: nonzeros=9967, allocated nonzeros=9967<br>

         not using I-node (on process 0) routines<br>

 Down solver (pre-smoother) on level 1 -------------------------------<br>

   KSP Object:(mg_levels_1_)<br>

     type: richardson<br>

       Richardson: damping factor=1<br>

     maximum iterations=4, initial guess is zero<br>

     tolerances:  relative=1e-05, absolute=1e-50,

divergence=10000<br>

     left preconditioning<br>

   PC Object:(mg_levels_1_)<br>

     type: sor<br>

       SOR: type = local_symmetric, iterations = 1, omega =

1<br>

     linear system matrix = precond matrix:<br>

     Matrix Object:<br>

       type=mpiaij, rows=10654, cols=10654<br>

       total: nonzeros=376634, allocated nonzeros=376634<br>

         not using I-node (on process 0) routines<br>

 Up solver (post-smoother) on level 1 -------------------------------<br>

   KSP Object:(mg_levels_1_)<br>

     type: richardson<br>

       Richardson: damping factor=1<br>

     maximum iterations=4<br>

     tolerances:  relative=1e-05, absolute=1e-50,

divergence=10000<br>

     left preconditioning<br>

   PC Object:(mg_levels_1_)<br>

     type: sor<br>

       SOR: type = local_symmetric, iterations = 1, omega =

1<br>

     linear system matrix = precond matrix:<br>

     Matrix Object:<br>

       type=mpiaij, rows=10654, cols=10654<br>

       total: nonzeros=376634, allocated nonzeros=376634<br>

         not using I-node (on process 0) routines<br>

 Down solver (pre-smoother) on level 2 -------------------------------<br>

   KSP Object:(mg_levels_2_)<br>

     type: richardson<br>

       Richardson: damping factor=1<br>

     maximum iterations=4, initial guess is zero<br>

     tolerances:  relative=1e-05, absolute=1e-50,

divergence=10000<br>

     left preconditioning<br>

   PC Object:(mg_levels_2_)<br>

     type: sor<br>

       SOR: type = local_symmetric, iterations = 1, omega =

1<br>

     linear system matrix = precond matrix:<br>

     Matrix Object:<br>

       type=mpiaij, rows=411866, cols=411866<br>

       total: nonzeros=10941434, allocated

nonzeros=42010332<br>

         not using I-node (on process 0) routines<br>

 Up solver (post-smoother) on level 2 -------------------------------<br>

   KSP Object:(mg_levels_2_)<br>

     type: richardson<br>

       Richardson: damping factor=1<br>

     maximum iterations=4<br>

     tolerances:  relative=1e-05, absolute=1e-50,

divergence=10000<br>

     left preconditioning<br>

   PC Object:(mg_levels_2_)<br>

     type: sor<br>

       SOR: type = local_symmetric, iterations = 1, omega =

1<br>

     linear system matrix = precond matrix:<br>

     Matrix Object:<br>

       type=mpiaij, rows=411866, cols=411866<br>

       total: nonzeros=10941434, allocated

nonzeros=42010332<br>

         not using I-node (on process 0) routines<br>

 linear system matrix = precond matrix:<br>

 Matrix Object:<br>

   type=mpiaij, rows=411866, cols=411866<br>

   total: nonzeros=10941434, allocated nonzeros=42010332<br>

     not using I-node (on process 0) routines<br>

<br>

</p>


</blockquote>
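<p>For reference, the two smoother configurations shown in the ksp_view output above can be written down as PETSc command-line options. The sketch below is an assumption-laden reconstruction: the option names are inferred from the prefixes in the output (mg_levels_, mg_levels_N_sub_) and from common PETSc conventions, and they may differ between PETSc versions. The helper name ml_smoother_options is hypothetical and is not part of PETSc.</p>

<pre>
```python
def ml_smoother_options(smoother, sweeps, damping):
    """Build a PETSc option list for CG + ML with a given level smoother.

    smoother: "icc" (block Jacobi with ICC(0) sub-solves) or "sor".
    sweeps:   Richardson iterations per smoother application (the x1 / x4).
    damping:  Richardson damping factor reported by ksp_view.
    Option names are assumptions inferred from the ksp_view prefixes.
    """
    opts = [
        "-ksp_type", "cg",
        "-ksp_rtol", "1e-5",
        "-pc_type", "ml",
        "-mg_levels_ksp_type", "richardson",
        "-mg_levels_ksp_richardson_scale", str(damping),
        "-mg_levels_ksp_max_it", str(sweeps),
    ]
    if smoother == "icc":
        # One ICC(0) factorization per process-local block.
        opts += ["-mg_levels_pc_type", "bjacobi",
                 "-mg_levels_sub_pc_type", "icc"]
    elif smoother == "sor":
        # PETSc's parallel SOR sweeps only over the local part of the matrix.
        opts += ["-mg_levels_pc_type", "sor"]
    else:
        raise ValueError(f"unknown smoother: {smoother}")
    opts.append("-ksp_view")  # print the resulting solver hierarchy
    return opts


if __name__ == "__main__":
    print(" ".join(ml_smoother_options("icc", 1, 0.9)))
    print(" ".join(ml_smoother_options("sor", 4, 1)))
```
</pre>

<p>Both variants apply the smoother process-locally (per-block ICC, or local symmetric SOR), so in parallel neither couples unknowns across process boundaries within a sweep.</p>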


</div>


<p><br>

<br clear="all">

<br>

-- <br>

What most experimenters take for granted before they begin their experiments is

infinitely more interesting than any results to which their experiments lead.<br>

-- Norbert Wiener</p>


</div>


</div>


</blockquote></div>