On Mon, Jul 4, 2011 at 7:32 PM, Haren, S.W. van (Steven) <span dir="ltr"><<a href="mailto:vanharen@nrg.eu">vanharen@nrg.eu</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Thank you for your reply, Jed.<br>
<br>
I will take a look at the preconditioners to see if I can improve the scaling.<br>
<br>
CPU is an Intel i7 Q720, just a standard laptop CPU.<br></blockquote><div><br></div><div>As Jed points out, you will see very little speedup here due to the quite poor memory subsystem. Intel rarely points out that this setup is great for factoring, but lousy for large swaths of computational science:</div>
<div><br></div><div> <a href="http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers">http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers</a></div><div><br></div><div> Matt</div><div>
</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Regards,<br>
<br>
Steven<br>
<br>
<br>
<br>
---------------------------<br>
Date: Mon, 4 Jul 2011 12:24:56 -0500<br>
From: Jed Brown <<a href="mailto:jedbrown@mcs.anl.gov">jedbrown@mcs.anl.gov</a>><br>
Subject: Re: [petsc-users] Increasing parallel speed-up<br>
To: PETSc users list <<a href="mailto:petsc-users@mcs.anl.gov">petsc-users@mcs.anl.gov</a>><br>
Message-ID:<br>
<<a href="mailto:CAM9tzSnBAmOdwxKEz-_BA9o%2BSvQHP69eEE50ozQ3LVFor0eSBQ@mail.gmail.com">CAM9tzSnBAmOdwxKEz-_BA9o+SvQHP69eEE50ozQ3LVFor0eSBQ@mail.gmail.com</a>><br>
Content-Type: text/plain; charset="utf-8"<br>
<div><div></div><div class="h5"><br>
On Mon, Jul 4, 2011 at 12:09, Haren, S.W. van (Steven) <<a href="mailto:vanharen@nrg.eu">vanharen@nrg.eu</a>> wrote:<br>
<br>
> one of the ksp solvers (Conjugate Gradient method with ILU(0)<br>
> preconditioning) gives poor parallel performance for the<br>
><br>
<br>
We need to identify how much the poor scaling is due to the preconditioner<br>
changing (e.g. block Jacobi with ILU(0)) such that more iterations are<br>
needed versus memory bandwidth. Run with -ksp_monitor or<br>
-ksp_converged_reason to see the iterations. You can try -pc_type asm (or<br>
algebraic multigrid using third-party libraries) to improve the iteration<br>
count.<br>
<br>
If you want help seeing what's going on, send -log_summary output for each<br>
case.<br>
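For example, the diagnostic runs above could look like the following (the executable name ./app and the process counts are illustrative, not from the original thread; the PETSc options are the ones named above):

```shell
# Baseline: serial CG with ILU(0); report why/when the solve converged
# and collect performance data with -log_summary.
mpiexec -n 1 ./app -ksp_type cg -pc_type ilu \
    -ksp_converged_reason -log_summary

# Parallel: ILU(0) becomes block Jacobi + ILU(0) by default; if the
# iteration count grows, try additive Schwarz with ILU(0) subsolves.
mpiexec -n 4 ./app -ksp_type cg -pc_type asm -sub_pc_type ilu \
    -ksp_converged_reason -log_summary
```

Comparing the iteration counts between the two runs separates the preconditioner-weakening effect from the memory-bandwidth effect: if iterations stay flat but time barely improves, bandwidth is the bottleneck.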
<br>
<br>
> following settings:<br>
><br>
> - number of unknowns ~ 2 million<br>
> - 1, 2 and 4 processors (quad core CPU)<br>
><br>
<br>
What kind? In particular, what memory bus and how many channels? Sparse<br>
matrix kernels are overwhelmingly limited by memory performance, so extra<br>
cores do very little good unless the memory system is very good (or the<br>
matrix fits in cache).<br>
<br>
<br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>
-- Norbert Wiener<br>