On Mon, Jul 4, 2011 at 7:32 PM, Haren, S.W. van (Steven) <span dir="ltr"><<a href="mailto:vanharen@nrg.eu">vanharen@nrg.eu</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Thank you for your reply, Jed.<br>
<br>
I will take a look at the preconditioners to see if I can improve the scaling.<br>
<br>
CPU is an Intel i7 Q720, just a standard laptop CPU.<br></blockquote><div><br></div><div>As Jed points out, you will see very little speedup here due to the quite poor memory subsystem. Intel rarely points out that this setup is great for factoring, but lousy for large swaths of computational science:</div>
<div><br></div><div> <a href="http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers">http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers</a></div><div><br></div><div> Matt</div><div>
</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Regards,<br>
<br>
Steven<br>
<br>
<br>
<br>
---------------------------<br>
Date: Mon, 4 Jul 2011 12:24:56 -0500<br>
From: Jed Brown <<a href="mailto:jedbrown@mcs.anl.gov">jedbrown@mcs.anl.gov</a>><br>
Subject: Re: [petsc-users] Increasing parallel speed-up<br>
To: PETSc users list <<a href="mailto:petsc-users@mcs.anl.gov">petsc-users@mcs.anl.gov</a>><br>
Message-ID:<br>
<<a href="mailto:CAM9tzSnBAmOdwxKEz-_BA9o%2BSvQHP69eEE50ozQ3LVFor0eSBQ@mail.gmail.com">CAM9tzSnBAmOdwxKEz-_BA9o+SvQHP69eEE50ozQ3LVFor0eSBQ@mail.gmail.com</a>><br>
Content-Type: text/plain; charset="utf-8"<br>
<div><div></div><div class="h5"><br>
On Mon, Jul 4, 2011 at 12:09, Haren, S.W. van (Steven) <<a href="mailto:vanharen@nrg.eu">vanharen@nrg.eu</a>> wrote:<br>
<br>
> one of the ksp solvers (Conjugate Gradient method with ILU(0)<br>
> preconditioning) gives poor parallel performance for the<br>
><br>
<br>
We need to identify how much the poor scaling is due to the preconditioner<br>
changing (e.g. block Jacobi with ILU(0)) such that more iterations are<br>
needed versus memory bandwidth. Run with -ksp_monitor or<br>
-ksp_converged_reason to see the iterations. You can try -pc_type asm (or<br>
algebraic multigrid using third-party libraries) to improve the iteration<br>
count.<br>
<br>
If you want help seeing what's going on, send -log_summary output for each<br>
case.<br>
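For example, the diagnostic runs above could look like the following (the executable name ./app and the process counts are illustrative, not from the original thread; the PETSc options are the ones named above):

```shell
# Baseline: serial CG with ILU(0); report why/when the solve converged
# and collect performance data with -log_summary.
mpiexec -n 1 ./app -ksp_type cg -pc_type ilu \
    -ksp_converged_reason -log_summary

# Parallel: ILU(0) becomes block Jacobi + ILU(0) by default; if the
# iteration count grows, try additive Schwarz with ILU(0) subsolves.
mpiexec -n 4 ./app -ksp_type cg -pc_type asm -sub_pc_type ilu \
    -ksp_converged_reason -log_summary
```

Comparing the iteration counts between the two runs separates the preconditioner-weakening effect from the memory-bandwidth effect: if iterations stay flat but time barely improves, bandwidth is the bottleneck.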
<br>
<br>
> following settings:<br>
><br>
> - number of unknowns ~ 2 million<br>
> - 1, 2 and 4 processors (quad core CPU)<br>
><br>
<br>
What kind? In particular, what memory bus and how many channels? Sparse<br>
matrix kernels are overwhelmingly limited by memory performance, so extra<br>
cores do very little good unless the memory system is very good (or the<br>
matrix fits in cache).<br>
<br>
<br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>
-- Norbert Wiener<br>