<div class="gmail_extra">On Tue, May 1, 2012 at 6:20 PM,  <span dir="ltr">&lt;<a href="mailto:tibo@berkeley.edu" target="_blank">tibo@berkeley.edu</a>&gt;</span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Dear PETSc users,<br>

<br>

I am solving large systems of nonlinear PDEs. The most<br>

expensive operation is to solve linear systems Ax=b, where A is block<br>

tridiagonal. I am using PETSc for this.<br>

<br>

A is created using MatCreateMPIAIJ and x and b using VecCreateMPI, and I<br>

do not modify default parameters of the KSP context for now (i.e., the Krylov<br>

method is GMRES and the preconditioner method is ILU(0) if I use 1<br>

processor - sequential matrix - and Block Jacobi with ILU(0) for each<br>

sub-block if I use more than 1 processor).<br>

<br>

For any number n of processors used, I do get the correct result. However,<br>

it seems that the more processors I have, the more iterations are done on<br>

each linear solve (n = 1 gives about 1-2 iterations per solve, n = 2 gives<br>

8-12 iterations per solve, n = 4 gives 15-20 iterations per solve).<br>

<br>

While I can understand the difference between n = 1 and n = 2, since the<br>

preconditioning method changes from ILU(0) to Block Jacobi, I don&#39;t<br>

understand why this is the case from n = 2 to 4 for example, since it<br>

seems to me that the method used to solve Ax=b will be the same (although<br>

the partitioning is different) and so the operations will be the same,<br>

even though there will be more communication.<br>

<br>

My first question is then: Is this normal behavior or am I probably wrong<br>

somewhere?<br></blockquote><div><br></div><div>It's not the same from 2 to 4. With 4 processes, BJACOBI has 4 smaller blocks, so more of the coupling between unknowns is dropped from the preconditioner.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


Also, since the increase in the number of iterations more than offsets the<br>

decrease in time spent solving the system when n increases, my program runs<br>

slower with an increasing number of processors, which is the opposite of<br>

what is desired. Would you have suggestions as to what I could change to<br>

correct this?<br></blockquote><div><br></div><div>Scalable preconditioning is a research topic. However, the next thing to try is algebraic multigrid (AMG).</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


I would be happy to provide more details about the problem/data structures<br>

used if needed,<br></blockquote><div><br></div><div>The best thing to try usually is a literature search for people using scalable preconditioners</div><div>for your kind of physics.</div><div><br></div><div>    Matt</div>
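<div><br></div><div>To illustrate why the iteration count grows with the number of blocks, here is a minimal sketch on a model problem. Everything in it is an assumption for illustration and not taken from the code discussed above: the matrix is the 1D Laplacian tridiag(-1, 2, -1), the Krylov method is preconditioned CG rather than GMRES, each diagonal block is solved exactly rather than with ILU(0), and the helper names are made up. It is pure Python and does not use PETSc.</div>

```python
"""Block Jacobi preconditioned CG on a 1D Laplacian (illustrative model only).

Assumptions: A = tridiag(-1, 2, -1), CG instead of GMRES, and exact
solves on each diagonal block instead of ILU(0).
"""


def apply_a(x):
    """Multiply x by the n x n matrix tridiag(-1, 2, -1)."""
    y = [2.0 * v for v in x]
    for i in range(len(x) - 1):
        y[i] -= x[i + 1]
        y[i + 1] -= x[i]
    return y


def solve_block(r):
    """Solve tridiag(-1, 2, -1) z = r exactly with the Thomas algorithm."""
    m = len(r)
    c = [0.0] * m  # modified super-diagonal
    d = [0.0] * m  # modified right-hand side
    c[0] = -0.5
    d[0] = r[0] / 2.0
    for i in range(1, m):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (r[i] + d[i - 1]) / denom
    z = [0.0] * m
    z[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        z[i] = d[i] - c[i] * z[i + 1]
    return z


def apply_block_jacobi(r, nblocks):
    """Apply block Jacobi: solve each diagonal block, ignore the coupling."""
    size = len(r) // nblocks  # assumes nblocks divides len(r)
    z = []
    for b in range(nblocks):
        z.extend(solve_block(r[b * size:(b + 1) * size]))
    return z


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg_iterations(n, nblocks, rtol=1e-8, maxit=200):
    """Return the number of preconditioned CG iterations needed for A x = b."""
    b = [1.0 + 0.1 * (i % 7) for i in range(n)]  # arbitrary right-hand side
    tol = rtol * dot(b, b) ** 0.5
    x = [0.0] * n
    r = b[:]
    z = apply_block_jacobi(r, nblocks)
    p = z[:]
    rz = dot(r, z)
    for it in range(1, maxit + 1):
        ap = apply_a(p)
        alpha = rz / dot(p, ap)
        x = [xv + alpha * pv for xv, pv in zip(x, p)]
        r = [rv - alpha * av for rv, av in zip(r, ap)]
        if tol >= dot(r, r) ** 0.5:
            return it
        z = apply_block_jacobi(r, nblocks)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zv + beta * pv for zv, pv in zip(z, p)]
        rz = rz_new
    return maxit


if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        print(k, "blocks:", pcg_iterations(64, k), "iterations")
```

<div>With one block the preconditioner is an exact solve, so a single iteration suffices. Each additional block boundary drops one more pair of coupling entries, and the count should rise accordingly. In PETSc itself, the same comparison can be made without code changes using run-time options such as <code>-ksp_converged_reason</code> and <code>-ksp_monitor</code> to see iteration counts, and <code>-pc_type gamg</code> to try PETSc's algebraic multigrid if your version provides it.</div>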

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

thank you for your help,<br>

<br>

Tibo<br>

<br>

</blockquote></div><br><br clear="all"><div><br></div>-- <br>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>

-- Norbert Wiener<br>

</div>