I totally agree. But as I said in my first mail, the problem occurs both in my real application (which is CPU-intensive) and in a simple communication-intensive test (i.e. a simple MPI_Send inside a loop). You can also vary the number of processes per machine, and the same thing happens whenever one machine has more processes than the other. So it&#39;s independent of the structure of the connections and of the level of parallelism.<br>

<br><div class="gmail_quote">On Tue, Jan 13, 2009 at 7:04 PM, Marcus Vinicius Brandão Soares <span dir="ltr">&lt;<a href="mailto:mvbsoares@gmail.com">mvbsoares@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

Hello Gustavo,<br><br>My point is: depending on the structure of the connections (the edges of the graph) that interconnect the processors, on the type of processing (CPU-intensive or communication-intensive) and even on the level of parallelism (from mostly sequential to divide-and-conquer), it may be easy to explain the differences you referred to. <br>


<br>Do you agree?<br><br><div class="gmail_quote">Best regards,<div><div></div><div class="Wj3C7c"><br><br><br><br>2009/1/13 Gustavo Miranda Teixeira <span dir="ltr">&lt;<a href="mailto:magusbr@gmail.com" target="_blank">magusbr@gmail.com</a>&gt;</span><br>

<blockquote class="gmail_quote" style="border-left:1px solid rgb(204, 204, 204);margin:0pt 0pt 0pt 0.8ex;padding-left:1ex">

Hello Marcus,<br><br>I can&#39;t see what your point is. Do you want to know on which cores the processes are allocated? For example, whether they are on the same processor or on different ones?<div><div></div><div><br><br><br>

<div class="gmail_quote">On Tue, Jan 13, 2009 at 3:06 PM, Marcus Vinicius Brandão Soares <span dir="ltr">&lt;<a href="mailto:mvbsoares@gmail.com" target="_blank">mvbsoares@gmail.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="border-left:1px solid rgb(204, 204, 204);margin:0pt 0pt 0pt 0.8ex;padding-left:1ex">Hello Gustavo and all,<br><br>You described that you are using two machines, each with a dual processor. If I model this as a simple graph, we have two vertices and two unidirectional edges.<br>


<br>Each machine has two processors, each of them dual-core, so there are 8 cores in total. But let&#39;s think again about the graph model: now we have two vertices (the machines), each with two child vertices (the processors); each of those has two child vertices as well (the cores), and the graph ends there.<br>
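<br>A minimal sketch of this graph model, assuming a plain tree with a fan-out of two at each level (machine, processor, core). The names build_tree and leaves are hypothetical helpers used only for illustration; they are not part of MPICH:<br>
<pre>
```python
# Hypothetical model of the cluster: 2 machines, each with 2 processors,
# each processor with 2 cores. The labels are made up for illustration.

def build_tree(label, fanouts):
    """Build a nested dict; fanouts[0] is the number of children of this node."""
    if not fanouts:
        return {"label": label, "children": []}
    children = [
        build_tree(f"{label}.{i}", fanouts[1:]) for i in range(fanouts[0])
    ]
    return {"label": label, "children": children}


def leaves(node):
    """Return the labels of all leaf vertices (the cores) under this node."""
    if not node["children"]:
        return [node["label"]]
    result = []
    for child in node["children"]:
        result.extend(leaves(child))
    return result


cluster = build_tree("cluster", [2, 2, 2])
cores = leaves(cluster)
print(len(cores))  # 8
```
</pre>
The leaf count matches the 8 cores mentioned in the thread; the sketch says nothing about how the cores are actually wired together.<br>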


<br>Do you know the structure of the communication lines between the processor cores? <br><br><div class="gmail_quote">2009/1/13 Gustavo Miranda Teixeira <span dir="ltr">&lt;<a href="mailto:magusbr@gmail.com" target="_blank">magusbr@gmail.com</a>&gt;</span><div>


<div></div><div><br>

<blockquote class="gmail_quote" style="border-left:1px solid rgb(204, 204, 204);margin:0pt 0pt 0pt 0.8ex;padding-left:1ex">Hello everyone!<br><br>I&#39;ve been experiencing some issues when using MPICH v1.2.7p1 on an SMP cluster and thought maybe someone here could help me.<br>


<br>I have a small cluster of two dual-processor machines connected by gigabit Ethernet. Each processor is dual-core, which adds up to 8 cores. When I run an application with 4 processes spread across both machines (2 processes on one machine and 2 on the other), I get significantly better performance than when I run the same application with all 4 processes on a single machine. Isn&#39;t that a bit curious? I know some people who have also noticed this, but no one can explain to me why it happens. Googling it didn&#39;t help either. I originally thought the problem was specific to my kind of application (a heart simulator which uses PETSc to solve some differential equations), but some simple experiments showed that a simple MPI_Send inside a huge loop causes the same issue. Measuring cache hits and misses showed it&#39;s not a memory contention problem. I also know that in-node communication in MPICH uses the loopback interface, but as far as I know a message that goes through the loopback interface simply takes a shortcut to the input queue instead of being sent to the device, so there is no reason for the message to take longer to reach the other processes. So I have no idea why MPICH takes longer when all processes are on the same machine. Has anyone else noticed this too? Is there a logical explanation for it?<br>
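<br>To make the two placements concrete, here is a small sketch that counts how many process pairs would communicate inside a node and how many would cross the network. The host names and the classify_pairs helper are assumptions for illustration only; the sketch measures nothing and does not explain the slowdown:<br>
<pre>
```python
from itertools import combinations


def classify_pairs(placement):
    """Count rank pairs that share a host versus pairs on different hosts.

    placement[rank] is the (hypothetical) host name where that rank runs.
    Returns a tuple (intra_node_pairs, inter_node_pairs).
    """
    intra = 0
    inter = 0
    for a, b in combinations(range(len(placement)), 2):
        if placement[a] == placement[b]:
            intra += 1
        else:
            inter += 1
    return intra, inter


# 2 processes on each machine versus all 4 processes on one machine.
split = classify_pairs(["node1", "node1", "node2", "node2"])
packed = classify_pairs(["node1", "node1", "node1", "node1"])
print(split)   # (2, 4)
print(packed)  # (6, 0)
```
</pre>
With all 4 processes on one machine, every pair is an in-node pair, so all traffic would go through the in-node path (the loopback interface, according to the description above).<br>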


<br>Thanks,<br><font color="#888888">Gustavo Miranda Teixeira<br>

</font></blockquote></div></div></div><br><br clear="all"><br>-- <br>Marcus Vinicius<br>--<br>&quot;Given enough collaborators,<br>any problem can be solved&quot;<br>Eric S. Raymond<br>The Cathedral and the Bazaar<br>


<br>

&quot;The past is only a resource for the present&quot;<br>Clave de Clau<br><br>&quot;No one is so poor that they cannot give a hug; and <br>no one is so rich that they do not need a hug.&quot;<br>Anonymous<br>

</blockquote></div><br>

</div></div></blockquote></div></div></div><div><div></div><div class="Wj3C7c"><br><br clear="all"><br>-- <br>Marcus Vinicius<br>

</div></div></blockquote></div><br>