Hello Gustavo and all,<br><br>You described two machines, each with two processors. If I model that as a simple graph, we have two vertices and two unidirectional edges.<br><br>Each machine has two processors, and each processor is dual core, so there are 8 cores in total. But let&#39;s refine the graph model: now we have two vertices (the machines), each with two child vertices (the processors); each of those has two child vertices of its own (the cores), and the hierarchy ends there.<br>
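<br>To make the model concrete, here is a minimal Python sketch of that tree. The (machine, processor, core) labels and the names build_cores and shared_level are my own assumptions for illustration; nothing here comes from MPICH, and it says nothing about how fast each level really is.<br>

```python
from itertools import product

# Assumed sizes, taken from the description: 2 machines,
# 2 processors per machine, 2 cores per processor.
MACHINES = 2
PROCESSORS_PER_MACHINE = 2
CORES_PER_PROCESSOR = 2


def build_cores():
    """Return every leaf of the tree as a (machine, processor, core) tuple."""
    return list(product(range(MACHINES),
                        range(PROCESSORS_PER_MACHINE),
                        range(CORES_PER_PROCESSOR)))


def shared_level(a, b):
    """Name the lowest level of the tree that two cores have in common."""
    if a == b:
        return "core"        # same leaf
    if a[:2] == b[:2]:
        return "processor"   # two cores on one processor
    if a[0] == b[0]:
        return "machine"     # different processors, same machine
    return "network"         # different machines, joined by the network edges


if __name__ == "__main__":
    cores = build_cores()
    print(len(cores))                            # 8
    print(shared_level((0, 0, 0), (0, 0, 1)))    # processor
    print(shared_level((0, 0, 0), (0, 1, 0)))    # machine
    print(shared_level((0, 0, 0), (1, 0, 0)))    # network
```

Running 4 processes on one machine means every pair meets at the "processor" or "machine" level; splitting them 2 and 2 puts some pairs at the "network" level. Which level each message travels through is what I am asking about.<br>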

<br>Do you know how the communication paths between the processor cores are structured?<br><br><div class="gmail_quote">2009/1/13 Gustavo Miranda Teixeira <span dir="ltr">&lt;<a href="mailto:magusbr@gmail.com">magusbr@gmail.com</a>&gt;</span><br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Hello everyone!<br><br>I&#39;ve been experiencing some issues when using MPICH v1.2.7p1 on an SMP cluster and thought maybe someone here can help me.<br>

<br>I have a small cluster of two dual-processor machines connected by gigabit Ethernet. Each processor is dual core, which adds up to 8 cores. When I run an application with 4 processes spread across both machines (2 processes on one machine and 2 on the other), I get significantly better performance than when I run the same application with all 4 processes on a single machine. Isn&#39;t that a bit curious? I know some people who have also noticed this, but no one can explain to me why it happens. Googling it didn&#39;t help either. I originally thought it was a problem with my kind of application (a heart simulator which uses PETSc to solve some differential equations), but some simple experiments showed that a plain MPI_Send inside a huge loop causes the same issue. Measuring cache hits and misses showed it&#39;s not a memory contention problem. I also know that in-node communication in MPICH uses the loopback interface, but as far as I know a message sent over the loopback interface simply takes a shortcut to the input queue instead of being sent to the device, so there is no reason for the message to take longer to reach the other processes. So I have no idea why MPICH is slower within a single machine. Has anyone else noticed this too? Is there a logical explanation for it?<br>


<br>Thanks,<br><font color="#888888">Gustavo Miranda Teixeira<br>

</font></blockquote></div><br><br clear="all"><br>-- <br>Marcus Vinicius<br>--<br>&quot;Given enough collaborators,<br>any problem can be solved&quot;<br>Eric S. Raymond<br>The Cathedral and the Bazaar<br><br>

&quot;The past is only a resource for the present&quot;<br>Clave de Clau<br><br>&quot;No one is so poor that they cannot give a hug; and <br>no one is so rich that they do not need a hug.&quot;<br>Anonymous<br>