I've seen the issue on both SuSE Linux 9 and Ubuntu Server 8.04, both running kernel 2.6. But I don't understand what your experiment is meant to show, since you ask me to run non-MPI applications (I don't know which ones) when the problem is intrinsically linked to MPI. Also, the problem occurs with MPICH v1.2.7p1.<br>
<br><div class="gmail_quote">On Tue, Jan 13, 2009 at 7:50 PM, chong tan <span dir="ltr">&lt;<a href="mailto:chong_guan_tan@yahoo.com">chong_guan_tan@yahoo.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<table cellspacing="0" cellpadding="0" border="0"><tbody><tr><td valign="top" style="font:inherit"><div><br>Since I don't know how parallelism is exploited in your application, I have to assume that inter-process communication is not a critical factor (based on your run result).</div>
<div> </div>
<div>Back to the hardware: 1 CPU with 2 cores != 2 CPUs with 1 core each in terms of performance. </div>
<div>With 2 boxes, your aggregate memory bandwidth is larger. Cache bandwidth can also be better on 2 boxes. Therefore you may get better throughput from 2 boxes of 2X2 by running 1 job per CPU. Older versions of Linux (I assume you have one) may not distribute jobs evenly across physical CPUs; later ones do a reasonably good job of this, however.</div>
<div> </div>
<div>To convince yourself, would you mind trying the following experiment:</div>
<div>1. Use taskset to run 2 copies of your non-MPI application, first on the same CPU.</div>
<div>2. Repeat the last step, but on different physical CPUs.</div>
<div> </div>
<div>If you want to see how the total available resources affect performance, do:</div>
<div>3. Run 4 copies of your non-MPI application on one box.</div>
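<div>Steps 1 and 2 of the experiment above could be scripted along these lines. This is only a sketch: taskset -c is the real util-linux option for choosing cores by number, but the helper names, the ./my_app path and the core numbers are placeholders, and which core IDs share a physical CPU depends on the machine (the "physical id" field in /proc/cpuinfo shows it).</div>
<pre>
```python
import shlex


def taskset_command(cores, program):
    """Build a `taskset -c` command line pinning `program` to the given cores."""
    core_list = ",".join(str(c) for c in cores)
    return "taskset -c " + core_list + " " + shlex.quote(program)


def experiment(program, same_cpu_cores, other_cpu_core):
    """Return the command lines for step 1 (same CPU) and step 2 (different CPUs)."""
    first, second = same_cpu_cores
    return {
        "same_cpu": [
            taskset_command([first], program),
            taskset_command([second], program),
        ],
        "different_cpus": [
            taskset_command([first], program),
            taskset_command([other_cpu_core], program),
        ],
    }


if __name__ == "__main__":
    # Assumption: cores 0 and 1 sit on one physical CPU and core 2 on the other.
    # "./my_app" is a hypothetical stand-in for the non-MPI application.
    for label, commands in experiment("./my_app", (0, 1), 2).items():
        print(label)
        for command in commands:
            print("  " + command)
```
</pre>
<div>Each pair of commands would be started at the same time and timed, so that the two placements can be compared.</div>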
<div> </div>
<div>Hopefully the performance figures you get will tell you something. Otherwise, MPICH2 has some serious bug.</div>
<div> </div>
<div>tan</div>
<div> </div>
<div> </div>
<div> </div>
<div>--- On <b>Tue, 1/13/09, Marcus Vinicius Brandão Soares <i>&lt;<a href="mailto:mvbsoares@gmail.com" target="_blank">mvbsoares@gmail.com</a>&gt;</i></b> wrote:<br></div>
<blockquote style="padding-left:5px;margin-left:5px;border-left:rgb(16,16,255) 2px solid">From: Marcus Vinicius Brandão Soares &lt;<a href="mailto:mvbsoares@gmail.com" target="_blank">mvbsoares@gmail.com</a>&gt;<div class="Ih2E3d">
<br>Subject: Re: [mpich-discuss] MPICH v1.2.7p1 and SMP clusters<br>To: <a href="mailto:mpich-discuss@mcs.anl.gov" target="_blank">mpich-discuss@mcs.anl.gov</a><br></div>Date: Tuesday, January 13, 2009, 2:04 PM<div><div></div>
<div class="Wj3C7c"><br><br>
<div>Hello Gustavo,<br><br>My point is this: depending on the structure of the connections (the edges of the graph) that interconnect the processors, the type of processing (CPU-intensive or communication-intensive), and even the level of parallelism (from mostly sequential to divide-and-conquer), it may be easy to explain the differences you referred to. <br>
<br>Do you agree?<br><br>
<div class="gmail_quote">Best regards,<br><br><br><br>2009/1/13 Gustavo Miranda Teixeira <span dir="ltr">&lt;<a href="mailto:magusbr@gmail.com" rel="nofollow" target="_blank">magusbr@gmail.com</a>&gt;</span><br>
<blockquote class="gmail_quote" style="padding-left:1ex;margin:0pt 0pt 0pt 0.8ex;border-left:rgb(204,204,204) 1px solid">Hello Marcus,<br><br>I can't see what your point is. Do you want to know which cores the processes are allocated to? That is, whether they are on the same processor or on different ones?
<div>
<div></div>
<div><br><br><br>
<div class="gmail_quote">On Tue, Jan 13, 2009 at 3:06 PM, Marcus Vinicius Brandão Soares <span dir="ltr">&lt;<a href="mailto:mvbsoares@gmail.com" rel="nofollow" target="_blank">mvbsoares@gmail.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="padding-left:1ex;margin:0pt 0pt 0pt 0.8ex;border-left:rgb(204,204,204) 1px solid">Hello Gustavo and all,<br><br>You said you are using two machines, each with dual processors. If I model this as a simple graph, we have two vertices and two unidirectional edges.<br>
<br>Each machine has two processors, each one dual-core, so there are 8 cores in total. But let's think again about the graph model: now we have two vertices, each with two child vertices; each of those has two more vertices in turn, and that is where it ends.<br>
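<div>The hierarchy described above can be written down as a small tree. A minimal sketch, assuming the counts from this thread (2 machines, 2 processors each, 2 cores each); the function names and labels are made up for illustration, and the three link classes are a simplification, not a measurement of the real hardware.</div>
<pre>
```python
def build_topology(machines=2, processors_per_machine=2, cores_per_processor=2):
    """Return a nested dict: machine name, then processor name, then core labels."""
    topology = {}
    for m in range(machines):
        processors = {}
        for p in range(processors_per_machine):
            processors["cpu%d" % p] = [
                "m%d.cpu%d.core%d" % (m, p, c) for c in range(cores_per_processor)
            ]
        topology["machine%d" % m] = processors
    return topology


def count_cores(topology):
    """Count the leaves of the tree, i.e. the cores."""
    return sum(
        len(cores)
        for processors in topology.values()
        for cores in processors.values()
    )


def link_between(core_a, core_b):
    """Classify the path between two cores by the deepest level they share."""
    machine_a, cpu_a, _ = core_a.split(".")
    machine_b, cpu_b, _ = core_b.split(".")
    if machine_a != machine_b:
        return "network"
    if cpu_a != cpu_b:
        return "same machine, different processor"
    return "same processor"


if __name__ == "__main__":
    topology = build_topology()
    print(count_cores(topology), "cores")
    print(link_between("m0.cpu0.core0", "m0.cpu1.core0"))
```
</pre>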
<br>Do you know how the communication lines between the processor cores are structured? <br><br>
<div class="gmail_quote">2009/1/13 Gustavo Miranda Teixeira <span dir="ltr">&lt;<a href="mailto:magusbr@gmail.com" rel="nofollow" target="_blank">magusbr@gmail.com</a>&gt;</span>
<div>
<div></div>
<div><br>
<blockquote class="gmail_quote" style="padding-left:1ex;margin:0pt 0pt 0pt 0.8ex;border-left:rgb(204,204,204) 1px solid">Hello everyone!<br><br>I've been experiencing some issues with MPICH v1.2.7p1 on an SMP cluster and thought maybe someone here could help me.<br>
<br>I have a small cluster of two dual-processor machines connected by gigabit Ethernet. Each processor is dual-core, which adds up to 8 cores. When I run an application with 4 processes spread across both machines (2 processes on one machine and 2 on the other), I get significantly better performance than when I run the same application with 4 processes on a single machine. Isn't that a bit curious? I know other people who have noticed this too, but no one can explain to me why it happens. Googling it didn't help either. I originally thought it was a problem with my kind of application (a heart simulator which uses PETSc to solve some
differential equations), but some simple experiments showed that a simple MPI_Send inside a huge loop causes the same issue. Measuring cache hits and misses showed it's not a memory contention problem. I also know that in-node communication in MPICH uses the loopback interface, but as far as I know a message sent over the loopback interface simply takes a shortcut to the input queue instead of being sent to the device, so there is no reason for the message to take longer to reach the other processes. So I have no idea why it takes longer to use MPICH within a single machine. Has anyone else noticed this too? Is there some logical explanation for it?<br>
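<div>For comparison, the cost of a round trip over the loopback interface can be measured without MPI at all. The sketch below is a stand-in, not the MPI_Send test described above: it assumes plain TCP sockets on 127.0.0.1 and times a ping-pong between two threads of one Python process; the function names and default sizes are illustrative.</div>
<pre>
```python
import socket
import threading
import time


def recv_exact(sock, size):
    """Read exactly `size` bytes from a stream socket."""
    chunks = []
    remaining = size
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def echo_server(listener, message_size, round_trips):
    """Accept one connection and echo back every fixed-size message."""
    conn, _ = listener.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(round_trips):
            conn.sendall(recv_exact(conn, message_size))


def loopback_ping_pong(round_trips=1000, message_size=64):
    """Return the mean round-trip time in seconds over 127.0.0.1."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    server = threading.Thread(
        target=echo_server, args=(listener, message_size, round_trips)
    )
    server.start()

    payload = b"x" * message_size
    client = socket.create_connection(listener.getsockname())
    client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    start = time.perf_counter()
    for _ in range(round_trips):
        client.sendall(payload)
        recv_exact(client, message_size)
    elapsed = time.perf_counter() - start

    client.close()
    server.join()
    listener.close()
    return elapsed / round_trips


if __name__ == "__main__":
    print("mean round trip: %.1f microseconds" % (loopback_ping_pong() * 1e6))
```
</pre>
<div>Splitting the server and client halves across the two machines would give the gigabit Ethernet figure to compare against.</div>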
<br>Thanks,<br><font color="#888888">Gustavo Miranda Teixeira<br></font></blockquote></div></div></div><br><br clear="all"><br>-- <br>Marcus Vinicius<br>--<br>"Given enough collaborators,<br>any problem can be solved"<br>
Eric S. Raymond<br>The Cathedral and the Bazaar<br><br>"The past is merely a resource for the present"<br>Clave de Clau<br><br>"No one is so poor that they cannot give a hug; and <br>no one is so rich that they do not need a hug."<br>Anonymous<br>
</blockquote></div><br></div></div></blockquote></div><br><br clear="all"><br>-- <br>Marcus Vinicius<br>--<br>"Given enough collaborators,<br>any problem can be solved"<br>Eric S. Raymond<br>
The Cathedral and the Bazaar<br><br>"The past is merely a resource for the present"<br>Clave de Clau<br><br>"No one is so poor that they cannot give a hug; and <br>no one is so rich that they do not need a hug."<br>
Anonymous<br></div></div></div></blockquote></td></tr></tbody></table><br>
</blockquote></div><br>