[mpich-discuss] MPICH v1.2.7p1 and SMP clusters

Gustavo Miranda Teixeira magusbr at gmail.com
Tue Jan 13 18:09:05 CST 2009


I've had the issue on both SuSE Linux 9 and Ubuntu Server 8.04, both using
kernel 2.6. But I don't get what you want from your experiment, since you
ask me to run non-MPI applications (I don't know which ones) when the
problem is intrinsically linked to MPI. Also, the problem is occurring in
MPICH v1.2.7p1, not MPICH2.

On Tue, Jan 13, 2009 at 7:50 PM, chong tan <chong_guan_tan at yahoo.com> wrote:

>
> Since I don't know how parallelism is exploited in your application, I have
> to assume that inter-process communication is not a critical factor (based
> on your run results).
>
> Back to HW.  1 CPU with 2 cores != 2 CPUs with 1 core each in terms of
> performance.
> With 2 boxes, your aggregated memory bandwidth is larger.  Cache
> bandwidth can also be better on 2 boxes. Therefore you may get better
> throughput from 2 boxes of 2x2 by running 1 job per CPU.  Older versions
> of Linux (I assume you have one) may not be able to distribute jobs evenly
> across physical CPUs; the later ones do a reasonably good job of this, however.
>
> As a way to convince yourselves, would you mind trying the following
> experiment:
> 1. use taskset to run 2 copies of your non-MPI application, first on the
> same CPU
> 2. repeat the last step, but on different physical CPUs
>
> If you want to see how the total available resources affect
> performance, do:
> 3. run 4 copies of your non-MPI application on one box.
>
> Hopefully, the performance numbers you derive will tell you something.  Otherwise,
> MPICH2 may have some serious bug.
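
For what it's worth, the pinning that taskset does from the command line can
also be done inside a small test program with sched_setaffinity; below is a
minimal Linux-only sketch, where the core number and the placeholder workload
are just assumptions for illustration:

/* Pin the calling process to one core, the programmatic equivalent of
 * "taskset -c <core>".  The core id (0) is only an assumption. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(0, &set);   /* pin to logical CPU 0 */

    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    /* Placeholder workload: replace with the real non-MPI job.  Timing two
     * copies pinned to cores on the same physical CPU, then to cores on
     * different physical CPUs, shows how much the shared resources matter. */
    sleep(10);
    return 0;
}
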
>
> tan
>
>
>
> --- On Tue, 1/13/09, Marcus Vinicius Brandão Soares <mvbsoares at gmail.com>
> wrote:
>
> From: Marcus Vinicius Brandão Soares <mvbsoares at gmail.com>
> Subject: Re: [mpich-discuss] MPICH v1.2.7p1 and SMP clusters
> To: mpich-discuss at mcs.anl.gov
> Date: Tuesday, January 13, 2009, 2:04 PM
>
>
> Hello Gustavo,
>
> My point is: depending on the structure of the connections (the edges of
> the graph) used to interconnect the processors, the type of processing
> (CPU-intensive or communication-intensive) and even the level of
> parallelism (from mostly sequential to divide-and-conquer), it may be easy to
> explain the differences you referred to.
>
> Do you agree ?
>
> Best regards,
>
>
>
> 2009/1/13 Gustavo Miranda Teixeira <magusbr at gmail.com>
>
>> Hello Marcus,
>>
>> I can't see what your point is. Do you want to know on which core the
>> processes are allocated? Like whether they are on the same processor or different ones?
>>
>>
>>
>> On Tue, Jan 13, 2009 at 3:06 PM, Marcus Vinicius Brandão Soares <
>> mvbsoares at gmail.com> wrote:
>>
>>> Hello Gustavo and all,
>>>
>>> You described that you are using two machines, each with two processors. If
>>> I model it as a simple graph, we have two vertices and two
>>> unidirectional edges.
>>>
>>> Each machine has two processors, each one dual core, so there are
>>> 8 cores in total. But let's think again about the graph model: now we have two
>>> vertices, each one connected to two more vertices; these last vertices each
>>> connect to two more vertices, and there it ends.
>>>
>>> Do you know the structure of the communication lines between the processor
>>> cores?
>>>
>>> 2009/1/13 Gustavo Miranda Teixeira <magusbr at gmail.com>
>>>
>>>> Hello everyone!
>>>>
>>>> I've been experiencing some issues when using MPICH v1.2.7p1 on an SMP
>>>> cluster and thought maybe someone could help me here.
>>>>
>>>> I have a small cluster of two dual-processor machines with gigabit
>>>> ethernet communication. Each processor is dual core, which adds up to 8
>>>> cores. When I run an application spreading 4 processes across both
>>>> machines (distributing 2 processes on one machine and 2 processes
>>>> on the other) I get significantly better performance than when I run the
>>>> same application using 4 processes on only one machine. Isn't that a bit
>>>> curious? I know some people who have also noticed this, but no one can explain
>>>> to me why it happens. Googling it didn't help either. I originally thought it
>>>> was a problem with my kind of application (a heart simulator which uses
>>>> PETSc to solve some differential equations) but some simple experiments
>>>> showed that a simple MPI_Send inside a huge loop causes the same issue. Measuring
>>>> cache hits and misses showed it's not a memory contention problem. I also
>>>> know that in-node communication in MPICH uses the loopback interface, but
>>>> as far as I know a message that uses the loopback interface simply takes a
>>>> shortcut to the input queue instead of being sent to the device, so there is
>>>> no reason for the message to take longer to get to the other processes. So,
>>>> I have no idea why it takes longer to use MPICH on the same machine. Has
>>>> anyone else noticed this too? Is there some logical explanation for
>>>> this to happen?
>>>>
>>>> Thanks,
>>>> Gustavo Miranda Teixeira
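
The "MPI_Send inside a huge loop" test mentioned above can be as small as the
sketch below; the message size and iteration count are arbitrary assumptions.
Running it with 2 ranks on one node, and then with the ranks split across the
two nodes, should show the same gap:

/* Minimal sketch of the MPI_Send-inside-a-huge-loop test; the message
 * size and iteration count are just assumptions. */
#include <mpi.h>
#include <stdio.h>

#define ITERS    100000
#define MSG_SIZE 1024

int main(int argc, char **argv)
{
    int rank, i;
    char buf[MSG_SIZE] = {0};
    double t0, t1;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    t0 = MPI_Wtime();
    for (i = 0; i < ITERS; i++) {
        if (rank == 0)
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("%d sends of %d bytes took %.3f s\n", ITERS, MSG_SIZE, t1 - t0);

    MPI_Finalize();
    return 0;
}
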
>>>>
>>>
>>>
>>>
>>> --
>>> Marcus Vinicius
>>> --
>>> "Havendo suficientes colaboradores,
>>> Qualquer problema é passível de solução"
>>> Eric S. Raymond
>>> A Catedral e o Bazar
>>>
>>> "O passado é apenas um recurso para o presente"
>>> Clave de Clau
>>>
>>> "Ninguém é tão pobre que não possa dar um abraço; e
>>> Ninguém é tão rico que não necessite de um abraço.
>>> Anônimo
>>>
>>
>>
>
>
> --
> Marcus Vinicius
> --
> "Havendo suficientes colaboradores,
> Qualquer problema é passível de solução"
> Eric S. Raymond
> A Catedral e o Bazar
>
> "O passado é apenas um recurso para o presente"
> Clave de Clau
>
> "Ninguém é tão pobre que não possa dar um abraço; e
> Ninguém é tão rico que não necessite de um abraço.
> Anônimo
>
>
>