[mpich-discuss] Why is my quad core slower than cluster

Gus Correa gus at ldeo.columbia.edu
Mon Jul 14 12:59:36 CDT 2008


Hello Sami and list

Sami, here are my two cents
regarding your question on how to ensure that cores on separate physical
processors are used by mpich jobs.

First, some caveats.
This is based only on observations I made while my jobs were running.
I am not an expert, just a user, so please don't quote me on this.
However, I share your perplexity and disappointment with multicore
machine performance.
I am not knowledgeable about the Linux scheduler either, which is likely
to be the part of the OS that determines where each process runs.
Maybe somebody more familiar with the Linux scheduler could clarify
this matter.

My experience here is limited to dual-core dual-processor machines:
a Fedora Core 8 AMD Opteron box running kernel 2.6.23.9-85.fc8 SMP,
and a Fedora Core 8 Intel Xeon box running kernel 2.6.24.5-85.fc8 SMP.
MPICH in my tests was mpich2 1.0.6, not the latest 1.0.7.

My observations suggest that the kernel scheduler chooses cores on the
two different physical processors, not two cores on the same physical
processor, whenever I launch mpich programs with mpiexec -n 2.
Also, this is when I have only one instance of the program running, on
an otherwise idle machine.
The scheduler will use the additional cores only if I launch another job,
or if I request more than two cores for that job (say, mpiexec -n 4),
or, I guess, if the machine is busy with other programs.

At least, that's what I see in "top" (with field option "j", the last
used CPU column, turned on).
I.e., when I launch the program with "mpiexec -n 2", "top" shows it
running on processors 0 and 2, which I think are the first cores on two
different physical processors.
"Processors" 1 and 3 (which I think are the additional cores) are busy
with other tasks, mostly with the kernel itself.
They only kick in if my mpich job asks for more than 2 processors, or
if I launch more than one job.
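
If you want to double check the placement from inside the program,
rather than by watching "top", a little test like the sketch below might
help.  I am only guessing here, and I have not tried this on your
hardware: it assumes Linux with a glibc recent enough to have
sched_getcpu(), and you would still have to map the CPU numbers it
prints to physical packages using the "physical id" and "core id"
fields in /proc/cpuinfo.

#define _GNU_SOURCE            /* for sched_getcpu() */
#include <mpi.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/* Each rank prints the logical CPU it is currently running on.
   The scheduler may still migrate the process later, so this is
   only a snapshot, much like one frame of "top". */
int main(int argc, char **argv)
{
    int  rank, cpu;
    char host[256];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    gethostname(host, sizeof(host));
    cpu = sched_getcpu();      /* returns -1 if the call fails */

    printf("rank %d on %s: running on CPU %d\n", rank, host, cpu);

    MPI_Finalize();
    return 0;
}

Compile it with mpicc and run it with "mpiexec -n 2"; on my machines I
would expect it to report CPUs 0 and 2, matching what "top" shows.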

Isn't this what you see on your machine?
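
As for forcing a particular placement, instead of leaving it to the
scheduler, I believe that on Linux you can pin each rank to a core with
sched_setaffinity().  Again, this is only a rough sketch, and not
something I have tried with mpich myself: the rank-to-CPU table below is
hypothetical, and you would have to adjust it to the numbering that
/proc/cpuinfo reports on your machine.

#define _GNU_SOURCE            /* for sched_setaffinity() and CPU_* macros */
#include <mpi.h>
#include <sched.h>
#include <stdio.h>

/* Example mapping only: on my boxes CPUs 0 and 2 are the first cores of
   the two physical packages, but the numbering may differ on yours, so
   check the "physical id" and "core id" fields in /proc/cpuinfo. */
static const int cpu_for_rank[] = { 0, 2, 1, 3 };

int main(int argc, char **argv)
{
    int rank, nranks;
    cpu_set_t mask;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    if (rank < (int)(sizeof(cpu_for_rank) / sizeof(cpu_for_rank[0]))) {
        CPU_ZERO(&mask);
        CPU_SET(cpu_for_rank[rank], &mask);
        /* pid 0 means "the calling process" */
        if (sched_setaffinity(0, sizeof(mask), &mask) != 0)
            perror("sched_setaffinity");
        else
            printf("rank %d of %d pinned to CPU %d\n",
                   rank, nranks, cpu_for_rank[rank]);
    }

    /* ... the real work would go here, now bound to the chosen core ... */

    MPI_Finalize();
    return 0;
}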

Thank you,
Gus Correa

-- 
---------------------------------------------------------------------
Gustavo J. Ponce Correa, PhD - Email: gus at ldeo.columbia.edu
Lamont-Doherty Earth Observatory - Columbia University
P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------


H. Sami Sozuer wrote:

> We have a quad cpu opteron system, and each cpu has 2 cores for a 
> total of 8 cores.
> When I run an mpich job on up to 4 processors, everything is as 
> expected and I get
> a decrease in turnaround time. The scaling is not linear, but still 
> there is a speedup of about
> a factor of 2.5 with 4 nodes. But when I increase the number of nodes 
> to 5-8, I actually get a slowdown,
> even though there are in fact a total of 8 cores.
>
> My interpretation of this is that each Opteron has an independent bus 
> to its own memory,
> so when you use 4 nodes in mpich, each processor is still using its 
> own independent bus.
> But when running on 8 nodes, now the two cores in each cpu are 
> competing for memory access
> on the same bus, resulting in an overall slowdown.
>
> BTW, when running on 4 nodes in multicore systems, is there any way to 
> guarantee that mpich
> uses the cores on physically different processors, rather than using 
> the two cores on the same CPU?
> It seems to me SMP architecture implies that each core is treated as 
> an independent processor
> by the OS. If my understanding is correct, then the jobs may be 
> assigned to two cores on the
> same processor while both cores of another CPU remain idle, resulting 
> in loss of efficiency due
> to competition for memory access between the two cores on the same CPU.
>
> Any thoughts ?
>
> Sami
>
> Matthew Bettencourt wrote:
>
>>
>> We have the same issue; for us it is memory bandwidth.  We 
>> can't utilize the extra processing power that the multicore provides 
>> because we can't keep those cores fed.
>> M
>>
>>
>> zach wrote:
>>
>>> Following up on these suggestions and info queries...
>>> (Thanks for the help!)
>>>
>>> I noticed my home pc processor is a
>>> Core2 Quad CPU Q6600  @ 2.40GHz (Kentsfield)
>>> whereas the cluster Xeon is 3.20GHz.
>>> I don't think this accounts for the factor of 3 difference in speed.
>>>
>>> The compiler is gcc on both the home pc and the cluster.
>>>
>>> Same optimization options on both systems.
>>> Yes, cpuinfo and meminfo show the right number of cpus and memory.
>>>
>>> The mpich versions are different, I have discovered.
>>>
>>> on cluster (faster one),
>>> MPICH Version:        1.2.7p1
>>> MPICH Release date:    $Date: 2005/11/04 11:54:51$
>>> MPICH Patches applied:    none
>>> MPICH configure:     --prefix=/opt/mpich/intel --enable-sharedlib
>>> --with-romio --enable-f90modules -c++=icpc -cc=icc -fc=ifort
>>> -f90=ifort
>>> MPICH Device:        ch_p4
>>>
>>> on home pc (the slug),
>>> MPICH2 Version:        1.0.7
>>> MPICH2 Release date:    Unknown, built on Tue Jul  8 19:28:07 CDT 2008
>>> MPICH2 Device:        ch3:nemesis
>>> MPICH2 configure:     --prefix=/home/code/mpich 
>>> --with-device=ch3:nemesis
>>> MPICH2 CC:     gcc  -O2
>>> MPICH2 CXX:     c++  -O2
>>> MPICH2 F77:    MPICH2 F90:
>>>
>>> On the cluster I have been compiling with mpiCC and on the home pc 
>>> with mpicxx.
>>>
>>> kernel on home pc:
>>> Linux myPC 2.6.24-18-generic #1 SMP Wed May 28 19:28:38 UTC 2008
>>> x86_64 GNU/Linux
>>>
>>> I am using Ubuntu Hardy and did not use 'sudo' during installation of
>>> mpich2 (not logged in as superuser); I don't know if this matters.
>>>
>>> zach
>>>
>>>
>



