[mpich-discuss] Why is my quad core slower than cluster

Gus Correa gus at ldeo.columbia.edu
Fri Jul 11 10:38:52 CDT 2008


Hello Zach and list

Zach, thank you for sending a more detailed summary of your
configuration and setup.
Details help in making the right diagnosis.

Garrick Staples, your USC cluster system administrator, and your setup summary
both say that the compilers are different, and so are the optimization options.
The MPICH versions are different as well.
In any case, this should account for some of the difference,
but I doubt it can account for a factor of 3 in speed,
which is very large, as you pointed out.
What are the actual wall time numbers on both machines?
Milliseconds? Hours? Days?

I was hoping that the pros, the MPICH experts and developers,
and the kernel and system programming experts who subscribe to this list,
would clarify this matter in more detail.
They started a short but nice discussion in which they wondered whether
the culprit was the Linux kernel,
and then whether it was the glibc memcpy() function.
This raised my hopes of seeing some light.
However, to my distress, they ended their rich discussion as suddenly
as they had started it.

Nevertheless, it would be great if this matter were discussed further,
as it is of interest to everybody who uses or intends to use multi-core machines,
and to MPICH users in particular.
It has often been said that a lack of memory bandwidth, i.e. a shortcoming
of the hardware architecture,
is responsible for the poor performance of current multi-core machines.
Because of this, email wars have been waged between the Xeon and the
Opteron camps in many forums and lists,
but, as far as I know, they have produced little sound information about the problem.

However, the suggestion made here that the Linux kernel and/or the glibc memcpy()
may be playing a central role in the poor performance of multi-core machines
deserves further discussion, clarification, and, hopefully, a fix.
Hey, pros, would you please shed some light here?
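One crude way to put numbers on the memory-bandwidth and memcpy() hypotheses
is a bulk-copy timing like the minimal Python sketch below.
It is only an illustration under stated assumptions: the function name
copy_bandwidth_gbs and the 64 MiB buffer size are arbitrary choices,
and it assumes CPython, where an equal-length bytearray slice assignment
is carried out as one bulk copy in C (presumably through the C library's memcpy()).
It is not STREAM or any tool mentioned on this list.

```python
import time


def copy_bandwidth_gbs(nbytes: int = 64 * 1024 * 1024, repeats: int = 5) -> float:
    """Rough bulk memory-copy bandwidth in GB/s (1 GB = 1e9 bytes).

    Hypothetical helper: copies one buffer into another `repeats` times and
    keeps the fastest run. Only the bytes copied are counted, not the
    separate read and write traffic.
    """
    if nbytes <= 0 or repeats <= 0:
        raise ValueError("nbytes and repeats must be positive")
    src = bytearray(b"\x01") * nbytes  # nonzero data, so every page is really written
    dst = bytearray(nbytes)
    dst[:] = src  # untimed warm-up copy: faults in the destination pages first
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # equal-length slice assignment: one bulk copy in C (CPython)
        best = min(best, time.perf_counter() - start)
    if best <= 0.0:
        raise RuntimeError("timer too coarse for this buffer size; increase nbytes")
    return nbytes / best / 1e9


if __name__ == "__main__":
    print(f"copy bandwidth: {copy_bandwidth_gbs():.2f} GB/s")
```

Running, say, four instances at once on the quad-core and comparing each one's
figure with a single-instance run gives a rough idea of whether the cores
compete for memory bandwidth; a purpose-built benchmark such as STREAM
would be the more careful tool.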

BTW, Zach, did you look at the size of your computation and at Amdahl's law?
Problems that are too small can't scale well, because of the overhead
of the non-parallel tasks.
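Amdahl's law makes this point quantitative: if a fraction p of the
single-process run time parallelizes perfectly, the speedup on N processes
is at most 1/((1 - p) + p/N).
A minimal Python sketch of that formula follows; the function name and
the 90% figure in the example are illustrative, not taken from Zach's code.
This simple form ignores communication and synchronization costs,
which for small problems make things worse still.

```python
def amdahl_speedup(parallel_fraction: float, n_procs: int) -> float:
    """Ideal speedup on n_procs processes under Amdahl's law.

    parallel_fraction is the share of single-process run time that can be
    spread perfectly over the processes; the rest runs serially.
    Communication and synchronization costs are ignored.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_procs)


if __name__ == "__main__":
    # Example: 90% of the work parallelizes; compare 1, 2, 4 and 8 processes.
    for n in (1, 2, 4, 8):
        print(f"{n} procs: speedup {amdahl_speedup(0.9, n):.2f}")
```

With 90% of the work parallel, four processes give at most about a 3.1x
speedup, and no number of processes can exceed 1/(1 - 0.9) = 10x.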

Other than that, I confess I have exhausted my options and knowledge on this matter.
All the diagnostics, problems, and fixes of MPICH on the dual-core machines
that we had here were posted in my messages on this thread.
Unfortunately, they don't seem to address your problem.
Zach: you have a cutting-edge processor,
and may be experiencing a new and different level of the multi-core
performance problem.
Let's hope to hear from the experts about it.

Gus Correa

-- 
---------------------------------------------------------------------
Gustavo J. Ponce Correa, PhD - Email: gus at ldeo.columbia.edu
Lamont-Doherty Earth Observatory - Columbia University
P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

PS - Zach:  You are at USC, right?
Even in these days of Gmail anonymity,
adding a signature block with your name, job, and affiliation
to your messages directed to public forums wouldn't hurt,
and perhaps would encourage your correspondents to answer your messages.  :)

zach wrote:

>Following up on these suggestions and info queries...
>(Thanks for the help!)
>
>I noticed my home pc processor is a
>Core2 Quad CPU Q6600  @ 2.40GHz (Kentsfield)
>whereas the cluster Xeon is 3.20GHz.
>I don't think this accounts for the factor of 3 in speed.
>
>compiler is gcc on home pc and cluster.
>
>Same optimization options for both systems.
>Yes, cpuinfo and meminfo show the right number of CPUs and amount of memory.
>
>I have discovered that the MPICH versions are different.
>
>on cluster (faster one),
>MPICH Version:    	1.2.7p1
>MPICH Release date:	$Date: 2005/11/04 11:54:51$
>MPICH Patches applied:	none
>MPICH configure: 	--prefix=/opt/mpich/intel --enable-sharedlib
>--with-romio --enable-f90modules -c++=icpc -cc=icc -fc=ifort
>-f90=ifort
>MPICH Device:    	ch_p4
>
>on home pc (the slug),
>MPICH2 Version:    	1.0.7
>MPICH2 Release date:	Unknown, built on Tue Jul  8 19:28:07 CDT 2008
>MPICH2 Device:    	ch3:nemesis
>MPICH2 configure: 	--prefix=/home/code/mpich --with-device=ch3:nemesis
>MPICH2 CC: 	gcc  -O2
>MPICH2 CXX: 	c++  -O2
>MPICH2 F77: 	
>MPICH2 F90: 	
>
>On the cluster I have been compiling with mpiCC, and on the home PC with mpicxx.
>
>kernel on home pc:
>Linux myPC 2.6.24-18-generic #1 SMP Wed May 28 19:28:38 UTC 2008
>x86_64 GNU/Linux
>
>I am using Ubuntu Hardy and did not use 'sudo' during the installation of
>MPICH2 (I was not logged in as superuser); I don't know if this matters.
>
>zach
>  
>

Garrick Staples wrote:

>On Thu, Jul 10, 2008 at 10:15:26PM -0500, zach alleged:
>  
>
>>compiler is gcc on home pc and cluster.
>>
>>MPICH configure: 	--prefix=/opt/mpich/intel --enable-sharedlib
>>--with-romio --enable-f90modules -c++=icpc -cc=icc -fc=ifort
>>-f90=ifort
>>
>>on home pc (the slug),
>>MPICH2 CC: 	gcc  -O2
>>MPICH2 CXX: 	c++  -O2
>>    
>>
>
>Nope.  The cluster is using the Intel compiler.  For some apps, the Intel
>compiler and mpich2 will definitely be much faster than gcc and mpich1.
>-- 
>Garrick Staples, GNU/Linux HPCC SysAdmin
>University of Southern California



