[mpich-discuss] Why is my quad core slower than cluster

Gus Correa gus at ldeo.columbia.edu
Wed Jul 9 10:10:47 CDT 2008


Hi Zach and list

Well, re-reading your message with more attention, I saw your
estimate of 1/3 speed on your home computer.
That is indeed on the low side, although I would expect a multi-core
machine to run slower than a cluster of single-cores, especially if the
programs are memory-intensive.
A performance factor in the ballpark of 0.6-0.8 is what I've seen so far
on memory-intensive programs (climate models), but not your low value of
1/3=0.33.

In any case, I gather you have confirmed that all four cores are working
on your home PC, and that the SMP kernel is running.
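
If you want to script those two checks instead of reading the output by
eye, here is a minimal sketch in Python. The function names are my own
invention, and I am assuming a Linux /proc/cpuinfo that has one
"processor" line per logical CPU, as on x86 PCs.

```python
import os
import platform


def count_processors(cpuinfo_text):
    """Count the 'processor' entries in the text of /proc/cpuinfo."""
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        if line.split(":")[0].strip() == "processor"
    )


def kernel_is_smp(version_string):
    """True if a kernel version string (as shown by 'uname -a') mentions SMP."""
    return "smp" in version_string.lower()


if __name__ == "__main__":
    # Only meaningful on Linux, where /proc/cpuinfo exists.
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            print("processors seen by the kernel:", count_processors(f.read()))
    print("SMP kernel:", kernel_is_smp(platform.version()))
```

On a quad-core PC I would expect the first number to be 4; anything
less suggests the kernel is not using all the cores.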

Besides, you seem to have installed the MPICH2 nemesis device on your
home PC, and hopefully on the cluster too, as recommended by Rajeev.

In addition, you don't seem to have a problem with memory size (which
would trigger paging), because you said the total memory in your PC and
in the cluster is the same.
I would check the amount of memory in /proc/meminfo anyway (look for
MemTotal).
Some memory modules are tricky, and have to be seated in matched slots
in order to be recognized.
We had this problem here with a Dell computer.
The computer manual diagrams were misleading, and it took a few attempts
to get the memory slots right when we upgraded it.
Before that, a system with 8GB of memory installed would recognize only 2GB.
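
The MemTotal check can also be scripted. This is only a sketch, with a
function name I made up, and it assumes the usual Linux layout of
/proc/meminfo, where MemTotal is reported in kB (units of 1024 bytes).

```python
import os


def mem_total_gib(meminfo_text):
    """Return MemTotal from the text of /proc/meminfo in GiB, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # second field is the size in kB
            return kib / (1024 * 1024)
    return None


if __name__ == "__main__":
    # Only meaningful on Linux, where /proc/meminfo exists.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            print("MemTotal (GiB):", mem_total_gib(f.read()))
```

The value is normally a bit below the installed amount, because the
kernel reserves some memory for itself. A value far below it (like our
2GB out of 8GB) points to unrecognized modules.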

So, if all the considerations above are correct, a number of
possibilities are ruled out.
Let's try other possibilities.

What other (concurrent) activity on your home PC runs along with your
mpich program?
I have experienced significant performance degradation of MPI programs
running on standalone PCs when other users log in and start their programs.
The memory-greedy Matlab is the first killer, but fancy desktops,
intense web browsing, streaming video and music, etc, can compete with
the mpich program for memory and CPU cycles,
to the point that the mpich program can't really work,
and spends most of its time context switching.
HPC and interactive workstation use don't really mix well.
You can monitor this activity with the "top" command on your PC,
and compare it with what you get from "top" on your cluster nodes.
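
Besides "top", a quick number to compare between the PC and a cluster
node is the load average just before you launch the mpich job. Here is
a small sketch (the function name is mine), assuming the Linux
/proc/loadavg format, whose first three fields are the 1, 5 and 15
minute load averages.

```python
import os


def parse_loadavg(loadavg_text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    fields = loadavg_text.split()
    return tuple(float(x) for x in fields[:3])


if __name__ == "__main__":
    # Only meaningful on Linux, where /proc/loadavg exists.
    if os.path.exists("/proc/loadavg"):
        with open("/proc/loadavg") as f:
            one, five, fifteen = parse_loadavg(f.read())
        print("load averages (1, 5, 15 min):", one, five, fifteen)
```

On a dedicated, idle node these numbers should be close to zero before
the job starts. If the home PC already shows a substantial load, other
programs are competing with your run.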

I would also suggest starting the system at runlevel 3 (no X-windows),
and running the mpich program alone, if you want to make a fair performance
comparison between your home PC and your office cluster (whose nodes are
likely to be at runlevel 3 and dedicated to running the mpich program only).

Also, a fair comparison should take into account the CPU speeds of each
computer.
A 3.6GHz processor works faster than a 2.8GHz one of similar architecture.
Since both computers you use have Intel processors (comparing Intel with
AMD seems to be more complicated),
maybe you can just look at the raw processor speeds
in /proc/cpuinfo (look for cpu MHz), and factor in the ratio of these
values on both computers when you compare their performance.
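
To make the arithmetic concrete, here is a rough sketch of that
correction. The function names are mine, and the clock values in the
example are hypothetical (I don't know your actual clocks), so plug in
what /proc/cpuinfo reports on each machine. Note that "cpu MHz" can
show a reduced value when frequency scaling is active, so read it while
the machine is busy.

```python
def cpu_mhz_values(cpuinfo_text):
    """Return the 'cpu MHz' value of every processor listed in /proc/cpuinfo text."""
    values = []
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "cpu MHz":
            values.append(float(value))
    return values


def clock_adjusted_ratio(observed_ratio, mhz_home, mhz_cluster):
    """Scale an observed home/cluster speed ratio to what it would be at equal clocks.

    observed_ratio is (home speed) / (cluster speed), e.g. 1/3.
    """
    return observed_ratio * mhz_cluster / mhz_home


if __name__ == "__main__":
    # Hypothetical clocks: home PC at 2800 MHz, cluster nodes at 3600 MHz.
    adjusted = clock_adjusted_ratio(1.0 / 3.0, mhz_home=2800.0, mhz_cluster=3600.0)
    print("clock-adjusted speed ratio: %.2f" % adjusted)
```

If the adjusted ratio is still well below the 0.6-0.8 range, the clock
difference alone does not explain the slowdown.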

I hope this helps.

Gus Correa

-- 
---------------------------------------------------------------------
Gustavo J. Ponce Correa, PhD - Email: gus at ldeo.columbia.edu
Lamont-Doherty Earth Observatory - Columbia University
P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
Oceanography Bldg., Rm. 103-D, ph. (845) 365-8911, fax (845) 365-8736
---------------------------------------------------------------------



zach wrote:

>Thanks for the info.
>I tried all of these things but it does not look like it gave any improvement.
>Zach
>
>On Tue, Jul 8, 2008 at 2:52 PM, Gus Correa <gus at ldeo.columbia.edu> wrote:
>  
>
>>PS:
>>
>>Zach:  A couple of obvious checks, besides Rajeev's important suggestion:
>>
>>1) Make sure the SMP kernel is running on your home PC:
>>"uname -a"
>>(Should show "smp" as part of the string.)
>>
>>2) Check if Ubuntu triggers all four cores:
>>"cat /proc/cpuinfo". (Should show four "virtual" CPUs.)
>>
>>Gus Correa
>>
>>##########
>>
>>Rajeev Thakur wrote:
>>
>>Try using the Nemesis device in MPICH2 if you aren't already. Configure with
>>--with-device=ch3:nemesis.
>>
>>Rajeev
>>
>>    
>>
>>Gus Correa wrote:
>>
>>    
>>
>>>Hello  Zach and list
>>>
>>>From all that I've observed on dual-processor dual-core PCs,
>>>and from all that I've read on the web about dual-processor quad-core
>>>machines,
>>>your results are not alarming, but typical.
>>>I was as disappointed as you are, when I saw my speedup results.
>>>A lot of people out there had the same frustration too.
>>>
>>>My benchmarks using a standard climate atmospheric model (NCAR CAM3) on
>>>a dual-processor dual-core Xeon workstation showed a speedup factor of 3
>>>(not 4),
>>>when I moved from one core to four cores.
>>>Likewise for a dual-processor dual-core Opteron workstation,
>>>I've got a speedup factor slightly below 3.5. (Better than Xeon, but still
>>>not 4).
>>>
>>>The problem seems to get worse with quad-cores, again with the Opterons
>>>slightly ahead of the game.
>>>Memory/bus contention has been mentioned as the culprit by a lot of
>>>people.
>>>One core in a multicore doesn't scale as one (single-core) CPU.
>>>
>>>You will find plenty of references to this problem on the web and on many
>>>mailing lists:
>>>here in the MPICH list, on the Rocks Cluster list, on the MITgcm list,
>>>etc, etc.
>>>
>>>I hope it heals (as helping it cannot)
>>>Gus Correa
>>>
>>>      
>>>
>>zach wrote:
>>
>>    
>>
>>>I am using a cluster.
>>>Each pc has two cpus and they are Xeon. Each cpu has 4GB, i think red
>>>hat is running.
>>>
>>>I also use a pc at home- quad core intel chip, 8gb ram, ubuntu.
>>>
>>>Both are using mpich.
>>>
>>>I have found that my home pc is only running about 1/3 the speed of
>>>the cluster, and the number of processes (4) and code is the same.
>>>
>>>Can anyone tell me if this is typical, and why, or am I not optimizing
>>>something properly?
>>>
>>>Thanks
>>>Zach
>>>
>>>      
>>>
>>    
>>



