[mpich-discuss] Why is my quad core slower than cluster

chong tan chong_guan_tan at yahoo.com
Tue Jul 15 12:06:12 CDT 2008


I have plenty of those.  Some are under NDA and cannot be shared.  Some are company property and cannot be shared either.  But all of that information can be obtained, either by working through a good HW reference on the CPU or by experimenting with the system.
 
The first thing you must do is know your system, mainly its memory architecture.  Know how that memory is addressed by both your HW and your OS.  Also, many systems do not have the extra bandwidth to run MP SW.  Simply put, if your system is cheap, say < $2000, it is likely to fall into this category.
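
As a starting point for "knowing your system", here is a minimal sketch that lists which CPUs belong to which memory (NUMA) node. It assumes a Linux machine that exposes the usual sysfs directory /sys/devices/system/node; the function name numa_nodes is just an illustrative helper, not part of any tool mentioned here. On a single-socket desktop it will most likely report one node, or nothing at all if the directory is absent.

```python
import glob
import os


def numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return {node name: CPU list string} as reported by Linux sysfs.

    Returns an empty dict when the directory does not exist
    (non-Linux system or a kernel without NUMA information).
    """
    nodes = {}
    pattern = os.path.join(sysfs_root, "node[0-9]*", "cpulist")
    for path in sorted(glob.glob(pattern)):
        node = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            nodes[node] = f.read().strip()
    return nodes


if __name__ == "__main__":
    found = numa_nodes()
    if not found:
        print("no NUMA information found under sysfs")
    for node, cpus in found.items():
        print(node, "-> CPUs", cpus)
```

If more than one node shows up, memory attached to another node is typically slower to reach, which is one reason process placement matters.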
 
 
We should also understand that there is no perfect MP solution that gives a good performance gain (X) across all platforms.   You can't do much here except investigate how the OS or other utilities can help.  BTW, the only SW that I know of that does consistently well on all platforms is search-and-prune based SW.
 
You should be looking at setting process/CPU/memory affinity when running your MP job.  Try 'taskset' or 'numactl' to have each job 'fixed' on a CPU/core.   This is the last resort you have, and I have found it very helpful in my life in this MP world.  For example, just having a job switched from one core of the CPU to another can be very expensive on some of these CPUs.
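
To illustrate what pinning means, here is a minimal sketch that fixes the calling process on a single core from inside the program. It assumes Linux (os.sched_setaffinity is not available elsewhere). The helper name pin_to_cpu is hypothetical, and reading the rank from a PMI_RANK environment variable is an assumption about the launcher; check what yours actually exports.

```python
import os


def pin_to_cpu(cpu_index):
    """Pin the calling process to one CPU and return the new affinity set.

    cpu_index is taken modulo the number of CPUs the process is
    currently allowed to use, so it is always a valid choice.
    """
    allowed = sorted(os.sched_getaffinity(0))
    cpu = allowed[cpu_index % len(allowed)]
    os.sched_setaffinity(0, {cpu})  # pid 0 means "this process"
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Assumption: the MPI launcher exports the rank as PMI_RANK.
    rank = int(os.environ.get("PMI_RANK", "0"))
    print("rank", rank, "pinned to CPUs", sorted(pin_to_cpu(rank)))
```

From the shell, something like 'taskset -c 0 ./prog' or 'numactl --physcpubind=0 --membind=0 ./prog' (with ./prog standing in for your executable) should have a similar effect without touching the code; numactl additionally restricts where the memory is allocated.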
 
You are at a university, the place where challenges are tackled relentlessly (or are supposed to be).  I will leave it here for you to explore further.
 
tan 
 
 


--- On Tue, 7/15/08, Gaetano Bellanca <gaetano.bellanca at unife.it> wrote:

From: Gaetano Bellanca <gaetano.bellanca at unife.it>
Subject: Re: [mpich-discuss] Why is my quad core slower than cluster
To: mpich-discuss at mcs.anl.gov
Date: Tuesday, July 15, 2008, 4:24 AM


Hi,

maybe I reached my goal; I wanted to be a little polemical to see whether someone who knows more about this can give us a clear answer. I tested a multicore machine with commercial software, and I noted that some speed-up can be obtained (not as good as expected, but much more than I have)... :)

The problem is that I didn't find any clear answer in the references I was able to find.
I looked on the network for suggestions, but I confess that I didn't find anything really effective.
I tried the options indicated in the OpenMP manual; I'm not a computer scientist, just a simple user, and using them I was unable to get any increase in the speed-up.

Things were totally different with MPI: in a few days, attending a good school and using the very good material available on the network, it was very easy to see the results of my effort. Maybe this problem is more difficult; maybe I'm unable to find good references. Maybe I don't have the capability. In any case, it seems I'm not the only one with these problems.

Please, if someone has some clear references, share them with the community (I'm asking for references, not solutions! We are not expecting something to fall from the sky ... in that case, we will buy commercial software :)  ).

Gaetano


At 23.06 14/07/2008, you wrote:



In defense of the designers of multi-core CPUs, I would like to say you are wrong in saying so.  You are given a jet plane but prefer to use it as a boat; no wonder it sank.

 

tan


--- On Mon, 7/14/08, Gaetano Bellanca <gaetano.bellanca at unife.it> wrote:


From: Gaetano Bellanca <gaetano.bellanca at unife.it>

Subject: Re: [mpich-discuss] Why is my quad core slower than cluster

To: mpich-discuss at mcs.anl.gov

Date: Monday, July 14, 2008, 1:39 PM


Hello Gus and list,


I compiled mpich using gcc, not icc. Maybe icc would be a good option to try for better performance.


Anyway, maybe I'm wrong as I'm only a (very very basic) user, but I don't think it is an mpich problem.

Using mpich with n=2, the code speeds up as expected (and I think that 2 different processors are used, looking at the htop report). This does not happen with n=4, where I still observed a (really small) speed-up. But with n=6 or n=8, no speed-up is observed!


Again, I tried compiling with the parallel and OpenMP options (and writing a very simple code with a do loop similar to my original simulator, but without MPI calls, communications, etc ...). I didn't get the expected speed-up in this case either. I'm still investigating .... but ... I really think that multi-core processors are a commercial gadget .... unless you are a real specialist in writing dedicated code able to take advantage of the specific architecture of the processor.


Hope to be wrong ...


Gaetano




Gaetano Bellanca - Department of Engineering - University of Ferrara  

Via Saragat, 1 - 44100 - Ferrara - ITALY             

Voice (VoIP):  +39 0532 974809     Fax:  +39 0532 974870

mailto:gaetano.bellanca at unife.it 









