[mpich-discuss] resources are equally shared when running processes in one machine ?

Dave Goodell goodell at mcs.anl.gov
Fri Jul 17 08:29:25 CDT 2009


On Jul 16, 2009, at 4:11 PM, Gra zeus wrote:

> When I run my MPI code with "mpiexec -n 1 ./myprogram", everything
> works fine.
>
> However, when I run with "mpiexec -n 4 ./myprogram", the performance
> of my program drops significantly.
>
> I coded my program so that only "process 0" does the work, like this:
>      if (id == 0) { do_computation_task(); } else { /* do nothing */ }
>
> Does this mean physical resources are shared when I spawn more than
> one process on one physical machine (even though all of the work is
> done by "process 0" and the other processes do nothing)?

What kind of system are you running this program on?  How many
processors/cores does the machine have?  Does either branch of your
code (do_computation_task or do_nothing) call into the MPI library?

How is your do_nothing implemented?  Do you just fall through to the  
code after this if/else (which I assume is communication code), do you  
call sleep, or do you busy wait on something?
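
For concreteness, here is a minimal sketch of the structure I am
guessing at from your description (do_computation_task is just a
placeholder for your real routine, and the final barrier is my
assumption about where the ranks meet again):

    #include <mpi.h>
    #include <stdio.h>

    /* placeholder for the real work done by rank 0 */
    static void do_computation_task(void)
    {
        printf("rank 0 doing the computation\n");
    }

    int main(int argc, char **argv)
    {
        int id;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &id);

        if (id == 0) {
            do_computation_task();   /* only rank 0 does real work */
        }
        /* the other ranks fall straight through to here; with the
           default nemesis channel they busy-poll inside these MPI
           calls, so each "idle" rank still occupies a core */
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Finalize();
        return 0;
    }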

If you are oversubscribing the machine (more MPI processes than cores)
and the processes do a lot of MPI communication, then it is expected
that you will see a performance drop with a default build of the
MPICH2-1.1 series.  The default channel is nemesis, and it busy-polls
by default, which takes up CPU resources whenever a process is inside
the MPI library.  For this reason it is best not to oversubscribe when
using nemesis.

If the problem is due to oversubscription, then you have a couple of
options:

1) You can re-configure your MPICH2 build with
"--with-device=ch3:sock".  This channel is slower for intra-node
communication, but it doesn't busy-poll, so it is more suitable for
oversubscription (see the example after this list).
2) Get a machine with more cores or spread the work over multiple  
machines such that you are no longer over-subscribing.
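
For option 1, the rebuild would look roughly like this (the install
prefix is just a placeholder; adjust it to your setup):

    ./configure --with-device=ch3:sock --prefix=/path/to/mpich2-sock
    make
    make install

Then recompile your program with the mpicc from that installation and
launch it with the matching mpiexec.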

> If the answer is "yes", can you please tell me whether this is stated
> somewhere in an official document that I can refer to?

Hmm... I thought we had something about this somewhere, but I can't  
seem to find it.  I'll add something to the FAQ later today.

-Dave


