[MPICH] An idle communication process uses the same CPU as a computation process on multi-core chips

Yusong Wang ywang25 at aps.anl.gov
Fri Sep 14 10:19:50 CDT 2007


Darius,

I downloaded the most recent source code for MPICH2 and configured it with:
configure --with-device=ch3:shm --enable-fast

By the way, how can I check whether sched_yield() is disabled or not? The
command "strings <binary|mpi library> | grep YIELD" that Sylvain sent me
doesn't work on my system.

Thanks for your help!

Yusong 
On Fri, 2007-09-14 at 10:05 -0500, Darius Buntinas wrote:
> Yusong,
> 
> Can you give me more information about your application?  Is this an 
> MPICH2 application?  What channel are you using?  What configure flags 
> did you use?  I believe that sched_yield() can be disabled depending on 
> certain configure flags.
> 
> Thanks,
> -d
> 
> On 09/14/2007 09:48 AM, Sylvain Jeaugey wrote:
> > That's unfortunate.
> > 
> > Still, I wrote two programs. A master:
> > ----------------------
> > #include <sched.h>      /* for sched_yield() */
> >
> > int main() {
> >         while (1) {
> >             sched_yield();
> >         }
> >         return 0;
> > }
> > ----------------------
> > and a slave:
> > ----------------------
> > int main() {
> >         while (1);
> >         return 0;
> > }
> > ----------------------
> > 
> > I launched 4 slaves and 1 master on a machine with two dual-core CPUs.
> > Here is the result in top:
> > 
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> > 12361 sylvain   25   0  2376  244  188 R  100  0.0   0:18.26 slave
> > 12362 sylvain   25   0  2376  244  188 R  100  0.0   0:18.12 slave
> > 12360 sylvain   25   0  2376  244  188 R  100  0.0   0:18.23 slave
> > 12363 sylvain   25   0  2376  244  188 R  100  0.0   0:18.15 slave
> > 12364 sylvain   20   0  2376  248  192 R    0  0.0   0:00.00 master
> > 12365 sylvain   16   0  6280 1120  772 R    0  0.0   0:00.08 top
> > 
> > If you are seeing 66% each, I guess that your master is not
> > sched_yield'ing as much as expected. Maybe you should look at environment
> > variables that force a yield when no message is available; or maybe your
> > master isn't so idle after all and has messages to send continuously,
> > thus never yield'ing.
> > 
> > Anyway, "strings <binary|mpi library> | grep YIELD" may find environment
> > variables that control that.
> > 
> > Sylvain
> > 
> > On Fri, 14 Sep 2007, Yusong Wang wrote:
> > 
> >> On Fri, 2007-09-14 at 09:42 +0200, Sylvain Jeaugey wrote:
> >>> Yusong,
> >>>
> >>> I may be wrong for nemesis, but most shm-based MPI implementations rely
> >>> on busy polling, which makes them appear to be using 100% CPU. It may
> >>> not be a problem, though, because they also call sched_yield() frequently
> >>> when they have nothing to receive, which means that if another task is
> >>> running on the same CPU, the "master" task will give all of its CPU time
> >>> to the other task.
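> >>>
> >>> To illustrate the idea, here is a minimal sketch of such a busy-polling
> >>> progress loop (not MPICH2's actual code; probe_queue() and
> >>> process_message() are hypothetical stand-ins for the shared-memory poll
> >>> and the message handling):
> >>> ----------------------
> >>> #include <sched.h>
> >>>
> >>> /* hypothetical: returns 1 if a message is pending */
> >>> static int probe_queue(void) { return 0; }
> >>> /* hypothetical: consumes one pending message */
> >>> static void process_message(void) { }
> >>>
> >>> int main() {
> >>>         while (1) {
> >>>             if (probe_queue())
> >>>                 process_message();   /* busy: keep working */
> >>>             else
> >>>                 sched_yield();       /* idle: give the CPU to other runnable tasks */
> >>>         }
> >>>         return 0;
> >>> }
> >>> ----------------------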
> >> Unfortunately, on my Dell Latitude D630 laptop (dual-core), this didn't
> >> happen. I launched 3 processes and each process used 66% CPU. It seems to
> >> me the processes switch between the cores, as any two of them together
> >> stay above 100%. In another test with more cores, I launched n_core+1
> >> processes; 2 of the processes used 50% CPU and the remaining ones used
> >> 100% CPU.
> >>
> >>
> >>>
> >>> So, it's not really a problem to have task 0 at 100% CPU. Just launch an
> >>> additional task and see if it takes the CPU cycles of the master. You
> >>> might also use taskset (at least on Fedora) to bind tasks to CPUs, or do
> >>> the binding from inside the program, as sketched below.
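> >>>
> >>> For example, a minimal Linux-specific sketch that pins the calling
> >>> process to CPU 0 with sched_setaffinity() (just an illustration; choose
> >>> the CPU number to suit your machine):
> >>> ----------------------
> >>> #define _GNU_SOURCE
> >>> #include <sched.h>
> >>> #include <stdio.h>
> >>>
> >>> int main() {
> >>>         cpu_set_t mask;
> >>>         CPU_ZERO(&mask);
> >>>         CPU_SET(0, &mask);   /* allow this process to run on CPU 0 only */
> >>>         if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
> >>>             perror("sched_setaffinity");
> >>>             return 1;
> >>>         }
> >>>         /* ... the task's real work goes here ... */
> >>>         return 0;
> >>> }
> >>> ----------------------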
> >>>
> >>> Sylvain
> >>>
> >>> On Thu, 13 Sep 2007, Yusong Wang wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> I have a program implemented with a master/slave model in which the
> >>>> master does very little computation. In my test, the master spent most
> >>>> of its time waiting for the other processes to finish the MPI_Gather
> >>>> communication (confirmed with jumpshot/MPE). In several tests on
> >>>> different multi-core chips (dual-core, quad-core, 8-core), I found that
> >>>> the master uses the same amount of CPU as the slaves, which should be
> >>>> doing all the computation. There were only two exceptions in which the
> >>>> master used near 0% CPU (one on Windows, one on Linux), which is what I
> >>>> expect. The tests were done on both Fedora Linux and Windows with MPICH2
> >>>> (shm/nemesis, mpd/smpd). I don't know if it is a software/system issue
> >>>> or caused by different hardware. I would think it is (at least) related
> >>>> to hardware, since with the same operating system I got different CPU
> >>>> usage for the master (near 0% or near 100%) on different multi-core
> >>>> nodes of our clusters. A stripped-down sketch of the pattern is below.
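> >>>>
> >>>> A minimal sketch of the communication pattern (not my actual code;
> >>>> do_work() is a hypothetical stand-in for the slaves' computation):
> >>>> ----------------------
> >>>> #include <mpi.h>
> >>>> #include <stdlib.h>
> >>>>
> >>>> /* hypothetical: the slaves' computation */
> >>>> static double do_work(void) { return 1.0; }
> >>>>
> >>>> int main(int argc, char **argv) {
> >>>>         int rank, size;
> >>>>         double local = 0.0, *all = NULL;
> >>>>
> >>>>         MPI_Init(&argc, &argv);
> >>>>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >>>>         MPI_Comm_size(MPI_COMM_WORLD, &size);
> >>>>
> >>>>         if (rank != 0)
> >>>>             local = do_work();   /* slaves compute */
> >>>>         else
> >>>>             all = malloc(size * sizeof(double));
> >>>>
> >>>>         /* the master (rank 0) spends most of its time waiting here */
> >>>>         MPI_Gather(&local, 1, MPI_DOUBLE, all, 1, MPI_DOUBLE, 0,
> >>>>                    MPI_COMM_WORLD);
> >>>>
> >>>>         free(all);
> >>>>         MPI_Finalize();
> >>>>         return 0;
> >>>> }
> >>>> ----------------------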
> >>>>
> >>>> Are there any documents I can check out for this issue?
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Yusong
> >>>>
> >>>>
> >>
> >>
> > 



