[MPICH] An idle communication process uses the same CPU as a computation process on multi-core chips

Yusong Wang ywang25 at aps.anl.gov
Fri Sep 14 09:38:24 CDT 2007


On Fri, 2007-09-14 at 09:42 +0200, Sylvain Jeaugey wrote:
> Yusong,
> 
> I may be wrong for nemesis, but most shm-based MPI implementations rely on
> busy polling, which makes them appear to use 100% CPU. It may not be a
> problem though, because they also call sched_yield() frequently when they
> have nothing to receive, which means that if another task is running on
> the same CPU, the "master" task will give all its CPU time to the
> other task.
Unfortunately, on my Dell Latitude D630 laptop (dual-core), this didn't
happen. I launched 3 processes and each process used 66% CPU. It seems
to me the processes switch between the cores, since any two of them
add up to more than 100%. In another test on a machine with more cores,
I launched n_core+1 processes; 2 of the processes used 50% CPU and the
remaining ones used 100% CPU.
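
For reference, the polling pattern described in the quoted text can be
sketched roughly as follows. This is only an illustration of "busy poll,
then sched_yield()", not the actual MPICH2 progress engine. The function
name poll_until_ready and the use of threading.Event as a stand-in for a
shared-memory receive queue are assumptions made for the example.

```python
import os
import threading
import time


def poll_until_ready(flag: threading.Event, yield_cpu: bool = True) -> int:
    """Spin until `flag` is set and return how many times we polled."""
    polls = 0
    while not flag.is_set():      # busy poll: the loop never sleeps
        polls += 1
        if yield_cpu:
            os.sched_yield()      # offer the core to another runnable task
    return polls


if __name__ == "__main__":
    ready = threading.Event()
    # Hypothetical "worker": sets the flag after 0.2 s of simulated work.
    threading.Timer(0.2, ready.set).start()

    cpu_before = time.process_time()
    polls = poll_until_ready(ready)
    cpu_used = time.process_time() - cpu_before
    print(f"polled {polls} times, used {cpu_used:.3f} s of CPU while waiting")
```

A loop like this keeps the process runnable the whole time, so tools such
as top can still report high CPU usage for the waiting process when no
other task is competing for the same core.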


> 
> So, it's not really a problem to have task 0 at 100% CPU. Just launch an
> additional task and see if it takes the CPU cycles of the master. You
> might also use taskset (at least on Fedora) to bind tasks to CPUs.
> 
> Sylvain
> 
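
The binding that taskset performs from the command line can also be done
from inside a process. The following is a minimal sketch, assuming Linux
(os.sched_setaffinity is not available on every platform); the helper name
pin_current_process and the choice of the lowest allowed CPU are
hypothetical and only serve the illustration.

```python
import os


def pin_current_process(cpu: int) -> set:
    """Restrict the calling process to one CPU; return the previous mask."""
    previous = os.sched_getaffinity(0)    # pid 0 means "this process"
    if cpu not in previous:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(previous)}")
    os.sched_setaffinity(0, {cpu})
    return previous


if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)
    target = min(allowed)                 # hypothetical choice: lowest allowed CPU
    old_mask = pin_current_process(target)
    print("now restricted to:", sorted(os.sched_getaffinity(0)))
    os.sched_setaffinity(0, old_mask)     # restore the original mask
```

Pinning each task to a known core makes it easier to tell whether two
processes really share a core or are simply being moved around by the
scheduler.
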
> On Thu, 13 Sep 2007, Yusong Wang wrote:
> 
> > Hi all,
> >
> > I have a program which is implemented with a master/slave model, and the
> > master does very little computation. In my test, the master spent
> > most of its time waiting for the other processes to finish the MPI_Gather
> > communication (confirmed with jumpshot/MPE). In several tests on
> > different multi-core chips (dual-core, quad-core, 8-core), I found that the
> > master uses the same amount of CPU as the slaves, which should do all the
> > computation. There are only two exceptions where the master uses near 0%
> > CPU (one on Windows, one on Linux), which is what I expect. The tests
> > were done on both Fedora Linux and Windows with MPICH2 (shm/nemesis,
> > mpd/smpd). I don't know whether it is a software/system issue or caused by
> > different hardware. I would think this is (at least) related to the
> > hardware, since with the same operating system I got different CPU usage
> > (near 0% or near 100%) for the master on different multi-core nodes of
> > our clusters.
> >
> > Are there any documents I can check for this issue?
> >
> > Thanks,
> >
> > Yusong
> >
> >




More information about the mpich-discuss mailing list