[MPICH] An idle communication process uses the same CPU as a computation process on multi-core chips
Yusong Wang
ywang25 at aps.anl.gov
Fri Sep 14 10:55:07 CDT 2007
Sylvain,
Thanks for your test program!
Here is what I got on my dual-core machine (Fedora):
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3666 ywang25 20 0 1580 252 204 R 70 0.0 8:42.02 master
3694 ywang25 20 0 1580 248 200 R 67 0.0 7:20.72 slave
3691 ywang25 20 0 1580 248 200 R 62 0.0 7:31.92 slave
2743 root 20 0 50276 33m 10m S 1 1.7 0:36.90 Xorg
I got nothing from 'strings libmpich.a | grep YIELD' (run in the MPICH lib directory).
It seems the yield doesn't happen. Is there any way to force a yield?
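One thing I may try in the meantime is to do the yielding myself: keep the
master out of the blocking call and poll with MPI_Iprobe, calling
sched_yield() until something arrives. Below is only a rough sketch of the
idea; the tag (42), the rank layout, and replacing the gather with a
point-to-point receive are just illustrations, not my real code.
----------------------
#include <mpi.h>
#include <sched.h>
#include <stdio.h>

/* Poll for an incoming message and yield the CPU while nothing is there. */
static void wait_yielding(int source, int tag, MPI_Comm comm)
{
    int flag = 0;
    MPI_Status st;
    while (!flag) {
        MPI_Iprobe(source, tag, comm, &flag, &st);
        if (!flag)
            sched_yield();   /* let a slave on the same core run */
    }
}

int main(int argc, char **argv)
{
    int rank, buf = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {                       /* master */
        wait_yielding(1, 42, MPI_COMM_WORLD);
        MPI_Recv(&buf, 1, MPI_INT, 1, 42, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("master got %d\n", buf);
    } else if (rank == 1) {                /* one slave, for illustration */
        buf = 7;
        MPI_Send(&buf, 1, MPI_INT, 0, 42, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
----------------------
This way the busy wait happens in my own code, where I control how often
sched_yield() is called, instead of inside the MPI_Gather.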
Thanks,
Yusong
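P.S. About the taskset suggestion quoted below: an alternative I might try
is to pin each rank to a core from inside the program with
sched_setaffinity(), which should have the same effect as taskset. This is
only a sketch; the simple rank-modulo-cores mapping is an assumption for
illustration.
----------------------
/* Pin each MPI rank to one core right after MPI_Init, similar to what
 * taskset does from the command line. */
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, ncpus;
    cpu_set_t mask;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    ncpus = (int)sysconf(_SC_NPROCESSORS_ONLN);
    CPU_ZERO(&mask);
    CPU_SET(rank % ncpus, &mask);              /* one core per rank */
    sched_setaffinity(0, sizeof(mask), &mask); /* 0 = calling process */

    /* ... master/slave work and MPI_Gather go here ... */

    MPI_Finalize();
    return 0;
}
----------------------
From the command line the equivalent would be something like
'taskset -c 0 ./master', as Sylvain suggested.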
----- Original Message -----
From: Sylvain Jeaugey <sylvain.jeaugey at bull.net>
Date: Friday, September 14, 2007 9:48 am
Subject: Re: [MPICH] An idle communication process uses the same CPU as a computation process on multi-core chips
> That's unfortunate.
>
> Still, I wrote two programs. A master:
> ----------------------
> #include <sched.h>
>
> int main() {
>     while (1) {
>         sched_yield();
>     }
>     return 0;
> }
> ----------------------
> and a slave:
> ----------------------
> int main() {
>     while (1);
>     return 0;
> }
> ----------------------
>
> I launch 4 slaves and 1 master on a machine with two dual-core CPUs.
> Here is the result in top:
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 12361 sylvain 25 0 2376 244 188 R 100 0.0 0:18.26 slave
> 12362 sylvain 25 0 2376 244 188 R 100 0.0 0:18.12 slave
> 12360 sylvain 25 0 2376 244 188 R 100 0.0 0:18.23 slave
> 12363 sylvain 25 0 2376 244 188 R 100 0.0 0:18.15 slave
> 12364 sylvain 20 0 2376 248 192 R 0 0.0 0:00.00 master
> 12365 sylvain 16 0 6280 1120 772 R 0 0.0 0:00.08 top
>
> If you are seeing 66% each, I guess that your master is not
> sched_yield'ing as much as expected. Maybe you should look at
> environment variables to force yield when no message is available, and
> maybe your master isn't so idle after all and has messages to send
> continuously, thus not yield'ing.
>
> Anyway, strings <binary|mpi library> | grep YIELD may find environment
> variables to control that.
>
> Sylvain
>
> On Fri, 14 Sep 2007, Yusong Wang wrote:
>
> > On Fri, 2007-09-14 at 09:42 +0200, Sylvain Jeaugey wrote:
> >> Yusong,
> >>
> >> I may be wrong for nemesis, but most shm-based MPI implementations
> >> rely on busy polling, which makes them appear to use 100% CPU. It may
> >> not be a problem though, because they also frequently call
> >> sched_yield() when they have nothing to receive, which means that if
> >> another task is running on the same CPU, the "master" task will give
> >> all its CPU time to the other task.
> > Unfortunately, on my Dell Latitude D630 laptop (dual-core), this
> > didn't happen. I launched 3 processes and each process uses 66% CPU.
> > It seems to me the process switches between the cores, as any two of
> > them will be over 100%. In another test with more cores, I launched
> > n_core+1 processes; 2 of the processes use 50% CPU, the remaining use
> > 100% CPU.
> >
> >>
> >> So, it's not really a problem to have task 0 at 100% CPU. Just
> >> launch an additional task and see if it takes the CPU cycles of the
> >> master. You might also use taskset (at least on Fedora) to bind
> >> tasks to CPUs.
> >>
> >> Sylvain
> >>
> >> On Thu, 13 Sep 2007, Yusong Wang wrote:
> >>
> >>> Hi all,
> >>>
> >>> I have a program which is implemented with a master/slave model, and
> >>> the master does very little computation. In my test, the master spent
> >>> most of its time waiting for the other processes to finish the
> >>> MPI_Gather communication (confirmed with jumpshot/MPE). In several
> >>> tests on different multi-core chips (dual-core, quad-core, 8-core), I
> >>> found the master uses the same amount of CPU as the slaves, which
> >>> should do all the computation. There are only two exceptions where
> >>> the master uses near 0% CPU (one on Windows, one on Linux), which is
> >>> what I expect. The tests were done on both Fedora Linux and Windows
> >>> with MPICH2 (shm/nemesis, mpd/smpd). I don't know if it is a
> >>> software/system issue or caused by different hardware. I would think
> >>> this is (at least) related to hardware, as with the same operating
> >>> system I got different CPU usage (near 0% or near 100%) for the
> >>> master on different multi-core nodes of our clusters.
> >>>
> >>> Are there any documents I can check out for this issue?
> >>>
> >>> Thanks,
> >>>
> >>> Yusong
> >>>
> >>>
> >
> >
>