<html><head><style type="text/css"><!-- DIV {margin:0px;} --></style></head><body><div style="font-family:times new roman, new york, times, serif;font-size:12pt"><DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">No, I have twice as many processors as processes. All the CPUs are multi-core.</DIV>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">Only one CPU has one more process, the master, than the others. </DIV>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">The interesting part is that if I merge the master and the slave on that CPU into one </DIV>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">process, the extra 60+ minutes before the first successful MPI communication disappear. During those extra</DIV>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">60 minutes, the only thing active other than the slave is the nemesis polling, which</DIV>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">leads me to conclude that polling is one of the main contributors to the issue.</DIV>
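<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">For illustration, here is a minimal sketch of one way a waiting process could avoid spinning: poll a readiness check and sleep between attempts, backing off each time. This is not a nemesis feature, only an application-level idea. The names poll_fn, wait_with_backoff and ready_after_n_calls, and the delay values, are hypothetical; in an MPI program the readiness check would presumably wrap a non-blocking call such as MPI_Iprobe or MPI_Test, which is left out here so the sketch stays self-contained.</DIV>

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical readiness check: returns nonzero once the awaited event
 * has happened. In an MPI program this could wrap MPI_Iprobe or MPI_Test. */
typedef int (*poll_fn)(void *ctx);

/* Poll `ready` until it reports success, sleeping between attempts so the
 * waiting process gives up its CPU instead of spinning. The delay doubles
 * after each failed attempt, capped at max_ns (must be below one second).
 * Returns the number of times `ready` was called. */
static long wait_with_backoff(poll_fn ready, void *ctx,
                              long start_ns, long max_ns)
{
    assert(ready != NULL);
    assert(start_ns > 0 && start_ns <= max_ns && max_ns < 1000000000L);

    long delay_ns = start_ns;
    long polls = 1;

    while (!ready(ctx)) {
        struct timespec ts = { 0, delay_ns };
        nanosleep(&ts, NULL);            /* sleep instead of busy-waiting */
        if (delay_ns > max_ns / 2)
            delay_ns = max_ns;           /* cap the backoff */
        else
            delay_ns *= 2;
        polls++;
    }
    return polls;
}

/* Stand-in check for trying the sketch out: succeeds once the counter
 * pointed to by ctx has been decremented to zero. */
static int ready_after_n_calls(void *ctx)
{
    int *remaining = ctx;
    return --(*remaining) <= 0;
}
```

<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">The trade-off is latency: a sleeping process notices the message up to max_ns late, so this only makes sense for a wait that is expected to be long, like the one before the first communication point.</DIV>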
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"> </DIV>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">tan</DIV>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"> </DIV>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"> </DIV>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">----- Original Message ----<BR>From: Darius Buntinas <buntinas@mcs.anl.gov><BR>To: chong tan <chong_guan_tan@yahoo.com><BR>Cc: mpich-discuss@mcs.anl.gov<BR>Sent: Friday, December 21, 2007 1:27:24 PM<BR>Subject: Re: [MPICH] any way to ask nemesis to turn-off and turn of active polling ?<BR><BR><BR>If you have P processors, and you're running P slaves and 1 master, I'm <BR>not sure how you could have P+1 processes running at 100%. Are you <BR>running one slave per processor, then adding an additional master?<BR><BR>If you have a process waiting in a blocking receive, it will show that <BR>it's using 100% of the CPU if it has its own CPU to run on, but if that <BR>process has to share a CPU with another process that's doing some work, <BR>only then will you see the CPU usage of the waiting process go down.<BR><BR>-d<BR><BR>On 12/17/2007 04:48 PM, chong tan
wrote:<BR>> I am running RedHat enterprise 5. sysctl complains that <BR>> sched_compat_yield is not known for kernel.<BR>> <BR>> BTW, I run the test, and both master and slaves utilize 100% of CPU. <BR>> <BR>> any suggestion ?<BR>> <BR>> thanks<BR>> tan<BR>> <BR>> <BR>> <BR>> ----- Original Message ----<BR>> From: Darius Buntinas <<A href="mailto:buntinas@mcs.anl.gov" ymailto="mailto:buntinas@mcs.anl.gov">buntinas@mcs.anl.gov</A>><BR>> To: chong tan <<A href="mailto:chong_guan_tan@yahoo.com" ymailto="mailto:chong_guan_tan@yahoo.com">chong_guan_tan@yahoo.com</A>><BR>> Cc: <A href="mailto:mpich-discuss@mcs.anl.gov" ymailto="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</A><BR>> Sent: Monday, December 17, 2007 1:23:46 PM<BR>> Subject: Re: [MPICH] any way to ask nemesis to turn-off and turn of <BR>> active polling ?<BR>> <BR>> <BR>>
<BR>> On 12/17/2007 01:00 PM, chong tan wrote:<BR>> > Thanks,<BR>> > I don't have root access to the box. I will see if I can ask the sys-admin<BR>> > to do it. I am running<BR>> > Linux snowwhite 2.6.18-8.el5 #1 SMP<BR>> > <BR>> > Do you know if the broken yield got into this version?<BR>> <BR>> I don't know, but you can try the master/slave programs from the<BR>> discussion we had on sched_yield a few months ago:<BR>> <BR>> Master:<BR>> <BR>> #include &lt;sched.h&gt;<BR>> <BR>> /* Does nothing but call sched_yield() in a loop. */<BR>> int main(void) {<BR>> while (1) {<BR>> sched_yield();<BR>> }<BR>> return 0;<BR>> }<BR>> <BR>> Slave:<BR>> <BR>> /* Spins forever to keep one CPU busy. */<BR>> int main(void) {<BR>> while (1);<BR>> return 0;<BR>> }<BR>> <BR>> Start 4 slaves first, THEN one master, and check 'top'. If it shows<BR>> that the master is taking more than 1% or so, you have a kernel with
the<BR>> 'broken' yield.<BR>> <BR>> > FYI : the 'yield' people said it is not 'broken', it is in fact the<BR>> > 'right yield'.<BR>> <BR>> Maybe, but Linus is on my side :-)<BR>> <BR>> -d<BR>> <BR>> > <BR>> > tan<BR>> ><BR>> ><BR>> > <BR>> > ----- Original Message ----<BR>> > From: Darius Buntinas <<A href="mailto:buntinas@mcs.anl.gov" ymailto="mailto:buntinas@mcs.anl.gov">buntinas@mcs.anl.gov</A> <BR>> <mailto:<A href="mailto:buntinas@mcs.anl.gov" ymailto="mailto:buntinas@mcs.anl.gov">buntinas@mcs.anl.gov</A>>><BR>> > To: chong tan <<A href="mailto:chong_guan_tan@yahoo.com" ymailto="mailto:chong_guan_tan@yahoo.com">chong_guan_tan@yahoo.com</A> <BR>> <mailto:<A href="mailto:chong_guan_tan@yahoo.com" ymailto="mailto:chong_guan_tan@yahoo.com">chong_guan_tan@yahoo.com</A>>><BR>>
> Cc: <A href="mailto:mpich-discuss@mcs.anl.gov" ymailto="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</A> <mailto:<A href="mailto:mpich-discuss@mcs.anl.gov" ymailto="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</A>><BR>> > Sent: Monday, December 17, 2007 10:50:12 AM<BR>> > Subject: Re: [MPICH] any way to ask nemesis to turn-off and turn of<BR>> > active polling ?<BR>> ><BR>> ><BR>> > Try setting the processor affinity for the "average" processes (map each<BR>> > one to its own processor). If you have a kernel with the "broken"<BR>> > sched_yield implementations, that may not help.<BR>> ><BR>> > If you have a "broken" sched_yield implementation, you can try doing<BR>> > this as root:<BR>> > sysctl kernel.sched_compat_yield=1<BR>> > or<BR>>
> echo "1" > /proc/sys/kernel/sched_compat_yield<BR>> ><BR>> > -d<BR>> ><BR>> ><BR>> > On 12/17/2007 11:35 AM, chong tan wrote:<BR>> > > Yes, in a very subtle way that has a major impact on performance. I will<BR>> > > try to describe it a little here:<BR>> > ><BR>> > > The system has 32G of physical memory and the total image is 35G. The load is a little off-balance<BR>> > > mathematically: 4X dual core, running 5 processes.<BR>> > > 4 processes are the same size, and each runs on one CPU. The last process is<BR>> > > very small, about 10% of the others, and runs<BR>> > > on a core of one of the CPUs. So 1 CPU runs 2 procs: an average one (P1) and<BR>> > > a light one (P2).<BR>> >
><BR>> > > All procs do their first MPI comm at a fixed point in the algorithm. The 'useful'<BR>> > > image is about 29G at that point, and should<BR>> > > fit into physical memory. P2 gets there in a heartbeat, then the<BR>> > > others, followed by P1, which takes another 60+ minutes<BR>> > > to get there. If I combine P1 and P2 into 1 process, I no longer<BR>> > > see this extra delay.<BR>> > ><BR>> > > tan<BR>> > ><BR>> > ><BR>> > ><BR>> > > ----- Original Message ----<BR>> > > From: Darius Buntinas <<A href="mailto:buntinas@mcs.anl.gov" ymailto="mailto:buntinas@mcs.anl.gov">buntinas@mcs.anl.gov</A> <BR>> <mailto:<A href="mailto:buntinas@mcs.anl.gov"
ymailto="mailto:buntinas@mcs.anl.gov">buntinas@mcs.anl.gov</A>><BR>> > <mailto:<A href="mailto:buntinas@mcs.anl.gov" ymailto="mailto:buntinas@mcs.anl.gov">buntinas@mcs.anl.gov</A> <mailto:<A href="mailto:buntinas@mcs.anl.gov" ymailto="mailto:buntinas@mcs.anl.gov">buntinas@mcs.anl.gov</A>>>><BR>> > > To: chong tan <<A href="mailto:chong_guan_tan@yahoo.com" ymailto="mailto:chong_guan_tan@yahoo.com">chong_guan_tan@yahoo.com</A> <BR>> <mailto:<A href="mailto:chong_guan_tan@yahoo.com" ymailto="mailto:chong_guan_tan@yahoo.com">chong_guan_tan@yahoo.com</A>><BR>> > <mailto:<A href="mailto:chong_guan_tan@yahoo.com" ymailto="mailto:chong_guan_tan@yahoo.com">chong_guan_tan@yahoo.com</A> <mailto:<A href="mailto:chong_guan_tan@yahoo.com" ymailto="mailto:chong_guan_tan@yahoo.com">chong_guan_tan@yahoo.com</A>>>><BR>> > > Cc: <A
href="mailto:mpich-discuss@mcs.anl.gov" ymailto="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</A> <mailto:<A href="mailto:mpich-discuss@mcs.anl.gov" ymailto="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</A>> <BR>> <mailto:<A href="mailto:mpich-discuss@mcs.anl.gov" ymailto="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</A> <mailto:<A href="mailto:mpich-discuss@mcs.anl.gov" ymailto="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</A>>><BR>> > > Sent: Monday, December 17, 2007 8:02:17 AM<BR>> > > Subject: Re: [MPICH] any way to ask nemesis to turn-off and turn of<BR>> > > active polling ?<BR>> > ><BR>> > ><BR>> > > No, there's no way to do that. Even MPI_Barrier will do active <BR>> polling.<BR>> > ><BR>> >
> Are you having issues where an MPI process that is waiting in a blocking<BR>> > > call is taking CPU time away from other processes?<BR>> > ><BR>> > > -d<BR>> > ><BR>> > > On 12/14/2007 04:53 PM, chong tan wrote:<BR>> > > > My issue is like this:<BR>> > > ><BR>> > > > among all the processes, some will get to the first MPI<BR>> > > > communication point faster than<BR>> > > > others. Is there a way to tell nemesis to start without doing<BR>> > > > active polling, and then turn<BR>> > > > on active polling with some function?<BR>> > > ><BR>> >
> > Or should I just use MPI_Barrier() on that ?<BR>> > > ><BR>> > > > thanks<BR>> > > > tan<BR></DIV>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><BR></DIV></div><br>
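<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">The processor-affinity suggestion quoted above ("map each one to its own processor") can be sketched roughly as follows on Linux. This is only an illustration under assumptions: pin_to_cpu, first_allowed_cpu and allowed_cpu_count are hypothetical helper names, while sched_setaffinity, sched_getaffinity and the CPU_* macros are the glibc interfaces, which need _GNU_SOURCE defined before any include.</DIV>

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Return the lowest-numbered CPU the calling process may run on,
 * or -1 if the affinity mask cannot be read. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Restrict the calling process to a single CPU.
 * Returns 0 on success, -1 on failure (errno is set by the kernel). */
static int pin_to_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);   /* pid 0 = caller */
}

/* Number of CPUs the calling process is currently allowed to run on,
 * or -1 if the affinity mask cannot be read. */
static int allowed_cpu_count(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}
```

<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">Each "average" process would call pin_to_cpu with a different CPU number early in its startup. Pinning does not stop a waiting process from polling; it only controls which core the polling runs on.</DIV>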
</body></html>