<html><head><style type="text/css"><!-- DIV {margin:0px;} --></style></head><body><div style="font-family:times new roman, new york, times, serif;font-size:12pt"><DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">I will give that a try. </DIV>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">tan</DIV>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><BR><BR> </DIV>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">----- Original Message ----<BR>From: Rajeev Thakur &lt;thakur@mcs.anl.gov&gt;<BR>To: chong tan &lt;chong_guan_tan@yahoo.com&gt;<BR>Cc: mpich-discuss@mcs.anl.gov<BR>Sent: Thursday, January 18, 2007 9:43:37 AM<BR>Subject: RE: [MPICH] MPICH2 1.05 MPI_Send &amp; MPI_Recv dropping packages randomly<BR><BR>
<STYLE type=text/css>DIV {
MARGIN:0px;}
</STYLE>
<DIV dir=ltr align=left><SPAN class=116214217-18012007><FONT face=Arial color=#0000ff size=2>Can you try using the Nemesis channel? Configure with --with-device=ch3:nemesis. That will use shared memory within a node and TCP across nodes and should actually perform better than ssm.</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=116214217-18012007><FONT face=Arial color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=116214217-18012007><FONT face=Arial color=#0000ff size=2>Rajeev</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=116214217-18012007><FONT face=Arial color=#0000ff size=2></FONT></SPAN> </DIV><BR>
<BLOCKQUOTE style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #0000ff 2px solid; MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> owner-mpich-discuss@mcs.anl.gov [mailto:owner-mpich-discuss@mcs.anl.gov] <B>On Behalf Of </B>chong tan<BR><B>Sent:</B> Thursday, January 18, 2007 11:11 AM<BR><B>To:</B> William Gropp<BR><B>Cc:</B> mpich-discuss@mcs.anl.gov<BR><B>Subject:</B> Re: [MPICH] MPICH2 1.05 MPI_Send & MPI_Recv dropping packages randomly<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">All the messages are short: the shortest is 3 integers (32 bits each) and the longest is 9 integers.</DIV>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"> </DIV>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">I can't send you the code per company policy. There are about 3 million lines of C, C++ and Tcl. MPI is used in an isolated part of the code.</DIV>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"> </DIV>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">I will try sock, but sock runs almost 11X slower on this particular machine. On 2.1.04p1, the overhead with ssm was 50 sec and the overhead with sock was 520 sec on the failed test.</DIV>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"> </DIV>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">tan</DIV>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><BR><BR> </DIV>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">----- Original Message ----<BR>From: William Gropp &lt;gropp@mcs.anl.gov&gt;<BR>To: chong tan &lt;chong_guan_tan@yahoo.com&gt;<BR>Cc: mpich-discuss@mcs.anl.gov<BR>Sent: Wednesday, January 17, 2007 6:37:19 PM<BR>Subject: Re: [MPICH] MPICH2 1.05 MPI_Send &amp; MPI_Recv dropping packages randomly<BR><BR>Can you send us the test case? Does it fail with the ch3:sock device? Are the messages short or long?
<DIV><BR class=khtml-block-placeholder></DIV>
<DIV>Bill</DIV>
<DIV><BR>
<DIV>
<DIV>On Jan 17, 2007, at 7:06 PM, chong tan wrote:</DIV><BR class=Apple-interchange-newline>
<BLOCKQUOTE type="cite"><SPAN class=Apple-style-span style="WORD-SPACING: 0px; FONT: 12px Helvetica; TEXT-TRANSFORM: none; COLOR: rgb(0,0,0); TEXT-INDENT: 0px; WHITE-SPACE: normal; LETTER-SPACING: normal; BORDER-COLLAPSE: separate; border-spacing: 0px 0px; orphans: 2; widows: 2">
<DIV style="FONT-SIZE: 16px; FONT-FAMILY: times new roman">
<DIV style="FONT-SIZE: 16px; FONT-FAMILY: times new roman"><SPAN class=Apple-style-span style="FONT-SIZE: 16px; FONT-FAMILY: times new roman">OS : RedHat Enterprise 4, 2.6.9-42.ELsmp</SPAN></DIV>
<DIV style="FONT-SIZE: 16px; FONT-FAMILY: times new roman"><SPAN class=Apple-style-span style="FONT-SIZE: 16px; FONT-FAMILY: times new roman">CPU : 4 dual-core Intel</SPAN></DIV>
<DIV style="FONT-SIZE: 16px; FONT-FAMILY: times new roman"><SPAN class=Apple-style-span style="FONT-SIZE: 16px; FONT-FAMILY: times new roman"></SPAN> </DIV>
<DIV style="FONT-SIZE: 16px; FONT-FAMILY: times new roman"><SPAN class=Apple-style-span style="FONT-SIZE: 16px; FONT-FAMILY: times new roman">The package was built with:</SPAN></DIV>
<DIV style="FONT-SIZE: 16px; FONT-FAMILY: times new roman"><SPAN class=Apple-style-span style="FONT-SIZE: 16px; FONT-FAMILY: times new roman">setenv CFLAGS "-m32 -O2"</SPAN><BR style="FONT-SIZE: 16px; FONT-FAMILY: times new roman"><SPAN class=Apple-style-span style="FONT-SIZE: 16px; FONT-FAMILY: times new roman">setenv CC gcc</SPAN><BR style="FONT-SIZE: 16px; FONT-FAMILY: times new roman"><SPAN class=Apple-style-span style="FONT-SIZE: 16px; FONT-FAMILY: times new roman">./configure -prefix=/u/cgtan/my_release_dir --with-device=ch3:ssm --enable-fast |& tee configure.log</SPAN><BR style="FONT-SIZE: 16px; FONT-FAMILY: times new roman"></DIV>
<DIV style="FONT-SIZE: 16px; FONT-FAMILY: times new roman"><SPAN class=Apple-style-span style="FONT-SIZE: 16px; FONT-FAMILY: times new roman">-----</SPAN></DIV>
<DIV style="FONT-SIZE: 16px; FONT-FAMILY: times new roman"><SPAN class=Apple-style-span style="FONT-SIZE: 16px; FONT-FAMILY: times new roman">The test programs run 5 processes, one master and 4 slaves. The master always receives from the slaves and then sends to all of them. Randomly, an MPI_Send performed in the master will complete, but the corresponding MPI_Recv in the targeted slave does not complete, and the whole thing hangs. </SPAN></DIV>
<DIV style="FONT-SIZE: 16px; FONT-FAMILY: times new roman"><SPAN class=Apple-style-span style="FONT-SIZE: 16px; FONT-FAMILY: times new roman"></SPAN> </DIV>
<DIV style="FONT-SIZE: 16px; FONT-FAMILY: times new roman"><SPAN class=Apple-style-span style="FONT-SIZE: 16px; FONT-FAMILY: times new roman">I have a debugging mechanism that attaches a sequence id to every package sent. The packages are dumped before and after each send and each recv. A message is also dumped on the pending recv. The sequence ids traced OK all the way to the lost package.</SPAN></DIV>
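<DIV style="FONT-SIZE: 16px; FONT-FAMILY: times new roman"><SPAN class=Apple-style-span style="FONT-SIZE: 16px; FONT-FAMILY: times new roman"></SPAN> </DIV>
<DIV style="FONT-SIZE: 16px; FONT-FAMILY: times new roman"><SPAN class=Apple-style-span style="FONT-SIZE: 16px; FONT-FAMILY: times new roman">For readers who want to reproduce this kind of trace, a minimal sketch of receiver-side sequence id bookkeeping might look like the following. It is not the code discussed in this thread (which cannot be shared), and it makes no MPI calls. The names seq_tracker_t, tracker_init, tracker_check and tracker_outstanding are hypothetical, and it assumes one tracker per sender with ids starting at 0.</SPAN></DIV>
<PRE>
```c
/* Sketch only: all names here are hypothetical illustrations and are
   not part of MPICH2 or of the program discussed in this thread.
   Assumption: each sender numbers its messages 0, 1, 2, ... and the
   receiver keeps one tracker per sender. */

typedef struct {
    unsigned int next_expected;   /* next sequence id this receiver expects */
} seq_tracker_t;

/* Start the tracker at sequence id 0. */
void tracker_init(seq_tracker_t *t)
{
    t->next_expected = 0;
}

/* Record a sequence id taken from a received message.
   Returns 0 if it is the expected id, the number of skipped ids if
   there is a gap (messages lost while later ones arrived), or -1 for
   a duplicate or stale id. */
int tracker_check(seq_tracker_t *t, unsigned int seq)
{
    unsigned int expected = t->next_expected;
    int missing;

    if (expected > seq)
        return -1;
    missing = (int)(seq - expected);
    t->next_expected = seq + 1;
    return missing;
}

/* For a hang, where no later message arrives to expose a gap: given
   the number of messages the sender's log says it has sent, return
   how many of them the receiver has not yet seen. */
unsigned int tracker_outstanding(const seq_tracker_t *t, unsigned int sent_count)
{
    return sent_count > t->next_expected ? sent_count - t->next_expected : 0;
}
```
</PRE>
<DIV style="FONT-SIZE: 16px; FONT-FAMILY: times new roman"><SPAN class=Apple-style-span style="FONT-SIZE: 16px; FONT-FAMILY: times new roman">In a setup like the one described, the receiver would call tracker_check on the id carried in each completed recv, and a post-mortem tool would call tracker_outstanding with the count from the sender's dump to identify which id never arrived.</SPAN></DIV>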
<DIV style="FONT-SIZE: 16px; FONT-FAMILY: times new roman"><SPAN class=Apple-style-span style="FONT-SIZE: 16px; FONT-FAMILY: times new roman"></SPAN> </DIV>
<DIV style="FONT-SIZE: 16px; FONT-FAMILY: times new roman"><SPAN class=Apple-style-span style="FONT-SIZE: 16px; FONT-FAMILY: times new roman">The same code works fine with 2.1.04p1. It has been tested on test cases longer than 100 million send/recv sequences. Any suggestions?</SPAN></DIV>
<DIV style="FONT-SIZE: 16px; FONT-FAMILY: times new roman"><SPAN class=Apple-style-span style="FONT-SIZE: 16px; FONT-FAMILY: times new roman"></SPAN> </DIV>
<DIV style="FONT-SIZE: 16px; FONT-FAMILY: times new roman"><SPAN class=Apple-style-span style="FONT-SIZE: 16px; FONT-FAMILY: times new roman">tan</SPAN></DIV>
<DIV style="FONT-SIZE: 16px; FONT-FAMILY: times new roman"><SPAN class=Apple-style-span style="FONT-SIZE: 16px; FONT-FAMILY: times new roman"></SPAN> </DIV></DIV><BR>
</SPAN></BLOCKQUOTE></DIV><BR></DIV></DIV>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><BR></DIV></DIV><BR>
</BLOCKQUOTE></DIV>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><BR></DIV></div><br>
</body></html>