[MPICH] MPICH2 1.05 MPI_Send & MPI_Recv dropping packages randomly

chong tan chong_guan_tan at yahoo.com
Thu Jan 18 13:05:07 CST 2007


The test works with ch3:sock, so it is likely a bug in ch3:ssm.

tan



----- Original Message ----
From: William Gropp <gropp at mcs.anl.gov>
To: chong tan <chong_guan_tan at yahoo.com>
Cc: mpich-discuss at mcs.anl.gov
Sent: Wednesday, January 17, 2007 6:37:19 PM
Subject: Re: [MPICH] MPICH2 1.05 MPI_Send & MPI_Recv dropping packages randomly

Can you send us the test case?  Does it fail with the ch3:sock device?  Are the messages short or long?  


Bill


On Jan 17, 2007, at 7:06 PM, chong tan wrote:


OS:  RedHat Enterprise 4, 2.6.9-42.ELsmp
CPU: 4 dual-core Intel
 
The package was built with:
setenv CFLAGS "-m32 -O2"
setenv CC         gcc
./configure -prefix=/u/cgtan/my_release_dir --with-device=ch3:ssm --enable-fast |& tee configure.log

-----
The test program runs 5 processes, one master and 4 slaves.  The master always receives from the slaves and then sends to all of them.  Randomly, an MPI_Send performed in the master will complete, but the corresponding MPI_Recv in the targeted slave does not complete, and the whole thing hangs.
 
I have a debugging mechanism that attaches a sequence id to every package sent.  The packages are dumped before and after each send, and after each recv.  A message is also dumped for the pending recv.  The sequence ids traced OK all the way up to the lost package.
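
For readers who want to reproduce this kind of tracing, here is a minimal sketch of a sequence-id mechanism like the one described above. It is not the code from the actual test program: the names traced_msg, seq_tracker, stamp_outgoing and check_incoming, the fixed payload size, and the log format are all assumptions made for illustration. It contains no MPI calls; the idea is that stamp_outgoing would be called just before MPI_Send and check_incoming just after MPI_Recv returns.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_PAYLOAD 64  /* hypothetical fixed payload size */
#define MAX_PEERS   5   /* one master plus four slaves, as in the report */

/* Hypothetical wire format: sequence id and sender rank ahead of the data. */
typedef struct {
    uint32_t seq;
    int32_t  src;
    char     payload[MAX_PAYLOAD];
} traced_msg;

/* Per-process counters: next id to stamp per destination,
   next id expected per source. */
typedef struct {
    uint32_t next_send[MAX_PEERS];
    uint32_t next_recv[MAX_PEERS];
} seq_tracker;

void tracker_init(seq_tracker *t)
{
    memset(t, 0, sizeof *t);
}

/* Call just before sending: stamps the message and dumps a trace line. */
void stamp_outgoing(seq_tracker *t, int self, int dest,
                    const char *data, traced_msg *out)
{
    assert(dest >= 0 && dest < MAX_PEERS);
    out->seq = t->next_send[dest]++;
    out->src = self;
    strncpy(out->payload, data, MAX_PAYLOAD - 1);
    out->payload[MAX_PAYLOAD - 1] = '\0';
    fprintf(stderr, "[%d] send seq=%u to %d\n", self, (unsigned)out->seq, dest);
}

/* Call just after a receive completes.
   Returns 0 if the message is the next one expected from its source,
   a positive count of skipped messages if there is a gap,
   or -1 if the id is older than expected (duplicate or reordered). */
long check_incoming(seq_tracker *t, int self, const traced_msg *in)
{
    assert(in->src >= 0 && in->src < MAX_PEERS);
    uint32_t expected = t->next_recv[in->src];
    fprintf(stderr, "[%d] recv seq=%u from %d (expected %u)\n",
            self, (unsigned)in->seq, (int)in->src, (unsigned)expected);
    if (in->seq < expected)
        return -1;
    t->next_recv[in->src] = in->seq + 1;
    return (long)(in->seq - expected);
}
```

In a hang like the one reported here, the receiver never gets as far as check_incoming for the lost message, so the evidence is in the logs: the last "send seq=" line on the master has no matching "recv seq=" line on the slave.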
 
The same code works fine with 2.1.04p1.  It has been tested on test cases longer than 100 million send/recv sequences.  Any suggestions?
 
tan
 


