[MPICH] MPICH2 1.0.5 MPI_Send & MPI_Recv dropping messages randomly

William Gropp gropp at mcs.anl.gov
Wed Jan 17 20:37:19 CST 2007


Can you send us the test case?  Does it fail with the ch3:sock
device?  Are the messages short or long?

Bill

On Jan 17, 2007, at 7:06 PM, chong tan wrote:

> OS: Red Hat Enterprise 4, 2.6.9-42.ELsmp
> CPU: 4 dual-core Intel
>
> the package was built with:
> setenv CFLAGS "-m32 -O2"
> setenv CC gcc
> ./configure -prefix=/u/cgtan/my_release_dir --with-device=ch3:ssm --enable-fast |& tee configure.log
> -----
> the test programs run 5 processes, one master and 4 slaves.  The master
> always receives from the slaves and then sends to all of them.  Randomly,
> an MPI_Send performed in the master will complete, but the corresponding
> MPI_Recv in the targeted slave will not complete, and the whole thing
> hangs.  [see the pattern sketch below]
>
> I have a debugging mechanism that attaches a sequence id to all
> messages sent.  The messages are dumped before and after each send and
> recv, and a message is also dumped for the pending recv.  The sequence
> id traces correctly all the way up to the lost message.  [see the
> tracing sketch below]
>
> The same code works fine with MPICH2 1.0.4p1.  It has been tested on
> test cases with more than 100 million send/recv sequences.  Any
> suggestions?
>
> tan
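
For concreteness, here is a minimal sketch of the exchange pattern described
in the quoted message: rank 0 receives one message from each of four slaves,
then sends one back to each.  The message size, tag, and iteration count are
assumptions on my part; the actual test case was not posted.

#include <stdio.h>
#include <mpi.h>

/* Hypothetical reconstruction of the reported pattern; run with
 * 5 processes, e.g. mpiexec -n 5 ./pattern */
int main(int argc, char **argv)
{
    int rank, size, iter, src, buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (iter = 0; iter < 1000; iter++) {
        if (rank == 0) {
            /* master: recv one message from each slave ... */
            for (src = 1; src < size; src++)
                MPI_Recv(&buf, 1, MPI_INT, src, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            /* ... then send one message back to each slave */
            for (src = 1; src < size; src++)
                MPI_Send(&iter, 1, MPI_INT, src, 0, MPI_COMM_WORLD);
        } else {
            /* slave: send to the master, then wait for the reply
             * that reportedly never arrives on some iteration */
            MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
            MPI_Recv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }
    }

    MPI_Finalize();
    return 0;
}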
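
And a sketch of the sequence-id instrumentation described above, as I imagine
it; the wrapper names, message layout, and dump format are guesses, not the
actual debugging code:

#include <stdio.h>
#include <mpi.h>

/* Hypothetical tracing wrappers: every message carries a per-sender
 * sequence id, and both sides log it, so the last id logged by the
 * sender with no matching line on the receiver identifies the lost
 * message. */
typedef struct {
    int seq;        /* per-sender sequence id */
    int payload;    /* the actual data */
} traced_msg;

static int next_seq = 0;

static void traced_send(int payload, int dest, int rank)
{
    traced_msg m;
    m.seq = next_seq++;
    m.payload = payload;
    fprintf(stderr, "[%d] sending  seq=%d to %d\n", rank, m.seq, dest);
    /* two ints pack contiguously, so the struct can go out as 2 MPI_INT */
    MPI_Send(&m, 2, MPI_INT, dest, 0, MPI_COMM_WORLD);
    fprintf(stderr, "[%d] sent     seq=%d to %d\n", rank, m.seq, dest);
}

static int traced_recv(int src, int rank)
{
    traced_msg m;
    fprintf(stderr, "[%d] pending  recv from %d\n", rank, src);
    MPI_Recv(&m, 2, MPI_INT, src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    fprintf(stderr, "[%d] received seq=%d from %d\n", rank, m.seq, src);
    return m.payload;
}

Substituting these wrappers for the direct MPI_Send/MPI_Recv calls gives the
before/after dumps described above; the first seq that is sent but never
received marks the lost message.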


