[MPICH] MPICH2 1.0.5 MPI_Send & MPI_Recv dropping messages randomly
William Gropp
gropp at mcs.anl.gov
Wed Jan 17 20:37:19 CST 2007
Can you send us the test case? Does it fail with the ch3:sock
device? Are the messages short or long?
Bill
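
(For reference: trying Bill's suggestion just means reconfiguring with the
device option swapped. A hedged example, reusing the options from the build
line quoted below -- ch3:sock is the TCP-sockets-only channel, while
ch3:ssm mixes sockets and shared memory:)

    ./configure -prefix=/u/cgtan/my_release_dir --with-device=ch3:sock --enable-fast
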
On Jan 17, 2007, at 7:06 PM, chong tan wrote:
> OS: Red Hat Enterprise Linux 4, kernel 2.6.9-42.ELsmp
> CPU: 4 dual-core Intel processors
>
> the package was built with:
> setenv CFLAGS "-m32 -O2"
> setenv CC gcc
> ./configure -prefix=/u/cgtan/my_release_dir --with-device=ch3:ssm --enable-fast |& tee configure.log
> -----
> The test program runs 5 processes, one master and 4 slaves. The master
> always receives from the slaves and then sends to all of them. Randomly,
> an MPI_Send performed in the master will complete, but the corresponding
> MPI_Recv in the targeted slave never completes, and the whole thing
> hangs.
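
A minimal sketch of the exchange pattern described above, for anyone trying
to reproduce it; this is not the poster's actual code, and the payload,
tag, and iteration count are invented:

    /* Hypothetical reconstruction -- not the poster's program.  One
     * master (rank 0) and four slaves exchange a single int per
     * iteration: the slaves send first, the master answers.
     * Build: mpicc pattern.c -o pattern; run: mpiexec -n 5 ./pattern */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size, i, iter, buf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        for (iter = 0; iter < 1000000; iter++) {
            if (rank == 0) {
                /* master: receive one message from every slave... */
                for (i = 1; i < size; i++)
                    MPI_Recv(&buf, 1, MPI_INT, i, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                /* ...then send one back to each of them */
                for (i = 1; i < size; i++)
                    MPI_Send(&iter, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
            } else {
                /* slave: send to the master, then block in the
                 * MPI_Recv that reportedly never completes */
                MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
                MPI_Recv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            }
        }
        MPI_Finalize();
        return 0;
    }
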
>
> I have a debugging mechanism that attaches a sequence id to every
> message sent. The messages are dumped before and after each send and
> receive, and a message is also dumped for the pending receive. The
> sequence ids trace correctly all the way up to the lost message.
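
A hedged sketch of what such a tracing wrapper might look like; the names
traced_send/traced_recv and the stderr log format are invented, not taken
from the poster's code:

    /* Illustration only: stamp each outgoing message with a
     * per-process sequence id and log around the blocking calls.
     * A gap in the ids observed on the receiving side pinpoints
     * exactly which message went missing.  These wrappers could
     * replace MPI_Send/MPI_Recv in a program like the sketch
     * above. */
    #include <mpi.h>
    #include <stdio.h>

    static int next_seq = 0;    /* per-process sequence counter */

    static void traced_send(void *buf, int count, MPI_Datatype dt,
                            int dest, int tag, MPI_Comm comm)
    {
        int seq = next_seq++;
        fprintf(stderr, "seq %d: send to %d posted\n", seq, dest);
        /* a real version would also carry seq inside the payload */
        MPI_Send(buf, count, dt, dest, tag, comm);
        fprintf(stderr, "seq %d: send to %d completed\n", seq, dest);
    }

    static void traced_recv(void *buf, int count, MPI_Datatype dt,
                            int src, int tag, MPI_Comm comm)
    {
        fprintf(stderr, "recv from %d pending\n", src);
        MPI_Recv(buf, count, dt, src, tag, comm, MPI_STATUS_IGNORE);
        fprintf(stderr, "recv from %d completed\n", src);
    }
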
>
> The same code works fine with MPICH2 1.0.4p1; it has been tested on
> runs of more than 100 million send/recv sequences. Any suggestions?
>
> tan
>