[mpich-discuss] Assertion failed
buntinas at mcs.anl.gov
Fri Dec 5 10:50:08 CST 2008
On 12/04/2008 08:58 PM, Xavier Olive wrote:
> I have the feeling that the memory gets corrupted after a large number
> of messages have been sent/received. The main program, in Java, has
> been extensively tested and works properly with sockets. Just after I
> programmed the MPI interface (nothing magic, just a couple of
> serializations (the data are not very big) + Send/Recv), I started
> getting those problems. And the memory may still be corrupted for the
> next execution, which makes it fail at the first Send it encounters
> (and also makes the small test programs fail). As I am running my
> tests on my desktop computer right now, would it be a plausible
> explanation that I am running out of memory?
Hmm, are you saying that if you run one MPI program that does a lot of
sends and hits the assert, and then run a second, simple program that,
e.g., just does a send and receive, the second program fails too?
If the failure persists between executions of two separate MPI jobs,
that may indicate a problem with the network (card, driver, etc.).
Are you using a special network (e.g., InfiniBand, Myrinet)?