[MPICH] Re: MPICH crashes with large number of messages

Anh Vo vtqanh at gmail.com
Fri Oct 5 22:13:52 CDT 2007


Indeed. I ran top and saw that the root process was consuming a huge
amount of memory. It turned out that all the other processes were
sending to the root with MPI_Send, and their messages were queued up at
the root faster than its MPI_Recv calls could consume them, until the
backlog exhausted its memory. Switching to MPI_Ssend fixed the problem.

Thanks for all your help
--Anh

On 10/5/07, Sylvain Jeaugey <sylvain.jeaugey at bull.net> wrote:
> On Thu, 4 Oct 2007, Anh Vo wrote:
>
> > It looked like the root node was terminated by the system (killed by
> > signal 9). What could cause the system to terminate an MPI process?
> Probably the out-of-memory (OOM) killer. Check dmesg on your first node
> to see whether the kernel killed your process 0 because of excessive
> memory consumption.
>
> Sylvain
>
>




More information about the mpich-discuss mailing list