Hi,<br><br>I have a problem when running code with MPI. I compile and run the code with mpich2-1.0.5p3, and after some time it aborts with this error:<br><br>[cli_2]: aborting job:<br>Fatal error in MPI_Recv: Other MPI error, error stack:<br>
MPI_Recv(186).............................: MPI_Recv(buf=0x31b58a0, count=584, MPI_DOUBLE_PRECISION, src=3, tag=1, MPI_COMM_WORLD, status=0x1122be0) failed<br>MPIDI_CH3I_Progress(144)..................: handle_sock_op failed<br>
MPIDI_CH3I_Progress_handle_sock_event(175):<br>MPIDU_Socki_handle_read(633)..............: connection failure (set=0,sock=2,errno=104:Connection reset by peer)<br>[cli_0]: aborting job:<br>Fatal error in MPI_Recv: Other MPI error, error stack:<br>
MPI_Recv(186).............................: MPI_Recv(buf=0x31b7d70, count=584, MPI_DOUBLE_PRECISION, src=1, tag=1, MPI_COMM_WORLD, status=0x1122be0) failed<br>MPIDI_CH3I_Progress(144)..................: handle_sock_op failed<br>
MPIDI_CH3I_Progress_handle_sock_event(175):<br>MPIDU_Socki_handle_read(607)..............: connection closed by peer (set=0,sock=4)<br>[cli_1]: [cli_4]: aborting job:<br>Fatal error in MPI_Recv: Other MPI error, error stack:<br>
MPI_Recv(186).............................: MPI_Recv(buf=0x31b6b08, count=584, MPI_DOUBLE_PRECISION, src=2, tag=1, MPI_COMM_WORLD, status=0x1122be0) failed<br>MPIDI_CH3I_Progress(144)..................: handle_sock_op failed<br>
MPIDI_CH3I_Progress_handle_sock_event(175):<br>MPIDU_Socki_handle_read(607)..............: connection closed by peer (set=0,sock=2)<br>aborting job:<br>Fatal error in MPI_Recv: Other MPI error, error stack:<br>MPI_Recv(186).............................: MPI_Recv(buf=0x3178918, count=584, MPI_DOUBLE_PRECISION, src=3, tag=0, MPI_COMM_WORLD, status=0x1122be0) failed<br>
MPIDI_CH3I_Progress(144)..................: handle_sock_op failed<br>MPIDI_CH3I_Progress_handle_sock_event(175):<br>MPIDU_Socki_handle_read(607)..............: connection closed by peer (set=0,sock=4)<br>rank 4 in job 1 master_32935 caused collective abort of all ranks<br>
exit status of rank 4: killed by signal 9<br>[cli_3]: aborting job:<br>Fatal error in MPI_Recv: Other MPI error, error stack:<br>MPI_Recv(186).............................: MPI_Recv(buf=0x3178918, count=584, MPI_DOUBLE_PRECISION, src=2, tag=0, MPI_COMM_WORLD, status=0x1122be0) failed<br>
MPIDI_CH3I_Progress(144)..................: handle_sock_op failed<br>MPIDI_CH3I_Progress_handle_sock_event(175):<br>MPIDU_Socki_handle_read(633)..............: connection failure (set=0,sock=3,errno=104:Connection reset by peer)<br>
rank 3 in job 1 master_32935 caused collective abort of all ranks<br> exit status of rank 3: killed by signal 9<br><br><br>Can you please tell me the source of this error?<br><br>Regards<br>Suman Vajjala<br>