[mpich-discuss] Problem with mpich2-1.0.5.3

Suman Vajjala suman.geek at gmail.com
Tue Jul 28 00:43:41 CDT 2009


Hi,

   I have a problem running MPI codes. I compile and run the code
with MPICH2 1.0.5p3, and after some time the code gives this error:

[cli_2]: aborting job:
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(186).............................: MPI_Recv(buf=0x31b58a0,
count=584, MPI_DOUBLE_PRECISION, src=3, tag=1, MPI_COMM_WORLD,
status=0x1122be0) failed
MPIDI_CH3I_Progress(144)..................: handle_sock_op failed
MPIDI_CH3I_Progress_handle_sock_event(175):
MPIDU_Socki_handle_read(633)..............: connection failure
(set=0,sock=2,errno=104:Connection reset by peer)
[cli_0]: aborting job:
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(186).............................: MPI_Recv(buf=0x31b7d70,
count=584, MPI_DOUBLE_PRECISION, src=1, tag=1, MPI_COMM_WORLD,
status=0x1122be0) failed
MPIDI_CH3I_Progress(144)..................: handle_sock_op failed
MPIDI_CH3I_Progress_handle_sock_event(175):
MPIDU_Socki_handle_read(607)..............: connection closed by peer
(set=0,sock=4)
[cli_1]: [cli_4]: aborting job:
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(186).............................: MPI_Recv(buf=0x31b6b08,
count=584, MPI_DOUBLE_PRECISION, src=2, tag=1, MPI_COMM_WORLD,
status=0x1122be0) failed
MPIDI_CH3I_Progress(144)..................: handle_sock_op failed
MPIDI_CH3I_Progress_handle_sock_event(175):
MPIDU_Socki_handle_read(607)..............: connection closed by peer
(set=0,sock=2)
aborting job:
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(186).............................: MPI_Recv(buf=0x3178918,
count=584, MPI_DOUBLE_PRECISION, src=3, tag=0, MPI_COMM_WORLD,
status=0x1122be0) failed
MPIDI_CH3I_Progress(144)..................: handle_sock_op failed
MPIDI_CH3I_Progress_handle_sock_event(175):
MPIDU_Socki_handle_read(607)..............: connection closed by peer
(set=0,sock=4)
rank 4 in job 1  master_32935   caused collective abort of all ranks
  exit status of rank 4: killed by signal 9
[cli_3]: aborting job:
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(186).............................: MPI_Recv(buf=0x3178918,
count=584, MPI_DOUBLE_PRECISION, src=2, tag=0, MPI_COMM_WORLD,
status=0x1122be0) failed
MPIDI_CH3I_Progress(144)..................: handle_sock_op failed
MPIDI_CH3I_Progress_handle_sock_event(175):
MPIDU_Socki_handle_read(633)..............: connection failure
(set=0,sock=3,errno=104:Connection reset by peer)
rank 3 in job 1  master_32935   caused collective abort of all ranks
  exit status of rank 3: killed by signal 9


Can you please tell me the source of this error?

Regards,
Suman Vajjala