[mpich-discuss] Problem with mpich2-1.0.5.3

Jayesh Krishna jayesh at mcs.anl.gov
Tue Jul 28 09:13:58 CDT 2009


Hi,
The version of MPICH2 that you are using, 1.0.5p3, is old. Try the latest stable release of MPICH2, 1.1.1 (http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads), and let us know if you still have problems.
If the problem persists, please send us a test case that reproduces it, if possible.
 
Regards,
Jayesh

  _____  

From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Suman Vajjala
Sent: Tuesday, July 28, 2009 12:44 AM
To: mpich-discuss at mcs.anl.gov
Subject: [mpich-discuss] Problem with mpich2-1.0.5.3


Hi,

   I have a problem running my code with MPI. I compile and run it with
MPICH2 1.0.5p3, and after some time it fails with the following errors:

[cli_2]: aborting job:
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(186).............................: MPI_Recv(buf=0x31b58a0,
count=584, MPI_DOUBLE_PRECISION, src=3, tag=1, MPI_COMM_WORLD,
status=0x1122be0) failed
MPIDI_CH3I_Progress(144)..................: handle_sock_op failed
MPIDI_CH3I_Progress_handle_sock_event(175):
MPIDU_Socki_handle_read(633)..............: connection failure
(set=0,sock=2,errno=104:Connection reset by peer)
[cli_0]: aborting job:
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(186).............................: MPI_Recv(buf=0x31b7d70,
count=584, MPI_DOUBLE_PRECISION, src=1, tag=1, MPI_COMM_WORLD,
status=0x1122be0) failed
MPIDI_CH3I_Progress(144)..................: handle_sock_op failed
MPIDI_CH3I_Progress_handle_sock_event(175):
MPIDU_Socki_handle_read(607)..............: connection closed by peer
(set=0,sock=4)
[cli_1]: [cli_4]: aborting job:
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(186).............................: MPI_Recv(buf=0x31b6b08,
count=584, MPI_DOUBLE_PRECISION, src=2, tag=1, MPI_COMM_WORLD,
status=0x1122be0) failed
MPIDI_CH3I_Progress(144)..................: handle_sock_op failed
MPIDI_CH3I_Progress_handle_sock_event(175):
MPIDU_Socki_handle_read(607)..............: connection closed by peer
(set=0,sock=2)
aborting job:
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(186).............................: MPI_Recv(buf=0x3178918,
count=584, MPI_DOUBLE_PRECISION, src=3, tag=0, MPI_COMM_WORLD,
status=0x1122be0) failed
MPIDI_CH3I_Progress(144)..................: handle_sock_op failed
MPIDI_CH3I_Progress_handle_sock_event(175):
MPIDU_Socki_handle_read(607)..............: connection closed by peer
(set=0,sock=4)
rank 4 in job 1  master_32935   caused collective abort of all ranks
  exit status of rank 4: killed by signal 9
[cli_3]: aborting job:
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(186).............................: MPI_Recv(buf=0x3178918,
count=584, MPI_DOUBLE_PRECISION, src=2, tag=0, MPI_COMM_WORLD,
status=0x1122be0) failed
MPIDI_CH3I_Progress(144)..................: handle_sock_op failed
MPIDI_CH3I_Progress_handle_sock_event(175):
MPIDU_Socki_handle_read(633)..............: connection failure
(set=0,sock=3,errno=104:Connection reset by peer)
rank 3 in job 1  master_32935   caused collective abort of all ranks
  exit status of rank 3: killed by signal 9


Can you please tell me what is causing this error?

Regards
Suman Vajjala
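The important lines in the log above are easy to lose in the interleaved output. As a rough aid (not part of MPICH2 and not something suggested in this thread), the hypothetical Python script below pulls out which ranks the process manager reports as killed and by which signal (signal 9 is SIGKILL), the socket errors, and which source ranks the failing MPI_Recv calls were waiting on. The regular expressions are assumptions based on this one log's format only. The log alone does not show whether the SIGKILL was the original cause or part of the cleanup after the job aborted, so the summary only narrows down where to look.

```python
import re
import signal

# Assumption: these patterns are derived from this one MPICH2 1.0.5p3 log;
# they are not a documented or stable output format.
KILLED_RE = re.compile(r"exit status of rank (\d+): killed by signal (\d+)")
SOCK_ERR_RE = re.compile(r"errno=(\d+):([^)]+)\)")
CLOSED_RE = re.compile(r"connection closed by peer")
RECV_RE = re.compile(r"src=(\d+), tag=(\d+)")

# Excerpt copied from the error log in the message above.
SAMPLE = """\
MPI_Recv(186).............................: MPI_Recv(buf=0x31b58a0,
count=584, MPI_DOUBLE_PRECISION, src=3, tag=1, MPI_COMM_WORLD,
status=0x1122be0) failed
MPIDU_Socki_handle_read(633)..............: connection failure
(set=0,sock=2,errno=104:Connection reset by peer)
MPIDU_Socki_handle_read(607)..............: connection closed by peer
(set=0,sock=4)
rank 4 in job 1  master_32935   caused collective abort of all ranks
  exit status of rank 4: killed by signal 9
rank 3 in job 1  master_32935   caused collective abort of all ranks
  exit status of rank 3: killed by signal 9
"""


def signal_name(num: int) -> str:
    """Map a signal number to its name, falling back to the raw number."""
    try:
        return signal.Signals(num).name
    except ValueError:
        return f"signal {num}"


def summarize(log: str) -> dict:
    """Tabulate the failure indicators found in the error log text."""
    # Rank -> name of the signal that rank was reported killed by.
    killed = {int(r): signal_name(int(s)) for r, s in KILLED_RE.findall(log)}
    # Distinct socket errors, using the message text printed in the log itself.
    sock_errors = sorted({f"errno {n}: {msg}" for n, msg in SOCK_ERR_RE.findall(log)})
    # Source ranks that the failing MPI_Recv calls were waiting on.
    recv_sources = sorted({int(src) for src, _tag in RECV_RE.findall(log)})
    return {
        "killed": killed,
        "socket_errors": sock_errors,
        "closed_by_peer": len(CLOSED_RE.findall(log)),
        "failed_recv_sources": recv_sources,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Feeding it the full log above (rather than the excerpt) gives the same kind of summary over every rank.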


