[mpich-discuss] Error with more processors
Anthony Chan
chan at mcs.anl.gov
Fri Dec 12 12:24:54 CST 2008
We need more information about how the job is launched.
The important line in the error message is
connection failure (set=0,sock=2,errno=104:Connection reset by peer)
Did you use mpd, and did you start it with mpdboot? Have you checked
the cluster with mpdcheck?
A.Chan
----- yeliu at abo.fi wrote:
> Hi,
> I'm writing a parallel program using MPI. This program (a
> master-slave computation) works well with 3 processors, but with
> 4 processors it fails with the following error. Does anyone know how
> to solve it?
> By the way, I tried to debug it and found that only 2 slaves receive
> the 'no work' message when I run it with <= 4 processors, whereas all
> the slaves should receive it.
>
> -----------------------------------
> Fatal error in MPI_Recv: Other MPI error, error stack:
> MPI_Recv(186).............................:
> MPI_Recv(buf=0x7fff82e714ec, count=1, MPI_INT, src=0, tag=1,
> MPI_COMM_WORLD, status=0x695ce0) failed
> MPIDI_CH3i_Progress_wait(215).............: an error occurred while
> handling an event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(420):
> MPIDU_Socki_handle_read(633)..............: connection failure
> (set=0,sock=2,errno=104:Connection reset by peer)
> [cli_3]: aborting job:
> Fatal error in MPI_Recv: Other MPI error, error stack:
> MPI_Recv(186).............................:
> MPI_Recv(buf=0x7fff82e714ec, count=1, MPI_INT, src=0, tag=1,
> MPI_COMM_WORLD, status=0x695ce0) failed
> MPIDI_CH3i_Progress_wait(215).............: an error occurred while
> handling an event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(420):
> MPIDU_Socki_handle_read(633)..............: connection failure
> (set=0,sock=2,errno=104:Connection reset by peer)
> rank 0 in job 118 maximum.cs.abo.fi_37597 caused collective abort
> of all ranks
> exit status of rank 0: killed by signal 9
> --------------------------
>
>
> Thank you very much!
>
>
> Ye Liu