[mpich-discuss] Error with more processors

yeliu at abo.fi yeliu at abo.fi
Fri Dec 12 08:00:21 CST 2008


Hi,
   I'm writing a parallel program using MPI. The program (a
master-slave computation) works well with 3 processors, but with
4 processors it fails with the error below. Does anyone know how to
solve it?
   By the way, I tried to debug it and found that only 2 slaves receive
the 'no work' message when I run it with processors <=4, whereas all
the slaves should receive it.
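
   A minimal sketch of the termination step described above, not the
actual program: the master sends one 'no work' message to every slave
rank from 1 to size-1. The names notify_no_work, send_int_fn, send_log
and log_send, and the NO_WORK value, are hypothetical; only the tag (1)
and the master rank (0) are taken from the error stack. The send is
passed in as a callback so the loop can be checked without an MPI
runtime.

```c
#include <assert.h>

/* Hypothetical callback: delivers one int to rank `dest` with `tag`.
   Returns 0 on success, non-zero on failure. */
typedef int (*send_int_fn)(int value, int dest, int tag, void *ctx);

enum {
    MASTER_RANK = 0,  /* src=0 in the failing MPI_Recv */
    WORK_TAG    = 1,  /* tag=1 in the failing MPI_Recv */
    NO_WORK     = -1  /* assumed sentinel value, not from the original post */
};

/* Send the 'no work' message to every slave rank 1..size-1.
   Returns the number of slaves notified, or -1 on the first failed send. */
int notify_no_work(int size, send_int_fn send, void *ctx)
{
    int notified = 0;
    for (int dest = MASTER_RANK + 1; dest < size; ++dest) {
        if (send(NO_WORK, dest, WORK_TAG, ctx) != 0)
            return -1;
        ++notified;
    }
    return notified;
}

/* Dry-run stand-in for the real send: records each message instead of
   transmitting it, so the loop above can be inspected. */
#define MAX_RANKS 64

struct send_log {
    int count;
    int dest[MAX_RANKS];
    int tag[MAX_RANKS];
    int value[MAX_RANKS];
};

int log_send(int value, int dest, int tag, void *ctx)
{
    struct send_log *log = ctx;
    assert(log->count < MAX_RANKS);
    log->dest[log->count]  = dest;
    log->tag[log->count]   = tag;
    log->value[log->count] = value;
    log->count++;
    return 0;
}
```

   With 4 processors this loop should produce exactly 3 sends (to ranks
1, 2 and 3). In the real program the callback would wrap
MPI_Send(&value, 1, MPI_INT, dest, tag, MPI_COMM_WORLD), matching the
MPI_Recv(count=1, MPI_INT, src=0, tag=1) shown in the error stack below.
If the master exits or is killed before the loop finishes, the remaining
slaves stay blocked in MPI_Recv and see the connection reset.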

-----------------------------------
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(186).............................:  
MPI_Recv(buf=0x7fff82e714ec, count=1, MPI_INT, src=0, tag=1,  
MPI_COMM_WORLD, status=0x695ce0) failed
MPIDI_CH3i_Progress_wait(215).............: an error occurred while  
handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Socki_handle_read(633)..............: connection failure  
(set=0,sock=2,errno=104:Connection reset by peer)[cli_3]: aborting job:
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(186).............................:  
MPI_Recv(buf=0x7fff82e714ec, count=1, MPI_INT, src=0, tag=1,  
MPI_COMM_WORLD, status=0x695ce0) failed
MPIDI_CH3i_Progress_wait(215).............: an error occurred while  
handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Socki_handle_read(633)..............: connection failure  
(set=0,sock=2,errno=104:Connection reset by peer)
rank 0 in job 118  maximum.cs.abo.fi_37597   caused collective abort  
of all ranks
   exit status of rank 0: killed by signal 9
--------------------------


Thank you very much!


Ye Liu





