[mpich-discuss] Error with more processors

Rajeev Thakur thakur at mcs.anl.gov
Fri Dec 12 22:19:41 CST 2008


Can you send us a small test program that demonstrates the error?

Rajeev 

> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov 
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of yeliu at abo.fi
> Sent: Friday, December 12, 2008 8:00 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: [mpich-discuss] Error with more processors
> 
> Hi,
>    I'm writing an parallel program using MPI.This 
> program(with master-slave computation) works well with 3 
> processors, but when with
> 4 processors,it has the following error.Does it anyone know 
> how to solve it?
>    By the way,I tried to debug it and found only 2 slaves can 
> receive the msg of 'no work' when I run it with processors 
> <=4 where all the slaves should receive the msg.
> 
> -----------------------------------
> Fatal error in MPI_Recv: Other MPI error, error stack:
> MPI_Recv(186).............................:  
> MPI_Recv(buf=0x7fff82e714ec, count=1, MPI_INT, src=0, tag=1, 
> MPI_COMM_WORLD, status=0x695ce0) failed
> MPIDI_CH3i_Progress_wait(215).............: an error occurred 
> while handling an event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(420):
> MPIDU_Socki_handle_read(633)..............: connection 
> failure (set=0,sock=2,errno=104:Connection reset by 
> peer)[cli_3]: aborting job:
> Fatal error in MPI_Recv: Other MPI error, error stack:
> MPI_Recv(186).............................:  
> MPI_Recv(buf=0x7fff82e714ec, count=1, MPI_INT, src=0, tag=1, 
> MPI_COMM_WORLD, status=0x695ce0) failed
> MPIDI_CH3i_Progress_wait(215).............: an error occurred 
> while handling an event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(420):
> MPIDU_Socki_handle_read(633)..............: connection 
> failure (set=0,sock=2,errno=104:Connection reset by peer)
> rank 0 in job 118  maximum.cs.abo.fi_37597   caused collective abort  
> of all ranks
>    exit status of rank 0: killed by signal 9
> --------------------------
> 
> 
> Thank you very much!
> 
> 
> Ye Liu
> 