[MPICH] max number of isend/irecv allowed?

Rajeev Thakur thakur at mcs.anl.gov
Fri Feb 15 21:53:07 CST 2008


The error message indicates a connection failure. Can you try again with the
new 1.0.7rc1 release?

Rajeev

> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov 
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Wei-keng Liao
> Sent: Friday, February 15, 2008 3:41 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: [MPICH] max number of isend/irecv allowed?
> 
> 
> Is there a maximum number of MPI isend/irecv calls allowed per 
> process before MPI_Waitall is called?
> 
> I am seeing the error message below when a large number of 
> isend/irecv calls are used (e.g., with 512 processes):
> 
>   [cli_53]: aborting job:
>   Fatal error in MPI_Waitall: Other MPI error, error stack:
>   MPI_Waitall(258)............................: MPI_Waitall(count=1024, req_array=0x5f7730, status_array=0x8176c0) failed
>   MPIDI_CH3i_Progress_wait(215)...............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
>   MPIDI_CH3I_Progress_handle_sock_event(779)..:
>   MPIDI_CH3_Sockconn_handle_connect_event(608): [ch3:sock] failed to connnect to remote process
>   MPIDU_Socki_handle_connect(791).............: connection failure (set=0,sock=18,errno=110:(strerror() not found))
> 
>   INTERNAL ERROR: Invalid error class (66) encountered while returning from MPI_Waitall.  Please file a bug report.  No error stack is available.
>   [cli_29]: aborting job:
> 
> The attached program reproduces the error. The error occurs 
> only when running more than 512 processes. (I tested with 8 
> processes per node; each node has 2 CPUs.) This program is 
> extracted from ADIOI_Calc_others_req(). I found that the 
> collective I/O crash is due to this error. I think this may 
> also be related to the hanging problem I posted about earlier, 
> which has not yet been solved.
> 
> I am using mpich2-1.0.6p1. 
> 
> Wei-keng
> 
> 



