[MPICH] max number of isend/irecv allowed?
Wei-keng Liao
wkliao at ece.northwestern.edu
Fri Feb 15 15:40:51 CST 2008
Is there a maximum number of MPI_Isend/MPI_Irecv calls allowed per process
before MPI_Waitall is called?
I am seeing the error message below when a large number of isend/irecv calls
are posted (e.g., with 512 processes):
[cli_53]: aborting job:
Fatal error in MPI_Waitall: Other MPI error, error stack:
MPI_Waitall(258)............................: MPI_Waitall(count=1024,
req_array=0x5f7730, status_array=0x8176c0) failed
MPIDI_CH3i_Progress_wait(215)...............: an error occurred while
handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(779)..:
MPIDI_CH3_Sockconn_handle_connect_event(608): [ch3:sock] failed to
connnect to remote process
MPIDU_Socki_handle_connect(791).............: connection failure
(set=0,sock=18,errno=110:(strerror() not found))
INTERNAL ERROR: Invalid error class (66) encountered while returning from
MPI_Waitall. Please file a bug report. No error stack is available.
[cli_29]: aborting job:
The attached program reproduces the error. The error occurs only when
running more than 512 processes. (I tested 8 processes per node; each node
has 2 CPUs.) The program is extracted from ADIOI_Calc_others_req(). I
found that the collective I/O crash is caused by this error. I think it may
also be related to the hanging problem I posted earlier, which is not yet
solved.
I am using mpich2-1.0.6p1.
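For reference, here is a minimal sketch of one possible workaround, assuming
the failure is related to how many requests and socket connections are active
at the same time: post the isend/irecv calls in batches and complete each
batch with MPI_Waitall before posting the next. The batch size of 64 and the
helper name batched_exchange are arbitrary choices for illustration, not an
MPICH limit.

#include <mpi.h>

/* Hypothetical helper (for illustration only): exchange variable-sized
 * messages with every peer, keeping at most 2*BATCH requests in flight.
 * BATCH = 64 is an arbitrary assumption, not an MPICH requirement. */
#define BATCH 64

static void batched_exchange(int np, int rank,
                             int *send_count, char **s_buf,
                             int *recv_count, char **r_buf)
{
    MPI_Request requests[2*BATCH];
    int i, j = 0;

    for (i = 0; i < np; i++) {
        /* post the receive and send for this peer together */
        if (recv_count[i] > 0)
            MPI_Irecv(r_buf[i], recv_count[i], MPI_CHAR, i, i+rank,
                      MPI_COMM_WORLD, &requests[j++]);
        if (send_count[i] > 0)
            MPI_Isend(s_buf[i], send_count[i], MPI_CHAR, i, i+rank,
                      MPI_COMM_WORLD, &requests[j++]);

        /* complete the current batch before posting more requests */
        if ((i+1) % BATCH == 0 || i == np-1) {
            if (j > 0)
                MPI_Waitall(j, requests, MPI_STATUSES_IGNORE);
            j = 0;
        }
    }
}

Because each batch pairs the receive and send for the same peers and every
rank walks the peers in the same order, the batched waits should not deadlock.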
Wei-keng
-------------- next part --------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

/*----< main() >------------------------------------------------------------*/
int main(int argc, char **argv) {
    int           i, j, rank, np;
    int          *send_count, *recv_count;
    char        **s_buf, **r_buf;
    MPI_Request  *requests;
    MPI_Status   *statuses;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    if (rank == 0) printf("Testing isend-irecv np=%d\n", np);

    /* pick a random message size for each peer and let every process
       know how much it will receive from everyone else */
    send_count = (int*) malloc(np * sizeof(int));
    recv_count = (int*) malloc(np * sizeof(int));
    for (i=0; i<np; i++)
        send_count[i] = rand() % np;

    MPI_Alltoall(send_count, 1, MPI_INT,
                 recv_count, 1, MPI_INT, MPI_COMM_WORLD);

    /* allocate one send and one receive buffer per peer */
    s_buf = (char**) malloc(np * sizeof(char*));
    r_buf = (char**) malloc(np * sizeof(char*));
    for (i=0; i<np; i++) {
        if (send_count[i] > 0) s_buf[i] = (char*) malloc(send_count[i]);
        if (recv_count[i] > 0) r_buf[i] = (char*) malloc(recv_count[i]);
    }

    /* post all nonblocking receives and sends (up to 2*np requests),
       then wait for all of them at once */
    requests = (MPI_Request*) malloc(2 * np * sizeof(MPI_Request));
    j = 0;
    for (i=0; i<np; i++)
        if (recv_count[i] > 0)
            MPI_Irecv(r_buf[i], recv_count[i], MPI_CHAR, i, i+rank,
                      MPI_COMM_WORLD, &requests[j++]);
    for (i=0; i<np; i++)
        if (send_count[i] > 0)
            MPI_Isend(s_buf[i], send_count[i], MPI_CHAR, i, i+rank,
                      MPI_COMM_WORLD, &requests[j++]);

    if (j > 0) {
        statuses = (MPI_Status*) malloc(j * sizeof(MPI_Status));
        MPI_Waitall(j, requests, statuses);
        free(statuses);
    }

    /* clean up */
    for (i=0; i<np; i++) {
        if (send_count[i] > 0) free(s_buf[i]);
        if (recv_count[i] > 0) free(r_buf[i]);
    }
    free(requests);
    free(s_buf);
    free(r_buf);
    free(send_count);
    free(recv_count);

    MPI_Finalize();
    return 0;
}
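For what it's worth: since rand() % np is rarely zero, nearly every
send_count[i] and recv_count[i] is nonzero, so at np = 512 each process posts
close to 2*np = 1024 requests before the single MPI_Waitall, which is
consistent with the count=1024 shown in the error stack above.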