[MPICH] max number of isend/irecv allowed?

Wei-keng Liao wkliao at ece.northwestern.edu
Fri Feb 15 15:40:51 CST 2008


Is there a maximum number of MPI_Isend/MPI_Irecv calls allowed per process
before MPI_Waitall is called?

I am seeing the error message below when a large number of isend/irecv
calls are outstanding (e.g., with 512 processes):

  [cli_53]: aborting job:
  Fatal error in MPI_Waitall: Other MPI error, error stack:
  MPI_Waitall(258)............................: MPI_Waitall(count=1024, 
  req_array=0x5f7730, status_array=0x8176c0) failed
  MPIDI_CH3i_Progress_wait(215)...............: an error occurred while 
  handling an event returned by MPIDU_Sock_Wait()
  MPIDI_CH3I_Progress_handle_sock_event(779)..:
  MPIDI_CH3_Sockconn_handle_connect_event(608): [ch3:sock] failed to 
  connnect to remote process
  MPIDU_Socki_handle_connect(791).............: connection failure 
  (set=0,sock=18,errno=110:(strerror() not found))

  INTERNAL ERROR: Invalid error class (66) encountered while returning from
  MPI_Waitall.  Please file a bug report.  No error stack is available.
  [cli_29]: aborting job:

The attached program reproduces the error. The error occurs only when 
running more than 512 processes. (I tested with 8 processes per node; 
each node has 2 CPUs.) The program is extracted from 
ADIOI_Calc_others_req(). I found that the collective I/O crash is caused 
by this error. I think it may also be related to the hanging problem I 
posted earlier, which is still unsolved.

I am using mpich2-1.0.6p1. 
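
In case the problem is simply too many requests (or socket connections)
in flight at once, one workaround I am considering is to throttle the
sends. The sketch below is untested; the function name and the
SEND_BATCH value of 64 are placeholders of mine, not a documented MPICH
limit. It pre-posts every receive and then issues the sends in
fixed-size batches. Since all receives are posted before any rank
blocks, waiting on a batch of sends cannot deadlock: every send has a
matching receive already posted on the peer.

#include <stdlib.h>
#include <mpi.h>

#define SEND_BATCH 64   /* arbitrary throttle, not a documented limit */

/* Same exchange as in the attached program, but with at most
 * SEND_BATCH sends outstanding at any time. */
static void exchange_throttled(int np, int rank,
                               const int *send_count,
                               const int *recv_count,
                               char **s_buf, char **r_buf)
{
    int i, nrecv = 0, nsend = 0;
    MPI_Request *recv_req = (MPI_Request*) malloc(np * sizeof(MPI_Request));
    MPI_Request  send_req[SEND_BATCH];

    for (i=0; i<np; i++)             /* pre-post every receive */
        if (recv_count[i] > 0)
            MPI_Irecv(r_buf[i], recv_count[i], MPI_CHAR, i, i+rank,
                      MPI_COMM_WORLD, &recv_req[nrecv++]);

    for (i=0; i<np; i++) {           /* sends, flushed batch by batch */
        if (send_count[i] > 0) {
            MPI_Isend(s_buf[i], send_count[i], MPI_CHAR, i, i+rank,
                      MPI_COMM_WORLD, &send_req[nsend++]);
            if (nsend == SEND_BATCH) {
                MPI_Waitall(nsend, send_req, MPI_STATUSES_IGNORE);
                nsend = 0;
            }
        }
    }
    if (nsend > 0)                   /* flush the partial last batch */
        MPI_Waitall(nsend, send_req, MPI_STATUSES_IGNORE);
    if (nrecv > 0)                   /* finally drain all receives */
        MPI_Waitall(nrecv, recv_req, MPI_STATUSES_IGNORE);
    free(recv_req);
}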

Wei-keng

-------------- next part --------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

/*----< main() >------------------------------------------------------------*/
int main(int argc, char **argv) {
    int           i, j, rank, np;
    int          *send_count, *recv_count;
    char        **s_buf, **r_buf;
    MPI_Request  *requests;
    MPI_Status   *statuses;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    if (rank == 0) printf("Testing isend-irecv np=%d\n",np);

    send_count = (int*) malloc(np * sizeof(int));
    recv_count = (int*) malloc(np * sizeof(int));
    for (i=0; i<np; i++)
        send_count[i] = rand() % np;

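    /* exchange the counts, so recv_count[i] is the number of bytes
       this rank will receive from rank i */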
    MPI_Alltoall(send_count, 1, MPI_INT,
                 recv_count, 1, MPI_INT, MPI_COMM_WORLD);

    s_buf = (char**) malloc(np * sizeof(char*));
    r_buf = (char**) malloc(np * sizeof(char*));
    for (i=0; i<np; i++) {
        if (send_count[i] > 0) s_buf[i] = (char*)malloc(send_count[i]);
        if (recv_count[i] > 0) r_buf[i] = (char*)malloc(recv_count[i]);
    }

    requests = (MPI_Request*) malloc(2*np*sizeof(MPI_Request));

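    /* post all nonblocking receives, then all nonblocking sends;
       j counts the total number of outstanding requests */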
    j = 0;
    for (i=0; i<np; i++)
        if (recv_count[i] > 0)
            MPI_Irecv(r_buf[i], recv_count[i], MPI_CHAR, i, i+rank,
                      MPI_COMM_WORLD, &requests[j++]);
    for (i=0; i<np; i++)
        if (send_count[i] > 0)
            MPI_Isend(s_buf[i], send_count[i], MPI_CHAR, i, i+rank,
                      MPI_COMM_WORLD, &requests[j++]);

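    /* wait on all outstanding sends and receives at once */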
    if (j) {
        statuses = (MPI_Status *) malloc(j * sizeof(MPI_Status));
        MPI_Waitall(j, requests, statuses);
        free(statuses);
    }

    MPI_Finalize();
    return 0;
}
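
For completeness, I compile and run the test like this (the file name
and the process count are just examples):

  mpicc isend_irecv.c -o isend_irecv
  mpiexec -n 512 ./isend_irecv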


