[MPICH] MPI_File_read_all hanging

Wei-keng Liao wkliao at ece.northwestern.edu
Fri Feb 1 23:57:09 CST 2008

I have an I/O program hanging on MPI_File_read_all. The code is the 
attached C file. It writes 20 3D block-block-block partitioned arrays, 
closes the file, re-opens it, and reads the 20 arrays back, also in the 
same 3D block pattern. It is similar to the ROMIO 3D test code.

The error occurred when I ran on 64 processes, but not with fewer (the machine
I ran on has 2 processors per node). The first 20 writes are OK, but the program
hangs at around the 10th read. After tracing it down to the source, it hangs on
           MPI_Waitall(nprocs_recv, requests, statuses);
in function ADIOI_R_Exchange_data(), file ad_read_coll.c.

I am using mpich2-1.0.6p1 on a Linux cluster 
2.6.9-42.0.10.EL_lustre- #1 SMP x86_64 x86_64 x86_64 GNU/Linux

gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)

-------------- next part --------------
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

/*----< main() >------------------------------------------------------------*/
int main(int argc, char **argv) {
    int          i, debug, rank, np, buf_size, ghost_cells, len, ntimes;
    int          np_dim[3], rank_dim[3], array_of_sizes[3];
    int          array_of_subsizes[3], array_of_starts[3];
    double      *buf;
    MPI_File     fh;
    MPI_Info     info = MPI_INFO_NULL;
    MPI_Datatype file_type, buf_type;
    MPI_Status   status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    if (argc != 2) {
        fprintf(stderr,"Usage: %s filename\n",argv[0]);
        MPI_Finalize(); return 1;
    }
    debug = 1;

    ntimes      = 20; /* number of write/read iterations */
    ghost_cells = 2;  /* on both sides along each dim for local array */
    len         = 25; /* local array size in each dim */
    /* no. processes in each dim */
    np_dim[0] = 4;    /* Z dim */
    np_dim[1] = 4;    /* Y dim */
    np_dim[2] = 4;    /* X dim */

    /* check if number of processes is matched */
    if (np != np_dim[0]*np_dim[1]*np_dim[2]) {
        printf("Error: process number mismatch npz(%d)*npy(%d)*npx(%d)!=total(%d)\n",
               np_dim[0], np_dim[1], np_dim[2], np);
        MPI_Finalize(); return 1;
    }

    /* process rank in each dimension */
    rank_dim[2] =  rank %  np_dim[2];                 /* X dim */
    rank_dim[1] = (rank /  np_dim[2]) % np_dim[1];    /* Y dim */
    rank_dim[0] =  rank / (np_dim[2]  * np_dim[1]);   /* Z dim */

    /* starting coordinates of the subarray in each dimension */
    for (i=0; i<3; i++) {
        array_of_sizes[i]    = len * np_dim[i]; /* global 3D array size */
        array_of_subsizes[i] = len;             /* sub array size */
        array_of_starts[i]   = len * rank_dim[i];
    }

    /* create file type: a 3D block-block-block partitioning pattern */
    MPI_Type_create_subarray(3, array_of_sizes, array_of_subsizes,
                             array_of_starts, MPI_ORDER_FORTRAN,
                             MPI_DOUBLE, &file_type);
    MPI_Type_commit(&file_type);  /* a datatype must be committed before use in I/O */

    /* prepare write buffer ------------------------------------------------*/
    buf_size = 1;
    for (i=0; i<3; i++) {
        array_of_sizes[i]  = ghost_cells + array_of_subsizes[i];
        array_of_starts[i] = ghost_cells;
        buf_size *= array_of_sizes[i];
    }
    buf = (double*) malloc(buf_size*sizeof(double));

    /* create buffer type: subarray is a 3D array with ghost cells */
    MPI_Type_create_subarray(3, array_of_sizes, array_of_subsizes,
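                             array_of_starts, MPI_ORDER_C, MPI_DOUBLE,
                             &buf_type);
    MPI_Type_commit(&buf_type);  /* a datatype must be committed before use in I/O */

    /* open the file for WRITE ---------------------------------------------*/
    /* the first line of this call is missing from the archived attachment;
       the access mode below is an assumption */
    MPI_File_open(MPI_COMM_WORLD, argv[1], MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  info, &fh);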

    /* set the file view */
    MPI_File_set_view(fh, 0, MPI_DOUBLE, file_type, "native", info);

    /* MPI collective write */
    for (i=0; i<ntimes; i++)
        MPI_File_write_all(fh, buf, 1, buf_type, &status);

    /* close the file before re-opening it for reading */
    MPI_File_close(&fh);

    /* open the file for READ ---------------------------------------------*/
    MPI_File_open(MPI_COMM_WORLD, argv[1], MPI_MODE_RDONLY, info, &fh);

    /* set the file view */
    MPI_File_set_view(fh, 0, MPI_DOUBLE, file_type, "native", info);

    /* MPI collective read */
    for (i=0; i<ntimes; i++) {
        MPI_File_read_all(fh, buf, 1, buf_type, &status);
        if (debug) printf("P%-2d: pass READ  iteration %2d\n",rank,i);
    }

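    /* release resources and shut down MPI */
    MPI_File_close(&fh);
    MPI_Type_free(&file_type);
    MPI_Type_free(&buf_type);
    free(buf);
    MPI_Finalize();
    return 0;
}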
