[MPICH] MPI_File_read_all hanging
Wei-keng Liao
wkliao at ece.northwestern.edu
Fri Feb 1 23:57:09 CST 2008
I have an I/O program hanging in MPI_File_read_all. The code is in the
attached C file. It writes 20 3D block-block-block partitioned arrays,
closes the file, re-opens it, and reads the 20 arrays back in the same
3D block pattern. It is similar to the ROMIO 3D test code, coll_test.c.
The error occurred when I ran on 64 processes, but not on fewer (the machine
I ran on has 2 processors per node). The first 20 writes are OK, but the
program hangs at around the 10th read. Tracing down into the source shows it
hangs on
MPI_Waitall(nprocs_recv, requests, statuses);
in function ADIOI_R_Exchange_data(), file ad_read_coll.c.
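One way to confirm where each rank is stuck is to attach a debugger to one of
the hung MPI processes on a compute node (the <pid> below is a placeholder for
the process id of a hung rank):

    gdb -p <pid>
    (gdb) bt
    (gdb) detach
    (gdb) quit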
I am using mpich2-1.0.6p1 on a Linux cluster:
2.6.9-42.0.10.EL_lustre-1.4.10.1smp #1 SMP x86_64 x86_64 x86_64 GNU/Linux
gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
Wei-keng
-------------- next part --------------
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
/*----< main() >------------------------------------------------------------*/
int main(int argc, char **argv) {
    int i, debug, rank, np, buf_size, ghost_cells, len, ntimes;
    int np_dim[3], rank_dim[3], array_of_sizes[3];
    int array_of_subsizes[3], array_of_starts[3];
    double *buf;
    MPI_File fh;
    MPI_Info info = MPI_INFO_NULL;
    MPI_Datatype file_type, buf_type;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    if (argc != 2) {
        fprintf(stderr, "Usage: %s filename\n", argv[0]);
        MPI_Finalize(); return 1;
    }

    debug = 1;
    ntimes = 20;     /* number of write/read iterations */
    ghost_cells = 2; /* on both sides along each dim for local array */
    len = 25;        /* local array size in each dim */

    /* no. processes in each dim */
    np_dim[0] = 4; /* Z dim */
    np_dim[1] = 4; /* Y dim */
    np_dim[2] = 4; /* X dim */

    /* check if number of processes is matched */
    if (np != np_dim[0]*np_dim[1]*np_dim[2]) {
        printf("Error: process number mismatch npz(%d)*npy(%d)*npx(%d)!=total(%d)\n",
               np_dim[0], np_dim[1], np_dim[2], np);
        MPI_Finalize(); return 1;
    }
    /* process rank in each dimension */
    rank_dim[2] = rank % np_dim[2];               /* X dim */
    rank_dim[1] = (rank / np_dim[2]) % np_dim[1]; /* Y dim */
    rank_dim[0] = rank / (np_dim[2] * np_dim[1]); /* Z dim */

    /* starting coordinates of the subarray in each dimension */
    for (i=0; i<3; i++) {
        array_of_sizes[i]    = len * np_dim[i]; /* global 3D array size */
        array_of_subsizes[i] = len;             /* sub array size */
        array_of_starts[i]   = len * rank_dim[i];
    }

    /* create file type: a 3D block-block-block partitioning pattern */
    MPI_Type_create_subarray(3, array_of_sizes, array_of_subsizes,
                             array_of_starts, MPI_ORDER_FORTRAN,
                             MPI_DOUBLE, &file_type);
    MPI_Type_commit(&file_type);
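    /* with len=25 and a 4x4x4 process grid, file_type describes this rank's
     * 25x25x25 block within the 100x100x100 global array in the file */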
    /* prepare write buffer ------------------------------------------------*/
    buf_size = 1;
    for (i=0; i<3; i++) {
        array_of_sizes[i]  = ghost_cells + array_of_subsizes[i];
        array_of_starts[i] = ghost_cells;
        buf_size *= array_of_sizes[i];
    }
    buf = (double*) malloc(buf_size * sizeof(double));

    /* create buffer type: subarray is a 3D array with ghost cells */
    MPI_Type_create_subarray(3, array_of_sizes, array_of_subsizes,
                             array_of_starts, MPI_ORDER_C, MPI_DOUBLE,
                             &buf_type);
    MPI_Type_commit(&buf_type);
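    /* buf_type selects the 25x25x25 interior of the 27x27x27 local buffer,
     * skipping the ghost_cells offset at the start of each dimension */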
    /* open the file for WRITE ---------------------------------------------*/
    MPI_File_open(MPI_COMM_WORLD, argv[1], MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  info, &fh);

    /* set the file view */
    MPI_File_set_view(fh, 0, MPI_DOUBLE, file_type, "native", info);

    /* MPI collective write */
    for (i=0; i<ntimes; i++)
        MPI_File_write_all(fh, buf, 1, buf_type, &status);

    MPI_File_close(&fh);

    /* open the file for READ ---------------------------------------------*/
    MPI_File_open(MPI_COMM_WORLD, argv[1], MPI_MODE_RDONLY, info, &fh);

    /* set the file view */
    MPI_File_set_view(fh, 0, MPI_DOUBLE, file_type, "native", info);

    /* MPI collective read */
    for (i=0; i<ntimes; i++) {
        MPI_File_read_all(fh, buf, 1, buf_type, &status);
        if (debug) printf("P%-2d: pass READ iteration %2d\n", rank, i);
    }
    MPI_File_close(&fh);

    free(buf);
    MPI_Type_free(&file_type);
    MPI_Type_free(&buf_type);

    MPI_Finalize();
    return 0;
}
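To build and run the test (the source file name below is just a placeholder;
the hard-coded 4x4x4 process grid requires exactly 64 ranks, and the output
path is wherever you want the test file written):

    mpicc -o coll_read_test coll_read_test.c
    mpiexec -n 64 ./coll_read_test /path/to/testfile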