[MPICH] a potential bug
Wei-keng Liao
wkliao at ece.northwestern.edu
Fri Oct 26 22:50:41 CDT 2007
In file mpich2-1.0.6/src/mpi/romio/adio/common/ad_aggregate.c, function
ADIOI_Calc_others_req(), lines 403 to 439, two arrays of MPI_Request are
first allocated with ADIOI_Malloc(), used in MPI_Isend()/MPI_Irecv(), and
then in the two following MPI_Waitall()s.
However, the two arrays are not initialized to MPI_REQUEST_NULL, yet all
elements of both arrays are passed to the MPI_Waitall() calls on lines 438
and 439. This is dangerous because ADIOI_Malloc() does not initialize the
buffer it returns, so elements that are never assigned need not equal
MPI_REQUEST_NULL, which is defined as
#define MPI_REQUEST_NULL ((MPI_Request)0x2c000000)
in mpich2-1.0.6/src/include/mpi.h.in.
I am running MPICH2-1.0.2 on a Cray and had a run fail with a message
pointing to a location around this area. As far as I can see, this part has
not been changed since 1.0.2. Please confirm this potential bug.
Either initializing the arrays to MPI_REQUEST_NULL or using ADIOI_Calloc()
can fix the problem.
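A minimal sketch of the first fix, explicit initialization, might look like the following. The helper name alloc_null_requests and the USE_REAL_MPI switch are hypothetical, plain malloc() stands in for ADIOI_Malloc(), and the stand-in typedef and value are there only so the sketch compiles without MPI. With the definition quoted above, a zero-filled buffer would not compare equal to MPI_REQUEST_NULL, so the sketch assigns the value element by element.

```c
#include <stdlib.h>

#ifdef USE_REAL_MPI   /* hypothetical switch, e.g. mpicc -DUSE_REAL_MPI */
#include <mpi.h>
#else
/* Stand-in definitions so the sketch compiles without MPI. The value is
   the one quoted from mpi.h.in in this message. */
typedef int MPI_Request;
#define MPI_REQUEST_NULL ((MPI_Request)0x2c000000)
#endif

/* Hypothetical helper: allocate n requests and set each one to
   MPI_REQUEST_NULL. Returns NULL if n <= 0 or the allocation fails.
   Plain malloc() is used here in place of ADIOI_Malloc(). */
static MPI_Request *alloc_null_requests(int n)
{
    if (n <= 0)
        return NULL;

    MPI_Request *reqs = malloc((size_t)n * sizeof *reqs);
    if (reqs == NULL)
        return NULL;

    /* Every slot starts as a null request, so MPI_Waitall() can safely
       be given the whole array even if some slots are never used. */
    for (int i = 0; i < n; i++)
        reqs[i] = MPI_REQUEST_NULL;

    return reqs;
}
```

The caller would release the array with free() after the wait completes.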
Another solution would be to combine the two arrays into one, use
variable j as a counter for both loops (i.e., remove line 421 so that j is
not reset to 0), and call MPI_Waitall() once, passing j as the number of
requests along with the combined request array.
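The single-counter idea can be sketched without MPI by recording which operation each slot of the combined array would hold. Everything named here (slot, plan_requests, recv_count, send_count) is hypothetical and only illustrates the indexing. In the real function the marked lines would be the MPI_Irecv() and MPI_Isend() calls writing into one request array.

```c
/* Which kind of operation a request slot belongs to (hypothetical names). */
typedef enum { SLOT_RECV, SLOT_SEND } slot_kind;

typedef struct {
    slot_kind kind;  /* receive or send */
    int peer;        /* rank of the other process */
} slot;

/* Walk both count arrays with a single counter j. A nonzero count for
   process i means one request is posted for it. The slots array must
   have room for 2 * nprocs entries.
   Returns j, the number of requests that one MPI_Waitall() call would
   wait on. */
static int plan_requests(const int *recv_count, const int *send_count,
                         int nprocs, slot *slots)
{
    int j = 0;

    for (int i = 0; i < nprocs; i++) {
        if (recv_count[i]) {
            /* real code: MPI_Irecv(..., &requests[j]) */
            slots[j].kind = SLOT_RECV;
            slots[j].peer = i;
            j++;
        }
    }

    /* j is deliberately not reset to 0 here. */
    for (int i = 0; i < nprocs; i++) {
        if (send_count[i]) {
            /* real code: MPI_Isend(..., &requests[j]) */
            slots[j].kind = SLOT_SEND;
            slots[j].peer = i;
            j++;
        }
    }

    return j;
}
```

Because only the first j slots are ever filled and only j requests are waited on, no uninitialized element reaches MPI_Waitall() in this arrangement.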
Wei-keng