[MPICH] a potential bug

Wei-keng Liao wkliao at ece.northwestern.edu
Fri Oct 26 22:50:41 CDT 2007


In file mpich2-1.0.6/src/mpi/romio/adio/common/ad_aggregate.c, function 
ADIOI_Calc_others_req(), lines 403 to 439, two arrays of MPI_Request are 
first allocated with ADIOI_Malloc(), used in MPI_Isend()/MPI_Irecv(), and 
then in the two following MPI_Waitall()s.

However, the two arrays are not initialized to MPI_REQUEST_NULL, yet all 
elements of the arrays are passed to the MPI_Waitall() calls in lines 438 
and 439. This is dangerous because ADIOI_Malloc() does not always return a 
zero-filled buffer, and MPI_REQUEST_NULL is defined as
   #define MPI_REQUEST_NULL   ((MPI_Request)0x2c000000)
in mpich2-1.0.6/src/include/mpi.h.in.

I am running MPICH2-1.0.2 on a Cray and had a run fail with a message 
indicating a location around this area. I can see this part has not been 
changed since 1.0.2. Please confirm this potential bug.

Either initializing the arrays to MPI_REQUEST_NULL or using ADIOI_Calloc() 
can fix the problem. 
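
A minimal sketch of the explicit-initialization fix follows. It is not the 
ROMIO code: request_t, REQUEST_NULL, alloc_requests() and count_null() are 
hypothetical stand-ins so the example compiles without mpi.h, and the null 
value simply mirrors the #define quoted above. Note that with that value a 
zero-filled buffer does not compare equal to the null handle, so the sketch 
sets every element explicitly rather than relying on zeroed memory.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for illustration only (assumption): real code would use
 * MPI_Request and MPI_REQUEST_NULL from mpi.h. The value below mirrors
 * the definition quoted from mpi.h.in. */
typedef int request_t;
#define REQUEST_NULL ((request_t)0x2c000000)

/* Allocate n request handles and set every one to the null handle, so a
 * later wait over all n elements sees no uninitialized slot. */
static request_t *alloc_requests(size_t n)
{
    /* Allocate at least one element so n == 0 still yields a valid pointer. */
    request_t *reqs = malloc((n ? n : 1) * sizeof *reqs);
    assert(reqs != NULL);
    for (size_t i = 0; i < n; i++)
        reqs[i] = REQUEST_NULL;
    return reqs;
}

/* Count how many of the n slots currently hold the null handle. */
static size_t count_null(const request_t *reqs, size_t n)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (reqs[i] == REQUEST_NULL)
            k++;
    return k;
}
```

In real MPI code the same loop would assign MPI_REQUEST_NULL to each 
MPI_Request before any MPI_Isend()/MPI_Irecv() is posted, so that 
MPI_Waitall() over the full array only sees posted requests or null handles.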

Another solution is to combine the two arrays into one; use variable j as 
a counter for both loops (i.e. remove line 421, so that j is not reset to 
0); and call a single MPI_Waitall() with j as the number of requests and 
the combined request array.
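
The combined-array idea can be sketched roughly as below. This is an 
illustration under assumptions, not the code in ad_aggregate.c: request_t, 
post_request(), post_all() and the recv_count/send_count conditions are 
hypothetical stand-ins chosen so the example compiles without mpi.h. The 
point it shows is that a single counter j running across both loops yields 
exactly the number of posted requests, so no unused slot is ever waited on.

```c
#include <assert.h>

typedef int request_t;   /* stand-in for MPI_Request (assumption) */

/* Hypothetical stand-in for MPI_Irecv()/MPI_Isend(): stores a nonzero
 * token in *req so the caller can see which slots were filled. */
static void post_request(request_t *req, int peer, int is_send)
{
    *req = peer * 2 + is_send + 1;
}

/* Post one receive per peer with recv_count[i] > 0, then one send per
 * peer with send_count[i] > 0, all into the single array reqs, which
 * must have room for 2 * nprocs entries. Returns the number posted. */
static int post_all(const int *recv_count, const int *send_count,
                    int nprocs, request_t *reqs)
{
    int j = 0;
    assert(nprocs >= 0);

    for (int i = 0; i < nprocs; i++)
        if (recv_count[i] > 0)
            post_request(&reqs[j++], i, 0);

    /* j is deliberately NOT reset to 0 here. */
    for (int i = 0; i < nprocs; i++)
        if (send_count[i] > 0)
            post_request(&reqs[j++], i, 1);

    /* Real code would now wait once on the first j entries only, e.g.
     * MPI_Waitall(j, reqs, statuses). */
    return j;
}
```

Because only the first j entries are passed to the wait call, the 
remaining entries of the array may stay uninitialized without harm.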

Wei-keng



