[MPICH] a potential bug

Wei-keng Liao wkliao at ece.northwestern.edu
Mon Oct 29 21:42:51 CDT 2007


Rajeev,

In your new ad_aggregate.c, since j is already a counter, I would replace 
lines 430 to 438:

    statuses = (MPI_Status *) ADIOI_Malloc((1 + 2* \
                   (count_my_req_procs+count_others_req_procs)) * \
                       sizeof(MPI_Status));
/* +1 to avoid a 0-size malloc */

    MPI_Waitall(j, requests, statuses);

    ADIOI_Free(requests);
    ADIOI_Free(statuses);

with the following:
    if (j > 0) {
        statuses = (MPI_Status *) ADIOI_Malloc(j * sizeof(MPI_Status));
        MPI_Waitall(j, requests, statuses);
        ADIOI_Free(statuses);
    }
    ADIOI_Free(requests);


Wei-keng

On Mon, 29 Oct 2007, Rajeev Thakur wrote:

> Wei-keng,
>          Thanks for pointing this out. I have fixed it as you suggested in
> the last paragraph. Attached is the new ad_aggregate.c. Can you test it out?
> 
> Thanks,
> Rajeev
>   
> 
> > -----Original Message-----
> > From: owner-mpich-discuss at mcs.anl.gov 
> > [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Wei-keng Liao
> > Sent: Friday, October 26, 2007 10:51 PM
> > To: mpich-discuss at mcs.anl.gov
> > Subject: [MPICH] a potential bug
> > 
> > 
> > In file mpich2-1.0.6/src/mpi/romio/adio/common/ad_aggregate.c, function
> > ADIOI_Calc_others_req(), lines 403 to 439, two arrays of MPI_Request are
> > first allocated with ADIOI_Malloc(), used in MPI_Isend()/MPI_Irecv(), and
> > then in the two following MPI_Waitall()s.
> > 
> > However, the two arrays are not initialized to MPI_REQUEST_NULL, yet all
> > elements of the arrays are used in the MPI_Waitall() calls in lines 438
> > and 439. This is dangerous since ADIOI_Malloc() does not always return a
> > buffer with all-zero contents, let alone one whose elements match the
> > definition of MPI_REQUEST_NULL in mpich2-1.0.6/src/include/mpi.h.in:
> >    #define MPI_REQUEST_NULL   ((MPI_Request)0x2c000000)
> > 
> > I am running MPICH2-1.0.2 on a Cray and had a run fail with a message
> > indicating a location around this area. I can see this part has not been
> > changed since 1.0.2. Please confirm this potential bug.
> > 
> > Either initializing to MPI_REQUEST_NULL or using ADIOI_Calloc() can
> > fix the problem.
> > 
> > Another clever solution would be to combine the two arrays into one; use
> > variable j as a counter for both loops (i.e., remove line 421, so that j
> > is not reset to 0); and use one MPI_Waitall() whose arguments are j, as
> > the number of requests, and the combined request array.
> > 
> > Wei-keng
> > 
> > 
> 
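
The two fixes discussed in this thread (filling every slot with the null
sentinel, and packing both loops into one array with a single counter j) can
be sketched in plain C as below. This is only an illustration under stated
assumptions, not the ROMIO code: sketch_request, SKETCH_REQUEST_NULL and
sketch_post_and_count() are hypothetical stand-ins, no MPI call is made, and
the "handles" are fake values. In the real function, the point where j is
returned is where a single MPI_Waitall(j, requests, statuses) would go.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins, NOT the MPI API. The sentinel copies the MPI_REQUEST_NULL value
 * quoted in the thread only to show that it is not an all-zero bit pattern. */
typedef unsigned int sketch_request;
#define SKETCH_REQUEST_NULL ((sketch_request)0x2c000000u)

/* Pack one fake handle per peer with a nonzero count, receives first and then
 * sends, into ONE array indexed by ONE counter j.
 * Assumes nprocs >= 0 and that both count arrays have nprocs entries.
 * Returns the number of handles packed, or -1 if allocation fails. */
int sketch_post_and_count(const int *recv_counts, const int *send_counts,
                          int nprocs)
{
    size_t capacity = 2 * (size_t)nprocs + 1;   /* +1 avoids a 0-size malloc */
    sketch_request *requests = malloc(capacity * sizeof *requests);
    if (requests == NULL)
        return -1;

    /* First option from the report: set every slot to the null sentinel
     * explicitly, because malloc() does not define the buffer contents. */
    for (size_t k = 0; k < capacity; k++)
        requests[k] = SKETCH_REQUEST_NULL;

    int j = 0;
    for (int i = 0; i < nprocs; i++)            /* stands in for the Irecv loop */
        if (recv_counts[i] > 0)
            requests[j++] = (sketch_request)(i + 1);

    /* j is deliberately NOT reset to 0 here. */
    for (int i = 0; i < nprocs; i++)            /* stands in for the Isend loop */
        if (send_counts[i] > 0)
            requests[j++] = (sketch_request)(i + 1);

    /* Only slots [0, j) were written; every later slot still holds the
     * sentinel, so waiting on exactly j entries never reads garbage. */
    assert((size_t)j < capacity);
    int untouched = 1;
    for (size_t k = (size_t)j; k < capacity; k++)
        if (requests[k] != SKETCH_REQUEST_NULL)
            untouched = 0;
    assert(untouched);

    free(requests);
    return j;
}
```

With the counter as the only bound, the sentinel fill becomes a safety net
rather than a requirement, which is presumably why the single-counter variant
was the one adopted above.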



