[MPICH] a potential bug
Wei-keng Liao
wkliao at ece.northwestern.edu
Mon Oct 29 21:42:51 CDT 2007
Rajeev,
In your new ad_aggregate.c, since j is already a counter, I would replace
lines 430 to 438:
    statuses = (MPI_Status *) ADIOI_Malloc((1 + 2* \
                   (count_my_req_procs+count_others_req_procs)) * \
                   sizeof(MPI_Status));
    /* +1 to avoid a 0-size malloc */
    MPI_Waitall(j, requests, statuses);
    ADIOI_Free(requests);
    ADIOI_Free(statuses);
with the following:
    if (j > 0) {
        statuses = (MPI_Status *) ADIOI_Malloc(j * sizeof(MPI_Status));
        MPI_Waitall(j, requests, statuses);
        ADIOI_Free(statuses);
    }
    ADIOI_Free(requests);
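
For illustration only, here is a minimal stand-alone sketch of the same pattern without MPI: one request array, one counter j shared by both posting loops, and a wait over exactly j entries. The names request_t, post_requests(), wait_all() and exchange() are hypothetical stand-ins (not ROMIO or MPICH functions), and sizing the array by 2*nprocs+1 is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for MPI_Request; 0 plays the role of a null handle. */
typedef int request_t;

/* Stand-in for the MPI_Irecv/MPI_Isend loops: fills one slot for every
 * process with a nonzero count, starting at slot j, and returns the new j. */
static int post_requests(const int *counts, int nprocs,
                         request_t *requests, int j)
{
    for (int i = 0; i < nprocs; i++) {
        if (counts[i] > 0) {
            requests[j] = i + 1;   /* pretend handle, never 0 */
            j++;
        }
    }
    return j;
}

/* Stand-in for MPI_Waitall: reads exactly n entries and nothing beyond. */
static int wait_all(int n, request_t *requests, int *statuses)
{
    int completed = 0;
    for (int i = 0; i < n; i++) {
        assert(requests[i] != 0);  /* every slot below n was really posted */
        statuses[i] = 0;
        requests[i] = 0;           /* mark as complete */
        completed++;
    }
    return completed;
}

/* Returns the number of requests completed, or -1 if allocation fails. */
int exchange(const int *recv_counts, const int *send_counts, int nprocs)
{
    /* Upper bound on slots; +1 avoids a 0-size malloc. The slots are
     * deliberately left uninitialized, as with ADIOI_Malloc(). */
    request_t *requests = malloc((2 * (size_t)nprocs + 1) * sizeof *requests);
    if (requests == NULL)
        return -1;

    int j = 0;
    j = post_requests(recv_counts, nprocs, requests, j);
    j = post_requests(send_counts, nprocs, requests, j);  /* j is not reset */

    int completed = 0;
    if (j > 0) {
        int *statuses = malloc((size_t)j * sizeof *statuses);
        if (statuses == NULL) {
            free(requests);
            return -1;
        }
        completed = wait_all(j, requests, statuses);
        free(statuses);
    }
    free(requests);
    return completed;
}
```

Because the wait only looks at the first j slots, the uninitialized tail of the array is never read, so no initialization to a null handle is needed.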
Wei-keng
On Mon, 29 Oct 2007, Rajeev Thakur wrote:
> Wei-keng,
> Thanks for pointing this out. I have fixed it as you suggested in
> the last paragraph. Attached is the new ad_aggregate.c. Can you test it out?
>
> Thanks,
> Rajeev
>
>
> > -----Original Message-----
> > From: owner-mpich-discuss at mcs.anl.gov
> > [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Wei-keng Liao
> > Sent: Friday, October 26, 2007 10:51 PM
> > To: mpich-discuss at mcs.anl.gov
> > Subject: [MPICH] a potential bug
> >
> >
> > In file
> > mpich2-1.0.6/src/mpi/romio/adio/common/ad_aggregate.c, function
> > ADIOI_Calc_others_req(), lines 403 to 439, two arrays of
> > MPI_Request are
> > first allocated with ADIOI_Malloc(), used in
> > MPI_Isend()/MPI_Irecv(), and
> > then in the two following MPI_Waitall()s.
> >
> > However the two arrays are not initialized to
> > MPI_REQUEST_NULL, but all
> > elements of the arrays are used in the MPI_Waitall() calls in
> > lines 438
> > and 439. This is dangerous since ADIOI_Malloc() does not
> > always allocate a
> > buffer with all zero contents (matching the define of
> > MPI_REQUEST_NULL:
> > #define MPI_REQUEST_NULL ((MPI_Request)0x2c000000)
> > in mpich2-1.0.6/src/include/mpi.h.in).
> >
> > I am running MPICH2-1.0.2 on a Cray and had a run fail with a message
> > indicating a location around this area. I can see this part
> > has not been
> > changed since 1.0.2. Please confirm this potential bug.
> >
> > Either initializing to MPI_REQUEST_NULL or using ADIOI_Calloc() can
> > fix the problem.
> >
> > Another clever solution can be to combine the two arrays into
> > one; use
> > variable j as a counter for both loops (i.e. remove line 421, without
> > resetting j to 0); and use one MPI_Waitall() with arguments
> > of j as the
> > number of requests and the combined request array.
> >
> > Wei-keng
> >
> >
>