[MPICH] Possible Race condition between Test() and Cancel
Rajeev Thakur
thakur at mcs.anl.gov
Wed Feb 1 13:30:33 CST 2006
This fix should probably be ok.
Rajeev
> -----Original Message-----
> From: David Minor [mailto:david-m at orbotech.com]
> Sent: Wednesday, February 01, 2006 12:00 AM
> To: Rajeev Thakur; mpich-discuss at mcs.anl.gov
> Subject: RE: [MPICH] Possible Race condition between Test() and Cancel
>
> Hi Rajeev,
> I tried that earlier, but it didn't help: between the time you check for
> MPI_REQUEST_NULL and the time you call cancel, the wait can still
> complete. The check needs to be under the general mutex for wait and
> test. I went into the source code for cancel.c and added the following
> lines, and the problem is fixed for my application. Of course, I didn't
> fix it properly and probably caused other bugs; I just wanted to see
> whether that was indeed the problem.
>
> 79:
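> /* Request was already completed and released by test/wait. */
> if (*request == MPI_REQUEST_NULL) {
>     mpi_errno = MPI_SUCCESS;
>     goto fn_exit;
> }
> /* Completion counter is zero: nothing left to cancel. */
> if (*request_ptr->cc_ptr == 0) {
>     mpi_errno = MPI_SUCCESS;
>     goto fn_exit;
> }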
>
> Is there any reason why this could or should not be fixed?
>
> David
>
>
>
>
> -----Original Message-----
> From: Rajeev Thakur [mailto:thakur at mcs.anl.gov]
> Sent: Tuesday, January 31, 2006 7:30 PM
> To: David Minor; mpich-discuss at mcs.anl.gov
> Subject: RE: [MPICH] Possible Race condition between Test() and Cancel
>
> If the request is completed by a test or wait, it is set to
> MPI_REQUEST_NULL. See if adding an "if (request != MPI_REQUEST_NULL)"
> around the MPI_Cancel helps.
>
> Rajeev
>
>
> > -----Original Message-----
> > From: owner-mpich-discuss at mcs.anl.gov
> > [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of David Minor
> > Sent: Tuesday, January 31, 2006 1:22 AM
> > To: mpich-discuss at mcs.anl.gov
> > Subject: [MPICH] Possible Race condition between Test() and Cancel
> >
> > There appears to be a problem with MPI_Cancel, at least under Red Hat 9
> > with the g++ 3.4.3 compiler.
> >
> > If you Cancel a completed receive request, you will get an MPI abort or
> > seg fault. But if you Test() the request before calling Cancel() on it,
> > there is always the possibility that the request will complete between
> > the Test() and the Cancel(), thus causing an abort. What is the
> > solution? Shouldn't Cancel() simply return an error if the request is
> > already completed?
> >
> > My specific problem is:
> >
> > I'm waiting with WaitAll on a set of receive requests. I want to wait
> > until either 1) they all complete or 2) another thread decides to
> > cancel the requests.
> >
> > The problem is that the thread that cancels the requests has no way of
> > assuring that it doesn't call Cancel() on an already completed request.
> >
> > Please advise,
> >
> > Regards,
> > David Minor Orbotech
> >
> >
>
>
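The idea discussed in this thread, that the "is it already complete?" check
and the cancel itself must happen under the same lock as test/wait, can be
sketched with a small stand-in model. This is not MPICH code: mock_request,
progress_mutex, mock_progress, mock_test and mock_cancel are all hypothetical
names made up for illustration, and a NULL handle plays the role of
MPI_REQUEST_NULL. It only shows why a cancel that arrives after completion
can safely become a no-op once the check is inside the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a request object; not MPICH's real structure. */
typedef struct {
    int completion_count;  /* 0 means the operation has finished */
    int cancelled;         /* 1 if a cancel actually took effect */
} mock_request;

/* Models the single lock shared by the test/wait and cancel paths. */
static pthread_mutex_t progress_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Models the library finishing the operation (e.g. the message arrived). */
void mock_progress(mock_request *req)
{
    pthread_mutex_lock(&progress_mutex);
    req->completion_count = 0;
    pthread_mutex_unlock(&progress_mutex);
}

/* Models a test call: returns 1 if the request is complete, and if so
 * resets the caller's handle to NULL (the MPI_REQUEST_NULL stand-in). */
int mock_test(mock_request **handle)
{
    int done = 0;
    assert(handle != NULL);
    pthread_mutex_lock(&progress_mutex);
    if (*handle == NULL) {
        done = 1;
    } else if ((*handle)->completion_count == 0) {
        *handle = NULL;
        done = 1;
    }
    pthread_mutex_unlock(&progress_mutex);
    return done;
}

/* Models a cancel call with both checks made while holding the lock.
 * Returns 1 if the cancel took effect, 0 if it was a harmless no-op
 * because the handle was already null or the operation had finished. */
int mock_cancel(mock_request **handle)
{
    int took_effect = 0;
    assert(handle != NULL);
    pthread_mutex_lock(&progress_mutex);
    if (*handle != NULL && (*handle)->completion_count != 0) {
        (*handle)->cancelled = 1;
        (*handle)->completion_count = 0;
        took_effect = 1;
    }
    pthread_mutex_unlock(&progress_mutex);
    return took_effect;
}
```

Because the caller never inspects the handle outside the lock, there is no
window between "check" and "cancel" for another thread to complete the
request, which is the window described in the original report.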