[MPICH] Possible Race condition between Test() and Cancel

Rajeev Thakur thakur at mcs.anl.gov
Wed Feb 1 13:30:33 CST 2006


This fix should probably be ok.

Rajeev 
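
For readers following the thread, here is a minimal sketch of the idea
behind the fix quoted below: the "already complete?" check and the cancel
itself happen under the same lock that the completion path takes, so the
request cannot complete in between. Everything here is hypothetical and
for illustration only: mock_request, mock_complete and mock_cancel are
invented names, a plain pthread mutex stands in for the library's
wait/test mutex, and none of this is MPICH's actual code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request; NOT MPICH's real structure. */
typedef struct {
    pthread_mutex_t lock;  /* plays the role of the wait/test mutex */
    int cc;                /* completion counter: 0 means complete */
    bool cancelled;        /* set once a cancel has taken effect */
} mock_request;

typedef enum { CANCEL_DONE, CANCEL_SKIPPED_COMPLETE } cancel_result;

void mock_request_init(mock_request *req)
{
    assert(req != NULL);
    pthread_mutex_init(&req->lock, NULL);
    req->cc = 1;           /* one outstanding operation */
    req->cancelled = false;
}

/* Completion path (what a wait or test would drive).
 * Returns true if this call completed the request. */
bool mock_complete(mock_request *req)
{
    bool completed = false;
    pthread_mutex_lock(&req->lock);
    if (!req->cancelled && req->cc > 0) {
        req->cc = 0;
        completed = true;
    }
    pthread_mutex_unlock(&req->lock);
    return completed;
}

/* Cancel path: the check and the cancel form one critical section,
 * so a completion cannot slip in between them. */
cancel_result mock_cancel(mock_request *req)
{
    cancel_result result;
    pthread_mutex_lock(&req->lock);
    if (req->cc == 0) {
        result = CANCEL_SKIPPED_COMPLETE;  /* already complete: no-op */
    } else {
        req->cancelled = true;
        result = CANCEL_DONE;
    }
    pthread_mutex_unlock(&req->lock);
    return result;
}
```

Whichever of mock_complete() and mock_cancel() takes the lock first wins,
and the other becomes a harmless no-op instead of an abort. A check made
by the caller before calling cancel cannot give that guarantee, because
it is made outside the lock.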

> -----Original Message-----
> From: David Minor [mailto:david-m at orbotech.com] 
> Sent: Wednesday, February 01, 2006 12:00 AM
> To: Rajeev Thakur; mpich-discuss at mcs.anl.gov
> Subject: RE: [MPICH] Possible Race condition between Test() and Cancel
> 
> Hi Rajeev,
> I tried that earlier, but it didn't help: between the time you check for
> MPI_REQUEST_NULL and the time you call cancel, the wait can still
> complete. The check needs to be done under the general mutex for wait and
> test. I went into the source code for cancel.c and added the following
> lines, and the problem is fixed for my application. Of course, I didn't
> fix it properly and probably caused other bugs; I just wanted to see
> whether that was indeed the problem.
> 
> 79:
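>     /* Nothing to cancel if the handle is already null. */
>     if (*request == MPI_REQUEST_NULL) {
>         mpi_errno = MPI_SUCCESS;
>         goto fn_exit;
>     }
>     /* Nothing to cancel if the completion counter is already zero. */
>     if (*request_ptr->cc_ptr == 0) {
>         mpi_errno = MPI_SUCCESS;
>         goto fn_exit;
>     }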
> 
> Is there any reason why this could or should not be fixed?
> 
> David
> 
> 
> 
> 
> -----Original Message-----
> From: Rajeev Thakur [mailto:thakur at mcs.anl.gov] 
> Sent: Tuesday, January 31, 2006 7:30 PM
> To: David Minor; mpich-discuss at mcs.anl.gov
> Subject: RE: [MPICH] Possible Race condition between Test() and Cancel
> 
> If the request is completed by a test or wait, it is set to
> MPI_REQUEST_NULL. See if adding an "if (request != MPI_REQUEST_NULL)"
> around the MPI_Cancel helps.
> 
> Rajeev
>  
> 
> > -----Original Message-----
> > From: owner-mpich-discuss at mcs.anl.gov 
> > [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of David Minor
> > Sent: Tuesday, January 31, 2006 1:22 AM
> > To: mpich-discuss at mcs.anl.gov
> > Subject: [MPICH] Possible Race condition between Test() and Cancel
> > 
> > There appears to be a problem with MPI_Cancel, at least under Red Hat 9
> > with the g++ 3.4.3 compiler.
> > 
> > If you Cancel a completed receive request, you will get an MPI abort or
> > seg fault.
> > But if you Test() the request before calling Cancel() on it, there is
> > always the possibility that the request will be completed between the
> > Test() and the Cancel(), thus causing an abort. What is the solution?
> > Shouldn't Cancel() simply return an error if the request is already
> > completed?
> > 
> > My specific problem is:
> > 
> > I'm waiting with WaitAll on a set of receive requests. I want to wait
> > until either 1) they all complete or 2) another thread decides to cancel
> > the requests.
> > 
> > The problem is that the thread that cancels the requests has no way of
> > ensuring that it doesn't call Cancel() on an already completed request.
> > 
> > Please advise,
> > 
> > Regards,
> > David Minor Orbotech
> > 
> > 
> 
> 



