[MPICH] Possible Race condition between Test() and Cancel

David Minor david-m at orbotech.com
Wed Feb 1 00:00:12 CST 2006


Hi Rajeev,
I tried that earlier, but it didn't help: the wait can still complete
between the check for MPI_REQUEST_NULL and the call to MPI_Cancel. The
check needs to be made under the same general mutex that protects wait
and test. I went into the source of cancel.c and added the following
lines, and the problem went away for my application. Of course this is
not a proper fix and it has probably introduced other bugs; I just
wanted to confirm that this was indeed the problem.

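cancel.c, line 79:
    /* Request was already released by a completed test/wait:
       nothing to cancel. */
    if (*request == MPI_REQUEST_NULL) {
        mpi_errno = MPI_SUCCESS;
        goto fn_exit;
    }
    /* Completion counter is zero, so the request has already
       completed: treat the cancel as a no-op. */
    if (*request_ptr->cc_ptr == 0) {
        mpi_errno = MPI_SUCCESS;
        goto fn_exit;
    }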
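
To illustrate why the check has to sit under the same lock as test and
wait, here is a minimal standalone model in C with pthreads. It is not
MPICH code: mock_request, progress_mutex, mock_complete and mock_cancel
are hypothetical names made up for this sketch, and it assumes that a
cancelled request also counts as complete.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a request object; not the MPICH structure. */
typedef struct {
    int cc;         /* completion counter: 0 means the operation completed */
    int cancelled;  /* set once a cancel has actually been applied */
} mock_request;

/* One lock shared by the test/wait path and the cancel path. */
pthread_mutex_t progress_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Test/wait side: mark the request complete while holding the lock. */
void mock_complete(mock_request *req)
{
    assert(req != NULL);
    pthread_mutex_lock(&progress_mutex);
    req->cc = 0;
    pthread_mutex_unlock(&progress_mutex);
}

/* Cancel side: the "already completed?" check and the cancel itself
 * happen under the same lock, so a completion cannot slip in between.
 * Returns 1 if the cancel was applied, 0 if it was a harmless no-op. */
int mock_cancel(mock_request *req)
{
    int applied = 0;

    pthread_mutex_lock(&progress_mutex);
    if (req != NULL && req->cc != 0 && !req->cancelled) {
        req->cancelled = 1;
        req->cc = 0;    /* assumption: a cancelled request is also complete */
        applied = 1;
    }
    pthread_mutex_unlock(&progress_mutex);
    return applied;
}
```

In this model a thread calling mock_cancel() on a request that another
thread has just completed gets 0 back instead of crashing, which is the
behaviour I would expect from Cancel(). Build with -pthread.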

Is there any reason why this could or should not be fixed?

David




-----Original Message-----
From: Rajeev Thakur [mailto:thakur at mcs.anl.gov] 
Sent: Tuesday, January 31, 2006 7:30 PM
To: David Minor; mpich-discuss at mcs.anl.gov
Subject: RE: [MPICH] Possible Race condition between Test() and Cancel

If the request is completed by a test or wait, it is set to
MPI_REQUEST_NULL. See if adding an "if (request != MPI_REQUEST_NULL)"
around the MPI_Cancel helps.

Rajeev
 

> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov 
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of David Minor
> Sent: Tuesday, January 31, 2006 1:22 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: [MPICH] Possible Race condition between Test() and Cancel
> 
> There appears to be a problem with MPI_Cancel, at least under
> Red Hat 9 with the g++ 3.4.3 compiler.
> 
> If you Cancel a completed receive request, you will get an MPI abort
> or seg fault.
> But if you Test() the request before calling Cancel() on it, there is
> always the possibility that the request will complete between the
> Test() and the Cancel(), thus causing an abort. What is the solution?
> Shouldn't Cancel() simply return an error if the request is already
> completed?
> 
> My specific problem is:
> 
> I'm waiting with WaitAll on a set of receive requests. I want to wait
> until either 1) they all complete or 2) another thread decides to
> cancel the requests.
> 
> The problem is that the thread that cancels the requests has no way of
> ensuring that it doesn't call Cancel() on an already completed request.
> 
> Please advise,
> 
> Regards,
> David Minor Orbotech
> 
> 



