[MPICH] Possible Race condition between Test() and Cancel
David Minor
david-m at orbotech.com
Wed Feb 1 00:00:12 CST 2006
Hi Rajeev,
I tried that earlier, but it didn't help: between the time you check
for MPI_REQUEST_NULL and the time you call cancel, the wait can still
complete. The check needs to be done under the general mutex used by
wait and test. I went into the source code for cancel.c and added the
following lines, and the problem is fixed for my application. Of course
this is not a proper fix and it may well have caused other bugs. I just
wanted to see whether that was indeed the problem.
Line 79:
    if (*request == MPI_REQUEST_NULL) {
        mpi_errno = MPI_SUCCESS;
        goto fn_exit;
    }
    if (*request_ptr->cc_ptr == 0) {
        mpi_errno = MPI_SUCCESS;
        goto fn_exit;
    }
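To make the point about the mutex concrete, here is a minimal sketch in
plain C with pthreads. It does not use MPICH internals: toy_request,
toy_complete, toy_cancel and toy_race_once are made-up names for
illustration, and the single lock only stands in for the general mutex
used by wait and test. The idea it shows is that the "already
completed?" check and the cancel itself happen under the same lock, so
a completion cannot slip in between them.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request object; not MPICH's real structure. */
typedef struct {
    pthread_mutex_t lock;   /* plays the role of the mutex shared by wait/test */
    bool completed;
    bool cancelled;
} toy_request;

void toy_request_init(toy_request *r) {
    int rc = pthread_mutex_init(&r->lock, NULL);
    assert(rc == 0);
    (void)rc;
    r->completed = false;
    r->cancelled = false;
}

/* Completion path: what wait/test would do when the message arrives.
 * A request that was already cancelled is left alone. */
void toy_complete(toy_request *r) {
    pthread_mutex_lock(&r->lock);
    if (!r->cancelled)
        r->completed = true;
    pthread_mutex_unlock(&r->lock);
}

/* Cancel path: check and cancel under one lock.
 * Returns true if the request was cancelled, false if it had already
 * completed (in which case the call is a harmless no-op). */
bool toy_cancel(toy_request *r) {
    bool did_cancel = false;
    pthread_mutex_lock(&r->lock);
    if (!r->completed) {
        r->cancelled = true;
        did_cancel = true;
    }
    pthread_mutex_unlock(&r->lock);
    return did_cancel;
}

static void *completer_thread(void *arg) { toy_complete(arg); return NULL; }
static void *canceller_thread(void *arg) { toy_cancel(arg); return NULL; }

/* Race one completer against one canceller.
 * Returns true if exactly one of them won, whichever ran first. */
bool toy_race_once(void) {
    toy_request r;
    pthread_t a, b;
    toy_request_init(&r);
    pthread_create(&a, NULL, completer_thread, &r);
    pthread_create(&b, NULL, canceller_thread, &r);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_mutex_destroy(&r.lock);
    return r.completed != r.cancelled;
}
```

Whichever thread takes the lock first wins, and the other call becomes
a no-op instead of an abort. That is the behaviour I would hope for
from cancel on an already completed request.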
Is there any reason why this could or should not be fixed?
David
-----Original Message-----
From: Rajeev Thakur [mailto:thakur at mcs.anl.gov]
Sent: Tuesday, January 31, 2006 7:30 PM
To: David Minor; mpich-discuss at mcs.anl.gov
Subject: RE: [MPICH] Possible Race condition between Test() and Cancel
If the request is completed by a test or wait, it is set to
MPI_REQUEST_NULL. See if adding an "if (request != MPI_REQUEST_NULL)"
around the MPI_Cancel helps.
Rajeev
> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of David Minor
> Sent: Tuesday, January 31, 2006 1:22 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: [MPICH] Possible Race condition between Test() and Cancel
>
> There appears to be a problem with MPI_Cancel, at least under
> Red Hat 9 with the g++ 3.4.3 compiler.
>
> If you Cancel a completed receive request, you will get an
> MPI abort or seg fault.
> Even if you Test() the request before calling Cancel() on it, there
> is always the possibility that the request completes between the
> Test() and the Cancel(), thus causing an abort. What is the solution?
> Shouldn't Cancel() simply return an error if the request is already
> completed?
>
> My specific problem is:
>
> I'm waiting with WaitAll on a set of receive requests. I want to wait
> until either 1) they all complete or 2) another thread decides to
> cancel the requests.
>
> The problem is that the thread that cancels the requests has no way of
> assuring that it doesn't call Cancel() on an already completed
> request.
>
> Please advise,
>
> Regards,
> David Minor Orbotech
>
>