[mpich-discuss] disable-auto-cleanup send/receive example
Darius Buntinas
buntinas at mcs.anl.gov
Wed Nov 2 14:46:51 CDT 2011
When a process fails, all pending send and receive operations involving the failed process complete with an error. In addition, all wildcard (i.e., MPI_ANY_SOURCE) receives also complete with an error. The reasoning is that the failed process may have been the intended sender of the message a wildcard receive is waiting for, so to avoid a hung process it's safest to fail those receives as well.
However, there's a race condition between when a receive is posted and when the failure is detected and processed, so some receives posted after the failure also complete with an error. We're working with the MPI Forum to standardize the exact behavior in such cases and eliminate the race, but that work is still in progress.
I'll look into this and see whether there's a quick fix I can apply.
In your program, when I added
sleep(1);
MPI_Iprobe(0, 0, MPI_COMM_WORLD, &flag, MPI_STATUS_IGNORE);
to the error branch of the receive, it ran as you expected. There's still a race; it's just less likely to be hit in your example.
-d
On Nov 2, 2011, at 1:21 PM, Rob Stewart wrote:
> Hi,
>
> I am trying to understand and use the --disable-auto-cleanup flag in mpich2.
>
> I have written a very simple example in C with MPI.
>
> Here is the code:
>
> http://pastebin.com/daHDtEBA
>
> Here's the output of a successful run:
> ---
> $ mpiexec --disable-auto-cleanup -machinefile hosts -n 10 delayed-hello
> Hello World from process 1 running on machine1
> Hello World from process 2 running on machine2
> Hello World from process 3 running on machine3
> Hello World from process 4 running on machine4
> Hello World from process 5 running on machine5
> Hello World from process 6 running on machine6
> Hello World from process 7 running on machine7
> Hello World from process 8 running on machine8
> Hello World from process 9 running on machine9
> Ready
>
>
> Now, here's what happens when I run it and kill a process on one node.
> Note that I kill the process with rank 3 after it has already executed mpi_send_msg(), and rank 0 has received and printed its "Hello World" line.
>
> ---
> $ mpiexec --disable-auto-cleanup -machinefile hosts -n 10 delayed-hello
> Hello World from process 1 running on machine1
> Hello World from process 2 running on machine2
> Hello World from process 3 running on machine3
> Hello World from process 4 running on machine4
> Hello World from process 5 running on machine5
> Hello World from process 6 running on machine6
> Error in MPI_Recv!
> Hello World from process 6 running on machine6
> Error in MPI_Recv!
> Hello World from process 6 running on machine6
> Error in MPI_Recv!
> Hello World from process 6 running on machine6
> Ready
> Process 9: Error in MPI_Send!
> Process 7: Error in MPI_Send!
> Process 8: Error in MPI_Send!
> APPLICATION TERMINATED WITH THE EXIT STRING: Terminated (signal 15)
>
>
> Perhaps naively, I had thought that because there would be no further communication with the process I had killed, it wouldn't make any difference to the runtime behaviour. But it did. I had also hoped that even if I killed a process *before* it communicated with rank 0, mpich2 would just skip the communication attempt and continue, due to --disable-auto-cleanup. So if I were to kill process 7 in my example, I was hoping to see:
>
> ---
> $ mpiexec --disable-auto-cleanup -machinefile hosts -n 10 delayed-hello
> Hello World from process 1 running on machine1
> Hello World from process 2 running on machine2
> Hello World from process 3 running on machine3
> Hello World from process 4 running on machine4
> Hello World from process 5 running on machine5
> Hello World from process 6 running on machine6
> Hello World from process 8 running on machine8
> Hello World from process 9 running on machine9
> Ready
>
>
> Maybe it is because MPI_COMM_WORLD is no longer valid. Initially the world had 10 processes, but after I kill one it has 9. So each subsequent attempt by a live process to execute
>
> MPI_Send(msg, length, MPI_CHAR, 0, tag, MPI_COMM_WORLD);
>
> fails. Is there any way in mpich2 to "refresh" MPI_COMM_WORLD without a full MPI_INIT?
>
> Something like:
>
> err = MPI_Send(msg, length, MPI_CHAR, 0, tag, MPI_COMM_WORLD);
> if (err != MPI_SUCCESS) {
>     /* refresh_comm_world()
>        retry_send_msg() */
> }
>
>
> Other than this, I cannot see an obvious way to take advantage of the --disable-auto-cleanup flag. Are there any canonical examples of C code that use this flag?
>
>
> --
> Rob Stewart