[mpich-discuss] spawned processes do not shut down

Darius Buntinas buntinas at mcs.anl.gov
Fri Oct 28 13:17:34 CDT 2011


Hi Jon,

I had to pull out my copy of the standard for this one :-).  The standard says (Section 10.5.4 on pg 330) that MPI_Finalize is collective over all "connected" processes.  By the definition of "connected" in the same section, the master and worker processes in your application are still connected after the worker has been told to stop, so the worker's MPI_Finalize may block until the master also calls MPI_Finalize.  (In the case of MPICH, it will wait if the two processes have ever communicated.)  To avoid this, use MPI_Comm_disconnect on the intercommunicator to disconnect the master and worker before the worker calls MPI_Finalize.  MPI_Comm_disconnect is itself collective, so both the master and the worker have to call it.

I added an MPI_Comm_disconnect on line 37 and line 80 of your program and it looks like it works as you intend.

-d


On Oct 28, 2011, at 11:18 AM, Jonathan Bishop wrote:

> I am using MPI_Comm_spawn to dynamically run workers. However, when the workers exit they get hung up on MPI_Finalize. Here is a short program which shows the issue...
> 
> It responds to several commands...
> 
> Do
> 
> start
> stop
> 
> and then check how many processes are running - it should be 1, not 2.
> 
> I am using MPICH2 1.4.1-p1.
> 
> Thanks,
> 
> Jon
> 
> #include <sys/types.h>
> #include <unistd.h>
> #include <iostream>
> #include "mpi.h"
> 
> using namespace std;
> 
> 
> int main(int argc, char **argv)
> {
>   MPI_Init(&argc, &argv);
>   MPI_Comm parent;
>   MPI_Comm_get_parent(&parent);
> 
>   // Master
>   if (parent == MPI_COMM_NULL) {
>     cout << getpid() << endl;
>     MPI_Comm intercom = MPI_COMM_NULL;
>     while (1) {
>       cout << "Enter: ";
>       string s;
>       cin >> s;
>       if (s == "start") {
> 	if (intercom != MPI_COMM_NULL) {
> 	  cout << "already started" << endl;
> 	  continue;
> 	}
> 	MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercom,  MPI_ERRCODES_IGNORE);
> 	continue;
>       }
>       if (s == "stop") {
> 	if (intercom == MPI_COMM_NULL) {
> 	  cout << "worker not running" << endl;
> 	  continue;
> 	}
> 	MPI_Send(const_cast<char*>(s.c_str()), s.size(), MPI_CHAR, 0, 0, intercom);
> 	intercom = MPI_COMM_NULL;
> //	MPI_Finalize();  // This will allow the workers to die, but then I can not restart them.
> 	continue;
>       }
>       if (s == "exit") {
> 	if (intercom != MPI_COMM_NULL) {
> 	  cout << "need to stop before exit" << endl;
> 	  continue;
> 	}
> 	break;
>       }
>       if (intercom == MPI_COMM_NULL) {
> 	cout << "need to start" << endl;
> 	continue;
>       }
>       MPI_Send(const_cast<char*>(s.c_str()), s.size(), MPI_CHAR, 0, 0, intercom);
>       char buf[1000];
>       MPI_Status status;
>       MPI_Recv(buf, 1000, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG, intercom, &status);
>       int count;
>       MPI_Get_count(&status, MPI_CHAR, &count);
>       buf[count] = 0;
>       string t = buf;
>       cout << "worker returned " << t << endl;
>     }
>   }
> 
>   // Worker
>   if (parent != MPI_COMM_NULL) {
>     while (1) {
>       char buf[1000];
>       MPI_Status status;
>       MPI_Recv(buf, 1000, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG, parent, &status);
>       int count;
>       MPI_Get_count(&status, MPI_CHAR, &count);
>       buf[count] = 0;
>       string s = buf;
>       if (s == "stop") {
> 	cout << "worker stopping" << endl;
> 	break;
>       }
>       MPI_Send(const_cast<char*>(s.c_str()), s.size(), MPI_CHAR, 0, 0, parent);
>     }
>   }
> 
>   MPI_Finalize();
> }
> 