[mpich-discuss] spawned processes do not shut down

Rajeev Thakur thakur at mcs.anl.gov
Tue Nov 1 08:25:07 CDT 2011


MPI_Comm_disconnect is a collective function, so it must be called by all processes in the communicator passed to it, on both the master side and the worker side.

Rajeev
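Darius's explanation below and the point above fit together: two processes stay "connected" while any chain of live communicators links them, MPI_Finalize may wait on every process it is connected to, and MPI_Comm_disconnect only releases a communicator once all of its members have called it. The following C++ sketch is a toy model of those rules, not MPI code. The class ConnectivityModel and its methods are hypothetical names, processes are plain integers, and the model follows the standard's definition rather than MPICH's narrower behavior of waiting only if the two processes have ever communicated.

```cpp
#include <cassert>
#include <numeric>
#include <set>
#include <vector>

// Toy model (hypothetical, not MPI code) of when MPI_Finalize may have to wait.
// Processes are numbered 0..n-1; a "communicator" is just a set of members.
class ConnectivityModel {
public:
    explicit ConnectivityModel(int nprocs)
        : nprocs_(nprocs), finalized_(nprocs, false) {}

    // Record a communicator spanning `members`, e.g. the intercommunicator
    // that MPI_Comm_spawn creates between master and worker. Returns its id.
    int add_comm(const std::vector<int>& members) {
        comms_.push_back(Comm{std::set<int>(members.begin(), members.end()), {}, true});
        return static_cast<int>(comms_.size()) - 1;
    }

    // Process `proc` calls disconnect on communicator `comm`. Modelled as a
    // collective: the communicator is released only once every member has
    // called it. Returns true if the collective has completed.
    bool disconnect(int comm, int proc) {
        Comm& c = comms_.at(comm);
        assert(c.members.count(proc) == 1 && "only members may disconnect");
        c.called.insert(proc);
        if (c.called == c.members) c.live = false;
        return !c.live;
    }

    // Two processes are connected if a chain of live communicators links
    // them (transitive closure, computed with a small union-find).
    bool connected(int a, int b) const {
        std::vector<int> root(nprocs_);
        std::iota(root.begin(), root.end(), 0);
        for (const Comm& c : comms_) {
            if (!c.live || c.members.empty()) continue;
            int first = *c.members.begin();
            for (int m : c.members) {
                int ra = find(root, m);
                int rb = find(root, first);
                root[ra] = rb;  // merge m's set into first's set
            }
        }
        return find(root, a) == find(root, b);
    }

    void finalize(int proc) { finalized_.at(proc) = true; }

    // MPI_Finalize on `proc` may wait while any process connected to it
    // has not finalized yet.
    bool finalize_may_block(int proc) const {
        for (int q = 0; q < nprocs_; ++q)
            if (q != proc && !finalized_[q] && connected(proc, q)) return true;
        return false;
    }

private:
    struct Comm {
        std::set<int> members;  // processes in the communicator
        std::set<int> called;   // members that have called disconnect so far
        bool live;              // false once the disconnect has completed
    };

    static int find(std::vector<int>& root, int x) {
        while (root[x] != x) {
            root[x] = root[root[x]];  // path halving
            x = root[x];
        }
        return x;
    }

    int nprocs_;
    std::vector<bool> finalized_;
    std::vector<Comm> comms_;
};
```

In the program quoted below, the two disconnect calls in the model correspond to MPI_Comm_disconnect(&intercom) in the master's "stop" branch and MPI_Comm_disconnect(&parent) in the worker after its receive loop, before it reaches MPI_Finalize(). Calling it on the master side only does not meet the collective requirement, which the model reflects by keeping the communicator live until the worker also calls disconnect.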

On Nov 1, 2011, at 12:56 AM, Jonathan Bishop wrote:

> Just a follow up.
> 
> I discovered that you just need to disconnect from the master side
> only, and not on both the worker and master.
> 
> Jon
> 
> On Fri, Oct 28, 2011 at 8:53 PM, Jonathan Bishop <jbishop.rwc at gmail.com> wrote:
>> Thanks!
>> 
>> On Fri, Oct 28, 2011 at 11:17 AM, Darius Buntinas <buntinas at mcs.anl.gov>
>> wrote:
>>> 
>>> Hi Jon,
>>> 
>>> I had to pull out my copy of the standard for this one :-).  The standard
>>> says (Section 10.5.4 on pg 330) that MPI_Finalize is collective over
>>> "connected" processes.  By the definition of "connected" (in the same
>>> section), the master and worker processes in your application are still
>>> connected, so the worker process might wait until the master process calls
>>> MPI_Finalize (in the case of MPICH, it will wait if the two processes have
>>> ever communicated).  You can use MPI_Comm_disconnect to disconnect the
>>> master and worker before the worker calls MPI_Finalize.
>>> 
>>> I added an MPI_Comm_disconnect on line 37 (in the master's "stop" branch) and
>>> line 80 (in the worker, after its receive loop) of your program, and it looks
>>> like it works as you intend.
>>> 
>>> -d
>>> 
>>> 
>>> On Oct 28, 2011, at 11:18 AM, Jonathan Bishop wrote:
>>> 
>>>> I am using MPI_Comm_spawn to dynamically run workers. However, when the
>>>> workers exit they get hung up on MPI_Finalize. Here is a short program which
>>>> shows the issue...
>>>> 
>>>> It responds to several commands...
>>>> 
>>>> Do
>>>> 
>>>> start
>>>> stop
>>>> 
>>>> and then check how many processes are running - it should be 1, not 2.
>>>> 
>>>> I am using MPICH2 1.4.1-p1.
>>>> 
>>>> Thanks,
>>>> 
>>>> Jon
>>>> 
>>>> #include <sys/types.h>
>>>> #include <unistd.h>
>>>> #include <iostream>
>>>> #include <string>
>>>> #include "mpi.h"
>>>>
>>>> using namespace std;
>>>>
>>>> int main(int argc, char **argv)
>>>> {
>>>>   MPI_Init(&argc, &argv);
>>>>   MPI_Comm parent;
>>>>   MPI_Comm_get_parent(&parent);
>>>>
>>>>   // Master
>>>>   if (parent == MPI_COMM_NULL) {
>>>>     cout << getpid() << endl;
>>>>     MPI_Comm intercom = MPI_COMM_NULL;
>>>>     while (1) {
>>>>       cout << "Enter: ";
>>>>       string s;
>>>>       cin >> s;
>>>>       if (s == "start") {
>>>>         if (intercom != MPI_COMM_NULL) {
>>>>           cout << "already started" << endl;
>>>>           continue;
>>>>         }
>>>>         MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercom, MPI_ERRCODES_IGNORE);
>>>>         continue;
>>>>       }
>>>>       if (s == "stop") {
>>>>         if (intercom == MPI_COMM_NULL) {
>>>>           cout << "worker not running" << endl;
>>>>           continue;
>>>>         }
>>>>         MPI_Send(const_cast<char*>(s.c_str()), s.size(), MPI_CHAR, 0, 0, intercom);
>>>>         intercom = MPI_COMM_NULL;
>>>>         // MPI_Finalize();  // This will allow the workers to die, but then I cannot restart them.
>>>>         continue;
>>>>       }
>>>>       if (s == "exit") {
>>>>         if (intercom != MPI_COMM_NULL) {
>>>>           cout << "need to stop before exit" << endl;
>>>>           continue;
>>>>         }
>>>>         break;
>>>>       }
>>>>       if (intercom == MPI_COMM_NULL) {
>>>>         cout << "need to start" << endl;
>>>>         continue;
>>>>       }
>>>>       MPI_Send(const_cast<char*>(s.c_str()), s.size(), MPI_CHAR, 0, 0, intercom);
>>>>       char buf[1000];
>>>>       MPI_Status status;
>>>>       MPI_Recv(buf, sizeof(buf) - 1, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG, intercom, &status);  // leave room for '\0'
>>>>       int count;
>>>>       MPI_Get_count(&status, MPI_CHAR, &count);
>>>>       buf[count] = 0;
>>>>       string t = buf;
>>>>       cout << "worker returned " << t << endl;
>>>>     }
>>>>   }
>>>>
>>>>   // Worker
>>>>   if (parent != MPI_COMM_NULL) {
>>>>     while (1) {
>>>>       char buf[1000];
>>>>       MPI_Status status;
>>>>       MPI_Recv(buf, sizeof(buf) - 1, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG, parent, &status);  // leave room for '\0'
>>>>       int count;
>>>>       MPI_Get_count(&status, MPI_CHAR, &count);
>>>>       buf[count] = 0;
>>>>       string s = buf;
>>>>       if (s == "stop") {
>>>>         cout << "worker stopping" << endl;
>>>>         break;
>>>>       }
>>>>       MPI_Send(const_cast<char*>(s.c_str()), s.size(), MPI_CHAR, 0, 0, parent);
>>>>     }
>>>>   }
>>>>
>>>>   MPI_Finalize();
>>>> }
>>>> 
>>>> _______________________________________________
>>>> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
>>>> To manage subscription options or unsubscribe:
>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>> 
>> 
>> 


