[mpich-discuss] spawned processes do not shut down

Jonathan Bishop jbishop.rwc at gmail.com
Tue Nov 1 11:24:56 CDT 2011


Thanks. It won't hurt to put it on both sides.

On Tue, Nov 1, 2011 at 9:11 AM, Darius Buntinas <buntinas at mcs.anl.gov> wrote:
> It may work with MPICH2 now, but you can't count on this working for other MPIs or even a future MPICH2.
>
> -d
>
>
> On Nov 1, 2011, at 8:25 AM, Rajeev Thakur wrote:
>
>> MPI_Comm_disconnect is a collective function, so it must be called by all processes in the communicator you pass to it, on both the master and the worker sides.
>>
>> Rajeev
>>
>> On Nov 1, 2011, at 12:56 AM, Jonathan Bishop wrote:
>>
>>> Just a follow-up.
>>>
>>> I discovered that you only need to disconnect on the master side,
>>> not on both the worker and the master.
>>>
>>> Jon
>>>
>>> On Fri, Oct 28, 2011 at 8:53 PM, Jonathan Bishop <jbishop.rwc at gmail.com> wrote:
>>>> Thanks!
>>>>
>>>> On Fri, Oct 28, 2011 at 11:17 AM, Darius Buntinas <buntinas at mcs.anl.gov> wrote:
>>>>>
>>>>> Hi Jon,
>>>>>
>>>>> I had to pull out my copy of the standard for this one :-).  The standard
>>>>> says (Section 10.5.4, p. 330) that MPI_Finalize is collective over
>>>>> "connected" processes.  By the definition of "connected" (in the same
>>>>> section), the master and worker processes in your application are still
>>>>> connected, so the worker process might wait until the master process calls
>>>>> MPI_Finalize (in the case of MPICH, it will wait if the two processes have
>>>>> ever communicated).  You can use MPI_Comm_disconnect to disconnect the
>>>>> master and worker before the worker calls MPI_Finalize.
>>>>>
>>>>> I added an MPI_Comm_disconnect on line 37 and line 80 of your program and
>>>>> it looks like it works as you intend.
>>>>>
>>>>> -d
>>>>>
>>>>>
>>>>> On Oct 28, 2011, at 11:18 AM, Jonathan Bishop wrote:
>>>>>
>>>>>> I am using MPI_Comm_spawn to start workers dynamically. However, when the
>>>>>> workers exit, they hang in MPI_Finalize. Here is a short program that
>>>>>> shows the issue.
>>>>>>
>>>>>> It responds to several commands. Enter
>>>>>>
>>>>>> start
>>>>>> stop
>>>>>>
>>>>>> and then check how many processes are running: it should be 1, not 2.
>>>>>>
>>>>>> I am using MPICH2 1.4.1-p1.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Jon
>>>>>>
>>>>>> #include <sys/types.h>
>>>>>> #include <unistd.h>
>>>>>> #include <iostream>
>>>>>> #include <string>
>>>>>> #include "mpi.h"
>>>>>>
>>>>>> using namespace std;
>>>>>>
>>>>>> int main(int argc, char **argv)
>>>>>> {
>>>>>>  MPI_Init(&argc, &argv);
>>>>>>  MPI_Comm parent;
>>>>>>  MPI_Comm_get_parent(&parent);
>>>>>>
>>>>>>  // Master
>>>>>>  if (parent == MPI_COMM_NULL) {
>>>>>>    cout << getpid() << endl;
>>>>>>    MPI_Comm intercom = MPI_COMM_NULL;
>>>>>>    while (1) {
>>>>>>      cout << "Enter: ";
>>>>>>      string s;
>>>>>>      cin >> s;
>>>>>>      if (s == "start") {
>>>>>>        if (intercom != MPI_COMM_NULL) {
>>>>>>          cout << "already started" << endl;
>>>>>>          continue;
>>>>>>        }
>>>>>>        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercom, MPI_ERRCODES_IGNORE);
>>>>>>        continue;
>>>>>>      }
>>>>>>      if (s == "stop") {
>>>>>>        if (intercom == MPI_COMM_NULL) {
>>>>>>          cout << "worker not running" << endl;
>>>>>>          continue;
>>>>>>        }
>>>>>>        MPI_Send(const_cast<char*>(s.c_str()), s.size(), MPI_CHAR, 0, 0, intercom);
>>>>>>        intercom = MPI_COMM_NULL;
>>>>>> //      MPI_Finalize();  // This will allow the workers to die, but then I cannot restart them.
>>>>>>        continue;
>>>>>>      }
>>>>>>      if (s == "exit") {
>>>>>>        if (intercom != MPI_COMM_NULL) {
>>>>>>          cout << "need to stop before exit" << endl;
>>>>>>          continue;
>>>>>>        }
>>>>>>        break;
>>>>>>      }
>>>>>>      if (intercom == MPI_COMM_NULL) {
>>>>>>        cout << "need to start" << endl;
>>>>>>        continue;
>>>>>>      }
>>>>>>      MPI_Send(const_cast<char*>(s.c_str()), s.size(), MPI_CHAR, 0, 0, intercom);
>>>>>>      char buf[1000];
>>>>>>      MPI_Status status;
>>>>>>      MPI_Recv(buf, sizeof(buf) - 1, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG, intercom, &status);  // leave room for the terminator
>>>>>>      int count;
>>>>>>      MPI_Get_count(&status, MPI_CHAR, &count);
>>>>>>      buf[count] = 0;
>>>>>>      string t = buf;
>>>>>>      cout << "worker returned " << t << endl;
>>>>>>    }
>>>>>>  }
>>>>>>
>>>>>>  // Worker
>>>>>>  if (parent != MPI_COMM_NULL) {
>>>>>>    while (1) {
>>>>>>      char buf[1000];
>>>>>>      MPI_Status status;
>>>>>>      MPI_Recv(buf, sizeof(buf) - 1, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG, parent, &status);  // leave room for the terminator
>>>>>>      int count;
>>>>>>      MPI_Get_count(&status, MPI_CHAR, &count);
>>>>>>      buf[count] = 0;
>>>>>>      string s = buf;
>>>>>>      if (s == "stop") {
>>>>>>        cout << "worker stopping" << endl;
>>>>>>        break;
>>>>>>      }
>>>>>>      MPI_Send(const_cast<char*>(s.c_str()), s.size(), MPI_CHAR, 0, 0, parent);
>>>>>>    }
>>>>>>  }
>>>>>>
>>>>>>  MPI_Finalize();
>>>>>> }
>>>>>>
>>>>>> _______________________________________________
>>>>>> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
>>>>>> To manage subscription options or unsubscribe:
>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>>>>