[mpich-discuss] spawned processes do not shut down
Darius Buntinas
buntinas at mcs.anl.gov
Tue Nov 1 11:11:49 CDT 2011
Disconnecting on only the master side may work with MPICH2 now, but you can't count on it working with other MPI implementations or even a future MPICH2 release.
-d
On Nov 1, 2011, at 8:25 AM, Rajeev Thakur wrote:
> MPI_Comm_disconnect is a collective function, so it needs to be called by all processes in the communicator you pass to the function on both the master and worker side.
>
> Rajeev
>
> On Nov 1, 2011, at 12:56 AM, Jonathan Bishop wrote:
>
>> Just a follow up.
>>
>> I discovered that you only need to disconnect on the master side,
>> not on both the worker and the master.
>>
>> Jon
>>
>> On Fri, Oct 28, 2011 at 8:53 PM, Jonathan Bishop <jbishop.rwc at gmail.com> wrote:
>>> Thanks!
>>>
>>> On Fri, Oct 28, 2011 at 11:17 AM, Darius Buntinas <buntinas at mcs.anl.gov>
>>> wrote:
>>>>
>>>> Hi Jon,
>>>>
>>>> I had to pull out my copy of the standard for this one :-). The standard
>>>> says (Section 10.5.4 on pg 330) that MPI_Finalize is collective over
>>>> "connected" processes. By the definition of "connected" (in the same
>>>> section), the master and worker processes in your application are still
>>>> connected, so the worker process might wait until the master process calls
>>>> MPI_Finalize (in the case of MPICH, it will wait if the two processes have
>>>> ever communicated). You can use MPI_Comm_disconnect to disconnect the
>>>> master and worker before the worker calls MPI_Finalize.
>>>>
>>>> I added an MPI_Comm_disconnect on line 37 and line 80 of your program and
>>>> it looks like it works as you intend.
>>>>
>>>> -d
>>>>
>>>>
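The fix described above can be sketched roughly as follows. This is a minimal, stripped-down illustration and not the modified program from the thread: it spawns one worker, sends it a single "stop" message, and then calls MPI_Comm_disconnect on both sides before MPI_Finalize. The message text, the buffer size, and the output strings are assumptions made for the example.

```cpp
// Minimal sketch: spawn one worker, tell it to stop, and disconnect on BOTH
// sides so that the worker's MPI_Finalize does not have to wait for the master.
#include <mpi.h>
#include <cstring>
#include <iostream>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Comm parent;
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        // Master: spawn a single copy of this executable as the worker.
        MPI_Comm intercomm = MPI_COMM_NULL;
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

        // Non-const buffer so this also compiles against MPI-2 headers,
        // where MPI_Send takes a plain void*.
        char msg[] = "stop";
        MPI_Send(msg, static_cast<int>(std::strlen(msg)), MPI_CHAR, 0, 0,
                 intercomm);

        // Collective over the intercommunicator: the worker makes the
        // matching call below. On return, intercomm is MPI_COMM_NULL.
        MPI_Comm_disconnect(&intercomm);
        std::cout << "master: disconnected from worker" << std::endl;
    } else {
        // Worker: wait for the single message from the master (rank 0 of
        // the remote group), leaving one byte for the terminating NUL.
        char buf[16] = {0};
        MPI_Status status;
        MPI_Recv(buf, static_cast<int>(sizeof(buf)) - 1, MPI_CHAR, 0, 0,
                 parent, &status);
        std::cout << "worker: received " << buf << std::endl;

        // Matching half of the collective disconnect. After this the worker
        // is no longer connected to the master.
        MPI_Comm_disconnect(&parent);
    }

    MPI_Finalize();
    return 0;
}
```

With MPICH this would typically be built with mpicxx and started with mpiexec -n 1. In the quoted program, the equivalent change is presumably an MPI_Comm_disconnect(&intercom) in the master's "stop" branch, in place of assigning MPI_COMM_NULL by hand, and an MPI_Comm_disconnect(&parent) in the worker before it reaches MPI_Finalize. Because the call resets the handle to MPI_COMM_NULL, the master's existing "start" check keeps working and a new worker can be spawned afterwards.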
>>>> On Oct 28, 2011, at 11:18 AM, Jonathan Bishop wrote:
>>>>
>>>>> I am using MPI_Comm_spawn to dynamically run workers. However, when the
>>>>> workers exit they hang in MPI_Finalize. Here is a short program that
>>>>> shows the issue...
>>>>>
>>>>> It responds to several commands...
>>>>>
>>>>> Do
>>>>>
>>>>> start
>>>>> stop
>>>>>
>>>>> and then check how many processes are running - it should be 1, not 2.
>>>>>
>>>>> I am using MPICH2 1.4.1-p1.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Jon
>>>>>
>>>>> #include <sys/types.h>
>>>>> #include <unistd.h>
>>>>> #include <iostream>
>>>>> #include <string>
>>>>> #include "mpi.h"
>>>>>
>>>>> using namespace std;
>>>>>
>>>>> int main(int argc, char **argv)
>>>>> {
>>>>>   MPI_Init(&argc, &argv);
>>>>>   MPI_Comm parent;
>>>>>   MPI_Comm_get_parent(&parent);
>>>>>
>>>>>   // Master
>>>>>   if (parent == MPI_COMM_NULL) {
>>>>>     cout << getpid() << endl;
>>>>>     MPI_Comm intercom = MPI_COMM_NULL;
>>>>>     while (1) {
>>>>>       cout << "Enter: ";
>>>>>       string s;
>>>>>       cin >> s;
>>>>>       if (s == "start") {
>>>>>         if (intercom != MPI_COMM_NULL) {
>>>>>           cout << "already started" << endl;
>>>>>           continue;
>>>>>         }
>>>>>         MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
>>>>>                        MPI_COMM_SELF, &intercom, MPI_ERRCODES_IGNORE);
>>>>>         continue;
>>>>>       }
>>>>>       if (s == "stop") {
>>>>>         if (intercom == MPI_COMM_NULL) {
>>>>>           cout << "worker not running" << endl;
>>>>>           continue;
>>>>>         }
>>>>>         MPI_Send(const_cast<char*>(s.c_str()), s.size(), MPI_CHAR, 0, 0,
>>>>>                  intercom);
>>>>>         intercom = MPI_COMM_NULL;
>>>>>         // MPI_Finalize(); // This will allow the workers to die, but then I
>>>>>         // cannot restart them.
>>>>>         continue;
>>>>>       }
>>>>>       if (s == "exit") {
>>>>>         if (intercom != MPI_COMM_NULL) {
>>>>>           cout << "need to stop before exit" << endl;
>>>>>           continue;
>>>>>         }
>>>>>         break;
>>>>>       }
>>>>>       if (intercom == MPI_COMM_NULL) {
>>>>>         cout << "need to start" << endl;
>>>>>         continue;
>>>>>       }
>>>>>       MPI_Send(const_cast<char*>(s.c_str()), s.size(), MPI_CHAR, 0, 0,
>>>>>                intercom);
>>>>>       char buf[1000];  // last byte is reserved for the terminating NUL
>>>>>       MPI_Status status;
>>>>>       MPI_Recv(buf, sizeof(buf) - 1, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG,
>>>>>                intercom, &status);
>>>>>       int count;
>>>>>       MPI_Get_count(&status, MPI_CHAR, &count);
>>>>>       buf[count] = 0;
>>>>>       string t = buf;
>>>>>       cout << "worker returned " << t << endl;
>>>>>     }
>>>>>   }
>>>>>
>>>>>   // Worker
>>>>>   if (parent != MPI_COMM_NULL) {
>>>>>     while (1) {
>>>>>       char buf[1000];  // last byte is reserved for the terminating NUL
>>>>>       MPI_Status status;
>>>>>       MPI_Recv(buf, sizeof(buf) - 1, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG, parent,
>>>>>                &status);
>>>>>       int count;
>>>>>       MPI_Get_count(&status, MPI_CHAR, &count);
>>>>>       buf[count] = 0;
>>>>>       string s = buf;
>>>>>       if (s == "stop") {
>>>>>         cout << "worker stopping" << endl;
>>>>>         break;
>>>>>       }
>>>>>       MPI_Send(const_cast<char*>(s.c_str()), s.size(), MPI_CHAR, 0, 0,
>>>>>                parent);
>>>>>     }
>>>>>   }
>>>>>
>>>>>   MPI_Finalize();
>>>>> }
>>>>>
>>>>> _______________________________________________
>>>>> mpich-discuss mailing list mpich-discuss at mcs.anl.gov
>>>>> To manage subscription options or unsubscribe:
>>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss