[mpich-discuss] spawned processes do not shut down
Jonathan Bishop
jbishop.rwc at gmail.com
Fri Oct 28 22:53:47 CDT 2011
Thanks!
On Fri, Oct 28, 2011 at 11:17 AM, Darius Buntinas <buntinas at mcs.anl.gov> wrote:
> Hi Jon,
>
> I had to pull out my copy of the standard for this one :-). The standard
> says (Section 10.5.4 on pg 330) that MPI_Finalize is collective over
> "connected" processes. By the definition of "connected" (in the same
> section), the master and worker processes in your application are still
> connected, so the worker process might wait until the master process calls
> MPI_Finalize (in the case of MPICH, it will wait if the two processes have
> ever communicated). You can use MPI_Comm_disconnect to disconnect the
> master and worker before the worker calls MPI_Finalize.
>
> I added an MPI_Comm_disconnect on line 37 and line 80 of your program and
> it looks like it works as you intend.
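>
> In case it's useful, here is a rough sketch of that pattern, using the
> variable names from your program (the exact placement will differ in your
> source file):
>
>   // Master, in the "stop" branch, after telling the worker to stop:
>   MPI_Send(const_cast<char*>(s.c_str()), s.size(), MPI_CHAR, 0, 0, intercom);
>   MPI_Comm_disconnect(&intercom);  // collective; sets intercom to MPI_COMM_NULL
>
>   // Worker, after leaving its receive loop:
>   MPI_Comm_disconnect(&parent);    // matches the master's disconnect
>   MPI_Finalize();                  // now returns without waiting on the master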
>
> -d
>
>
> On Oct 28, 2011, at 11:18 AM, Jonathan Bishop wrote:
>
> > I am using MPI_Comm_spawn to dynamically run workers. However, when the
> > workers exit they get hung up on MPI_Finalize. Here is a short program
> > which shows the issue...
> >
> > It responds to several commands. Enter
> >
> >   start
> >   stop
> >
> > and then check how many processes are running - it should be 1, not 2.
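> >
> > For example, I build and run it roughly like this (the binary name is just
> > illustrative) and count the processes from another shell:
> >
> >   mpicxx spawn_demo.cxx -o spawn_demo
> >   mpiexec -n 1 ./spawn_demo
> >   pgrep -c spawn_demo    # expect 1 after "stop", but I still see 2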
> >
> > I am using MPICH2 1.4.1-p1.
> >
> > Thanks,
> >
> > Jon
> >
> > #include <sys/types.h>
> > #include <unistd.h>
> > #include <iostream>
> > #include "mpi.h"
> >
> > using namespace std;
> >
> > int main(int argc, char **argv)
> > {
> >   MPI_Init(&argc, &argv);
> >   MPI_Comm parent;
> >   MPI_Comm_get_parent(&parent);
> >
> >   // Master
> >   if (parent == MPI_COMM_NULL) {
> >     cout << getpid() << endl;
> >     MPI_Comm intercom = MPI_COMM_NULL;
> >     while (1) {
> >       cout << "Enter: ";
> >       string s;
> >       cin >> s;
> >       if (s == "start") {
> >         if (intercom != MPI_COMM_NULL) {
> >           cout << "already started" << endl;
> >           continue;
> >         }
> >         MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
> >                        MPI_COMM_SELF, &intercom, MPI_ERRCODES_IGNORE);
> >         continue;
> >       }
> >       if (s == "stop") {
> >         if (intercom == MPI_COMM_NULL) {
> >           cout << "worker not running" << endl;
> >           continue;
> >         }
> >         MPI_Send(const_cast<char*>(s.c_str()), s.size(), MPI_CHAR, 0, 0,
> >                  intercom);
> >         intercom = MPI_COMM_NULL;
> >         // MPI_Finalize(); // This will allow the workers to die, but then I can not restart them.
> >         continue;
> >       }
> >       if (s == "exit") {
> >         if (intercom != MPI_COMM_NULL) {
> >           cout << "need to stop before exit" << endl;
> >           continue;
> >         }
> >         break;
> >       }
> >       if (intercom == MPI_COMM_NULL) {
> >         cout << "need to start" << endl;
> >         continue;
> >       }
> >       MPI_Send(const_cast<char*>(s.c_str()), s.size(), MPI_CHAR, 0, 0,
> >                intercom);
> >       char buf[1000];
> >       MPI_Status status;
> >       MPI_Recv(buf, 1000, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG,
> >                intercom, &status);
> >       int count;
> >       MPI_Get_count(&status, MPI_CHAR, &count);
> >       buf[count] = 0;
> >       string t = buf;
> >       cout << "worker returned " << t << endl;
> >     }
> >   }
> >
> >   // Worker
> >   if (parent != MPI_COMM_NULL) {
> >     while (1) {
> >       char buf[1000];
> >       MPI_Status status;
> >       MPI_Recv(buf, 1000, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG, parent,
> >                &status);
> >       int count;
> >       MPI_Get_count(&status, MPI_CHAR, &count);
> >       buf[count] = 0;
> >       string s = buf;
> >       if (s == "stop") {
> >         cout << "worker stopping" << endl;
> >         break;
> >       }
> >       MPI_Send(const_cast<char*>(s.c_str()), s.size(), MPI_CHAR, 0, 0,
> >                parent);
> >     }
> >   }
> >
> >   MPI_Finalize();
> >   return 0;
> > }
> >