[mpich-discuss] failure on respawn

Jonathan Bishop jbishop.rwc at gmail.com
Fri Nov 4 10:55:57 CDT 2011


Any takers for this one?

On Wed, Nov 2, 2011 at 9:50 AM, Jonathan Bishop <jbishop.rwc at gmail.com> wrote:

> Hi,
>
> Here is a short program that crashes in MPI when MPI_Comm_spawn is called
> multiple times. It was found previously that both the worker and the master
> processes must call MPI_Comm_disconnect to make sure that the spawned
> processes actually die. Unfortunately, this second issue may be related to
> that fix: if I remove the disconnects, the crash disappears.
>
> Here is the crash message:
>
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(392).................:
> MPID_Init(139)........................: channel initialization failed
> MPIDI_CH3_Init(38)....................:
> MPID_nem_init(196)....................:
> MPIDI_CH3I_Seg_commit(366)............:
> MPIU_SHMW_Hnd_deserialize(324)........:
> MPIU_SHMW_Seg_open(863)...............:
> MPIU_SHMW_Seg_create_attach_templ(637): open failed - No such file or directory
>
> <repeated a number of times>
>
> Thanks,
>
> Jon
>
>
> #include <iostream>
> #include "mpi.h"
>
> using namespace std;
>
> const int BUFSIZE = 1000;
> const int NWORKER = 10;
> const int NPASS = 10;
>
> int main(int argc, char **argv)
> {
>   MPI_Init(&argc, &argv);
>   MPI_Comm parent;
>   MPI_Comm_get_parent(&parent);
>
>   // Master
>   if (parent == MPI_COMM_NULL) {
>     for (int i = 0; i < NPASS; i++) {
>       cout << "pass " << i << " =============" << endl;
>       MPI_Comm intercom = MPI_COMM_NULL;
>       cout << "spawn " << NWORKER << endl;
>       MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, NWORKER, MPI_INFO_NULL, 0,
>                      MPI_COMM_SELF, &intercom, MPI_ERRCODES_IGNORE);
>       for (int worker = 0; worker < NWORKER; worker++) {
>         cout << "stop " << worker << endl;
>         char buf[BUFSIZE];
>         MPI_Send(buf, 0, MPI_CHAR, worker, 0, intercom);
>       }
>       cout << "disconnect" << endl;
>       MPI_Comm_disconnect(&intercom);
>       intercom = MPI_COMM_NULL;
>     }
>   }
>
>   // Worker
>   if (parent != MPI_COMM_NULL) {
>     char buf[BUFSIZE];
>     MPI_Status status;
>     MPI_Recv(buf, BUFSIZE, MPI_CHAR, 0, MPI_ANY_TAG, parent, &status);
>     MPI_Comm_disconnect(&parent);
>   }
>
>   MPI_Finalize();
>
>   return 0;
> }
>
>