[mpich-discuss] failure on respawn

Jonathan Bishop jbishop.rwc at gmail.com
Wed Nov 2 11:50:30 CDT 2011


Hi,

Here is a short program that shows an MPI crash when multiple
MPI_Comm_spawn calls are made. Previously, it was found that
MPI_Comm_disconnect must be called from both the worker and master
processes to make sure that the spawned processes actually die.
Unfortunately, this second issue may be related to that fix: if I remove
the disconnects, the crash disappears.

Here is the crash message...

Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(392).................:
MPID_Init(139)........................: channel initialization failed
MPIDI_CH3_Init(38)....................:
MPID_nem_init(196)....................:
MPIDI_CH3I_Seg_commit(366)............:
MPIU_SHMW_Hnd_deserialize(324)........:
MPIU_SHMW_Seg_open(863)...............:
MPIU_SHMW_Seg_create_attach_templ(637): open failed - No such file or directory

<repeated a number of times>

Thanks,

Jon


#include <iostream>
#include "mpi.h"

using namespace std;

const int BUFSIZE = 1000;
const int NWORKER = 10;
const int NPASS = 10;

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  MPI_Comm parent;
  MPI_Comm_get_parent(&parent);

  // Master
  if (parent == MPI_COMM_NULL) {
    for (int i = 0; i < NPASS; i++) {
      cout << "pass " << i << " =============" << endl;
      MPI_Comm intercom = MPI_COMM_NULL;
      cout << "spawn " << NWORKER << endl;
      MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, NWORKER, MPI_INFO_NULL, 0,
                     MPI_COMM_SELF, &intercom, MPI_ERRCODES_IGNORE);
      for (int worker = 0; worker < NWORKER; worker++) {
        cout << "stop " << worker << endl;
        char buf[BUFSIZE];
        MPI_Send(buf, 0, MPI_CHAR, worker, 0, intercom);
      }
      cout << "disconnect" << endl;
      MPI_Comm_disconnect(&intercom);
      intercom = MPI_COMM_NULL;
    }
  }

  // Worker
  if (parent != MPI_COMM_NULL) {
    char buf[BUFSIZE];
    MPI_Status status;
    MPI_Recv(buf, BUFSIZE, MPI_CHAR, 0, MPI_ANY_TAG, parent, &status);
    MPI_Comm_disconnect(&parent);
  }

  MPI_Finalize();

  return 0;
}