[mpich-discuss] failure on respawn
Jonathan Bishop
jbishop.rwc at gmail.com
Fri Nov 4 10:55:57 CDT 2011
Any takers for this one?
On Wed, Nov 2, 2011 at 9:50 AM, Jonathan Bishop <jbishop.rwc at gmail.com> wrote:
> Hi,
>
> Here is a short program that triggers an MPI crash when MPI_Comm_spawn
> is called multiple times. It was found previously that both the worker
> and master processes must call MPI_Comm_disconnect to make sure that the
> spawned processes actually die. Unfortunately, this second issue may be
> related to that fix: if I remove the disconnects, the crash disappears.
>
> Here is the crash message...
>
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(392).................:
> MPID_Init(139)........................: channel initialization failed
> MPIDI_CH3_Init(38)....................:
> MPID_nem_init(196)....................:
> MPIDI_CH3I_Seg_commit(366)............:
> MPIU_SHMW_Hnd_deserialize(324)........:
> MPIU_SHMW_Seg_open(863)...............:
> MPIU_SHMW_Seg_create_attach_templ(637): open failed - No such file or directory
>
> <repeated a number of times>
>
> Thanks,
>
> Jon
>
>
> #include <iostream>
> #include "mpi.h"
>
> using namespace std;
>
> const int BUFSIZE = 1000;
> const int NWORKER = 10;
> const int NPASS = 10;
>
> int main(int argc, char **argv)
> {
>   MPI_Init(&argc, &argv);
>   MPI_Comm parent;
>   MPI_Comm_get_parent(&parent);
>
>   // Master
>   if (parent == MPI_COMM_NULL) {
>     for (int i = 0; i < NPASS; i++) {
>       cout << "pass " << i << " =============" << endl;
>       MPI_Comm intercom = MPI_COMM_NULL;
>       cout << "spawn " << NWORKER << endl;
>       MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, NWORKER, MPI_INFO_NULL, 0,
>                      MPI_COMM_SELF, &intercom, MPI_ERRCODES_IGNORE);
>       // Send each worker a zero-length message telling it to stop.
>       for (int worker = 0; worker < NWORKER; worker++) {
>         cout << "stop " << worker << endl;
>         char buf[BUFSIZE];
>         MPI_Send(buf, 0, MPI_CHAR, worker, 0, intercom);
>       }
>       cout << "disconnect" << endl;
>       MPI_Comm_disconnect(&intercom);
>       intercom = MPI_COMM_NULL;
>     }
>   }
>
>   // Worker
>   if (parent != MPI_COMM_NULL) {
>     char buf[BUFSIZE];
>     MPI_Status status;
>     // Wait for the stop message from the master (rank 0 of the parent).
>     MPI_Recv(buf, BUFSIZE, MPI_CHAR, 0, MPI_ANY_TAG, parent, &status);
>     MPI_Comm_disconnect(&parent);
>   }
>
>   MPI_Finalize();
>
>   return 0;
> }
>
>