[mpich-discuss] intercomm merge then intercomm create bug/misunderstanding?

Brad Penoff penoff at cs.ubc.ca
Fri Jun 27 03:19:43 CDT 2008


Greetings,

I have been using MPICH2 1.0.7 in a large application that makes use
of intercommunicators.  I have had no problems when using
communicators all derived from the same MPI_COMM_WORLD, but we are
trying to extend our application to make use of dynamic processes, and
issues are arising when we use the communicators that result
from a spawn.  I was hoping I could get some help from this list
because I can't tell if it's a bug in the middleware or user error on
my part.

I've reduced the problematic part of our application to a simpler
example.  The source code is at the following link, and I describe
what it does in the paragraph after the link:
http://cs.ubc.ca/~penoff/spawn_merge_create_simple.c

In this simple example, the application is launched by mpirun with
any number of processes.  Each process calls MPI_Comm_spawn unless it
is itself the result of a spawn.  The original processes get an
intercommunicator back from the spawn call, and the spawned processes
obtain the same intercommunicator from MPI_Comm_get_parent.  The
application then merges the two sides of that intercommunicator into
one intracommunicator that includes both the original and the spawned
processes.  From this new global intracommunicator, I'm trying to
create a new intercommunicator between two subsets of processes: rank 0
of the global intracommunicator on one side and the rest of the
processes on the other.  However, when I do this, I keep getting an
error that I'm having difficulty understanding:

mpirun -np 3 ./spawn_merge_create_simple 2      # the argument 2 is how many procs to spawn
...
Fatal error in MPI_Intercomm_create: Invalid buffer pointer, error stack:
MPI_Intercomm_create(580): MPI_Intercomm_create(comm=0x84000003,
local_leader=0, comm=0x84000002, remote_leader=0, tag=66,
newintercomm=0x80cf858) failed
(unknown)(): Invalid buffer pointer[cli_1]: aborting job:
Fatal error in MPI_Intercomm_create: Invalid buffer pointer, error stack:
MPI_Intercomm_create(580): MPI_Intercomm_create(comm=0x84000003,
local_leader=0, comm=0x84000002, remote_leader=0, tag=66,
newintercomm=0x80cf858) failed
(unknown)(): Invalid buffer pointer
rank 1 in job 71  pugwash2_33211   caused collective abort of all ranks
  exit status of rank 1: killed by signal 9
No matching pg foung for id = 2054728494
Fatal error in MPI_Intercomm_create: Internal MPI error!, error stack:
MPI_Intercomm_create(580).: MPI_Intercomm_create(comm=0x84000003,
local_leader=0, comm=0x84000002, remote_leader=1, tag=66,
newintercomm=0x80cf858) failed
MPID_GPID_ToLpidArray(373): Internal MPI error: Unknown gpid
(-1209130304)-1209130304[cli_0]: aborting job:
Fatal error in MPI_Intercomm_create: Internal MPI error!, error stack:
MPI_Intercomm_create(580).: MPI_Intercomm_create(comm=0x84000003,
local_leader=0, comm=0x84000002, remote_leader=1, tag=66,
newintercomm=0x80cf858) failed
MPID_GPID_ToLpidArray(373): Internal MPI error: Unknown gpid
(-1209130304)-1209130304


Do you think this is a middleware bug or user error?  If it's the
latter, any hints about what I'm misunderstanding would be appreciated.

Thanks!
brad



