[mpich-discuss] Howto use MPI_Comm_merge together with MPI_Comm_spawn
Nick Radcliffe
nradclif at cray.com
Thu Feb 16 09:38:19 CST 2012
The parent's and the child's calls to MPI_Intercomm_merge have to be different. The child should use 'parentcomm' as its intercommunicator argument, and the parent should use 'intercomm' as its intercommunicator argument. Also, the flag used should be different, e.g., the parent could use 0 and the child 1, or vice versa. This flag determines how the ranks of each half of the intercommunicator map into ranks for the intracommunicator.
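To make the role of the flag concrete, here is a small model of the rank mapping in plain C, with no MPI calls. It is only a sketch: merged_rank is a hypothetical helper, not part of MPI, and it assumes that the two groups pass different flag values and that each group keeps its local rank order inside the merged communicator. The MPI standard places the group that passed high = 0 before the group that passed high = 1; the ordering within each group is the assumption here.

```c
#include <assert.h>

/*
 * Illustrative model only (hypothetical helper, not an MPI function).
 * Returns the rank a process would get in the merged intracommunicator,
 * assuming the two groups passed different 'high' values and each group
 * keeps its local rank order.
 *
 *   high        - the flag this process's group passed to the merge
 *   local_rank  - rank of the process within its own group
 *   local_size  - number of processes in its own group
 *   remote_size - number of processes in the other group
 */
int merged_rank(int high, int local_rank, int local_size, int remote_size)
{
    assert(local_rank >= 0 && local_rank < local_size);
    assert(remote_size > 0);

    /* The low group comes first; the high group is offset by the size
     * of the low group, which is the remote group from its point of view. */
    return high ? remote_size + local_rank : local_rank;
}
```

With one parent passing 0 and four children passing 1, this model puts the parent at rank 0 and the children at ranks 1 to 4. If the flags are swapped, the children come first and the parent ends up at rank 4.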
________________________________
From: mpich-discuss-bounces at mcs.anl.gov [mpich-discuss-bounces at mcs.anl.gov] on behalf of Umit [umitcanyilmaz at gmail.com]
Sent: Thursday, February 16, 2012 9:19 AM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] Howto use MPI_Comm_merge together with MPI_Comm_spawn
Hello Nick,
Thank you for your e-mail.
If I call MPI_Intercomm_merge from all processes like this:
#define NUM_SPAWNS 4
double timer;
int i;
char str[100];
int main( int argc, char *argv[] )
{
    MPI_Comm parentcomm, intercomm;
    MPI_Comm comm;
    MPI_Init( &argc, &argv );
    MPI_Comm_get_parent( &parentcomm );
    int np = NUM_SPAWNS;
    if (parentcomm == MPI_COMM_NULL)
    {
        int errcodes[np];
        MPI_Comm_spawn( "/home/umit/Desktop/merge/./a.out", MPI_ARGV_NULL, np, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm, errcodes );
    }
    else
    {
        printf("I'm the spawned.\n");
    }
    MPI_Intercomm_merge( intercomm, 1, &comm );
    MPI_Finalize();
    return 0;
}
This time I am getting the following error:
umit at ubuntu:~/Desktop/merge$ mpirun -np 1 ./a.out
I'm the spawned.
Fatal error in MPI_Intercomm_merge: Invalid communicator, error stack:
MPI_Intercomm_merge(288): MPI_Intercomm_merge(comm=0x331ff4, high=1, newintracomm=0xbf9d0720) failed
MPI_Intercomm_merge(93).: Invalid communicator
I'm the spawned.
Fatal error in MPI_Intercomm_merge: Invalid communicator, error stack:
MPI_Intercomm_merge(288): MPI_Intercomm_merge(comm=0xc71ff4, high=1, newintracomm=0xbffec690) failed
MPI_Intercomm_merge(93).: Invalid communicator
rank 3 in job 56 ubuntu_38267 caused collective abort of all ranks
exit status of rank 3: killed by signal 9
rank 0 in job 56 ubuntu_38267 caused collective abort of all ranks
exit status of rank 0: killed by signal 9
Best Regards,
On 16 February 2012 15:56, Nick Radcliffe <nradclif at cray.com> wrote:
One problem is that the spawned child is not calling MPI_INTERCOMM_MERGE. The child needs to call the merge function in the 'else' part of your 'if (parentcomm == MPI_COMM_NULL)'.
-Nick
________________________________
From: mpich-discuss-bounces at mcs.anl.gov [mpich-discuss-bounces at mcs.anl.gov] on behalf of Umit [umitcanyilmaz at gmail.com]
Sent: Thursday, February 16, 2012 7:06 AM
To: mpich-discuss at mcs.anl.gov
Subject: [mpich-discuss] Howto use MPI_Comm_merge together with MPI_Comm_spawn
Hello All,
Can anyone tell me what is wrong with this simple code:
#define NUM_SPAWNS 4
double timer;
int i;
char str[100];
int main( int argc, char *argv[] )
{
    MPI_Comm parentcomm, intercomm;
    MPI_Comm comm, scomm;
    MPI_Init( &argc, &argv );
    MPI_Comm_get_parent( &parentcomm );
    int np = NUM_SPAWNS;
    int size;
    MPI_Comm_size( MPI_COMM_WORLD , &size );
    if (parentcomm == MPI_COMM_NULL)
    {
        scomm = MPI_COMM_WORLD;
        int errcodes[np];
        MPI_Comm_spawn( "/home/test/Desktop/merge/./a.out", MPI_ARGV_NULL, np, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm, errcodes );
        MPI_Intercomm_merge( intercomm, 1, &comm );
    }
    else
    {
        printf("I'm the spawned.\n");
    }
    MPI_Finalize();
    return 0;
}
I also tried calling MPI_Intercomm_merge outside of the if statement, but I got the same error.
The spawn itself is successful; I tested that separately. When I try to merge, I get the following error:
test at ubuntu:~/Desktop/merge$ mpirun -np 1 ./a.out
I'm the spawned.
I'm the spawned.
I'm the spawned.
I'm the spawned.
Fatal error in MPI_Intercomm_merge: Other MPI error, error stack:
MPI_Intercomm_merge(288)..........: MPI_Intercomm_merge(comm=0x84000000, high=1, newintracomm=0xbf97ff40) failed
MPI_Intercomm_merge(263)..........:
MPIR_Get_contextid(639)...........:
MPI_Allreduce(773)................: MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0xbf97fd18, count=64, MPI_INT, MPI_BAND, comm=0x84000002) failed
MPIR_Allreduce(289)...............:
MPIC_Sendrecv(161)................:
MPIC_Wait(513)....................:
MPIDI_CH3I_Progress(150)..........:
MPID_nem_mpich2_blocking_recv(948):
MPID_nem_tcp_connpoll(1720).......:
state_commrdy_handler(1556).......:
MPID_nem_tcp_recv_handler(1446)...: socket closed
rank 0 in job 21 ubuntu_38267 caused collective abort of all ranks
exit status of rank 0: killed by signal 9
Thanks In Advance,
_______________________________________________
mpich-discuss mailing list mpich-discuss at mcs.anl.gov
To manage subscription options or unsubscribe:
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss