[mpich-discuss] Howto use MPI_Comm_merge together with MPI_Comm_spawn

Rajeev Thakur thakur at mcs.anl.gov
Thu Feb 16 09:31:24 CST 2012


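On the children, MPI_Intercomm_merge should be called with parentcomm, the intercommunicator returned by MPI_Comm_get_parent. In the children, intercomm is never set, which is why the merge fails there with "Invalid communicator".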

On Feb 16, 2012, at 9:19 AM, Umit wrote:

> Hello Nick, 
> 
> Thank you for your e-mail. 
> 
> If I call MPI_Intercomm_merge from all processes like this:
> 
> #define NUM_SPAWNS 4
> double timer;
> int i; 
> char str[100]; 
> 
> int main( int argc, char *argv[] )
> {
>     MPI_Comm parentcomm, intercomm;
>     MPI_Comm comm;
>     MPI_Init( &argc, &argv );
>     MPI_Comm_get_parent( &parentcomm );
>     int np = NUM_SPAWNS;     
>  
>     if (parentcomm == MPI_COMM_NULL)
>     {
>         int errcodes[np];
>         MPI_Comm_spawn( "/home/umit/Desktop/merge/./a.out", MPI_ARGV_NULL, np, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm, errcodes );
>     }
>     else
>     {
>         printf("I'm the spawned.\n");
>     }
>     MPI_Intercomm_merge( intercomm, 1, &comm );
>     MPI_Finalize();
>     return 0;
> }
> 
> 
> This time I am getting the following error:
> 
> umit@ubuntu:~/Desktop/merge$ mpirun -np 1 ./a.out
> I'm the spawned.
> Fatal error in MPI_Intercomm_merge: Invalid communicator, error stack:
> MPI_Intercomm_merge(288): MPI_Intercomm_merge(comm=0x331ff4, high=1, newintracomm=0xbf9d0720) failed
> MPI_Intercomm_merge(93).: Invalid communicator
> I'm the spawned.
> Fatal error in MPI_Intercomm_merge: Invalid communicator, error stack:
> MPI_Intercomm_merge(288): MPI_Intercomm_merge(comm=0xc71ff4, high=1, newintracomm=0xbffec690) failed
> MPI_Intercomm_merge(93).: Invalid communicator
> rank 3 in job 56  ubuntu_38267   caused collective abort of all ranks
>   exit status of rank 3: killed by signal 9 
> rank 0 in job 56  ubuntu_38267   caused collective abort of all ranks
>   exit status of rank 0: killed by signal 9 
> 
> Best Regards, 
> 
> 
> 
> 
> On 16 February 2012 15:56, Nick Radcliffe <nradclif at cray.com> wrote:
> One problem is that the spawned child is not calling MPI_INTERCOMM_MERGE. The child needs to call the merge function in the 'else' part of your 'if (parentcomm == MPI_COMM_NULL)'.
> 
> -Nick
> From: mpich-discuss-bounces at mcs.anl.gov [mpich-discuss-bounces at mcs.anl.gov] on behalf of Umit [umitcanyilmaz at gmail.com]
> Sent: Thursday, February 16, 2012 7:06 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: [mpich-discuss] Howto use MPI_Comm_merge together with MPI_Comm_spawn
> 
> Hello All, 
> 
> Can anyone tell me what is wrong with this simple code:
> 
> #define NUM_SPAWNS 4
> double timer;
> int i; 
> char str[100]; 
> int main( int argc, char *argv[] )
> {
>     MPI_Comm parentcomm, intercomm;
>     MPI_Comm comm, scomm;
>     MPI_Init( &argc, &argv );
>     MPI_Comm_get_parent( &parentcomm );
>     int np = NUM_SPAWNS;     
>     int size; 
>     MPI_Comm_size( MPI_COMM_WORLD , &size );
>     if (parentcomm == MPI_COMM_NULL)
>     {
>        scomm = MPI_COMM_WORLD; 
>        int errcodes[np];
>        MPI_Comm_spawn( "/home/test/Desktop/merge/./a.out", MPI_ARGV_NULL, np, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm, errcodes );
>        MPI_Intercomm_merge( intercomm, 1, &comm );
>     }
>     else
>     {
>         printf("I'm the spawned.\n");
>     }
>     MPI_Finalize();
>     return 0;
> }
> 
> I also called MPI_Intercomm_merge outside of the if statement, but I got the same error.
> The spawn itself is successful; I have tested that separately. When I try to merge, I get the following error:
> 
> test@ubuntu:~/Desktop/merge$ mpirun -np 1 ./a.out
> I'm the spawned.
> I'm the spawned.
> I'm the spawned.
> I'm the spawned.
> Fatal error in MPI_Intercomm_merge: Other MPI error, error stack:
> MPI_Intercomm_merge(288)..........: MPI_Intercomm_merge(comm=0x84000000, high=1, newintracomm=0xbf97ff40) failed
> MPI_Intercomm_merge(263)..........: 
> MPIR_Get_contextid(639)...........: 
> MPI_Allreduce(773)................: MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0xbf97fd18, count=64, MPI_INT, MPI_BAND, comm=0x84000002) failed
> MPIR_Allreduce(289)...............: 
> MPIC_Sendrecv(161)................: 
> MPIC_Wait(513)....................: 
> MPIDI_CH3I_Progress(150)..........: 
> MPID_nem_mpich2_blocking_recv(948): 
> MPID_nem_tcp_connpoll(1720).......: 
> state_commrdy_handler(1556).......: 
> MPID_nem_tcp_recv_handler(1446)...: socket closed
> rank 0 in job 21  ubuntu_38267   caused collective abort of all ranks
>   exit status of rank 0: killed by signal 9 
> 
> 
> Thanks In Advance, 
> 
> 
> 
> _______________________________________________
> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
> To manage subscription options or unsubscribe:
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> 
> 


