[mpich-discuss] New communicator from connect/accept primitives

Francisco Javier García Blas fjblas at arcos.inf.uc3m.es
Wed Jan 20 04:39:37 CST 2010


Hello again,

Rajeev, to clarify the code, I put labels A, B, and C on each file.

Jayesh, in MPI_Intercomm_create( comm_agg, 0, pool_comm[1], 1, 12345, 
&comm_aux ) the size of the peer communicator's remote group is 1; 
therefore, passing 1 as the remote leader is incorrect, right?
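
For reference, remote_leader is interpreted as a rank in the remote group
of the peer communicator, so a quick sanity check looks like this (a
sketch; pool_comm[1] is the intercommunicator from the attached code):

    int rsize;
    MPI_Comm_remote_size(pool_comm[1], &rsize);  /* size of the remote group */
    /* valid remote_leader values are 0 .. rsize-1;
       if rsize == 1, only 0 is legal */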

I get the following error stack on server C when the last 
MPI_Intercomm_create is invoked. The rest of the processes run fine:

No matching pg foung for id = 1024812961
Fatal error in MPI_Intercomm_create: Internal MPI error!, error stack:
MPI_Intercomm_create(580).: MPI_Intercomm_create(MPI_COMM_SELF, 
local_leader=0, comm=0x84000001, remote_leader=0, tag=12346, 
newintercomm=0xbfb0a790) failed
MPID_GPID_ToLpidArray(382): Internal MPI error: Unknown gpid 
(1289156231)0[cli_0]: aborting job:
Fatal error in MPI_Intercomm_create: Internal MPI error!, error stack:
MPI_Intercomm_create(580).: MPI_Intercomm_create(MPI_COMM_SELF, 
local_leader=0, comm=0x84000001, remote_leader=0, tag=12346, 
newintercomm=0xbfb0a790) failed
MPID_GPID_ToLpidArray(382): Internal MPI error: Unknown gpid (1289156231)0
rank 0 in job 9  compute-1-0_45339   caused collective abort of all ranks
  exit status of rank 0: return code 1

Thanks for all your time.

Best regards

jayesh at mcs.anl.gov wrote:
> Hi,
>  Rajeev, correct me if I got it wrong...
>  On the client side, when creating the intercommunicator, you should specify the client_B intercommunicator together with the client_A intracommunicator (MPI_Intercomm_create( comm_agg, 0, pool_comm[1], 1, 12345, &comm_aux );).
>  Similarly, on the server B side, you should specify the client_B intercommunicator together with the local communicator in B (MPI_Intercomm_create( comm_world, 0, comm_inter, 0, 12345, &comm_aux );).
>  Let us know if it works.
>
> Regards,
> Jayesh
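
Reading that suggestion as code, with the role of each argument marked (a
sketch; comm_agg, pool_comm, comm_world, and comm_inter are names taken
from the attached code):

    /* Client side (A and B), after AB has been merged into comm_agg. */
    MPI_Comm comm_aux;
    MPI_Intercomm_create(comm_agg,     /* local_comm: the AB intracommunicator */
                         0,            /* local_leader: rank 0 in comm_agg */
                         pool_comm[1], /* peer_comm: intercomm that reaches B */
                         1,            /* remote_leader: B's rank in peer_comm */
                         12345,        /* tag: must match on both sides */
                         &comm_aux);

    /* Server B side. */
    MPI_Intercomm_create(comm_world,   /* local_comm: B's local communicator */
                         0,            /* local_leader */
                         comm_inter,   /* peer_comm: intercomm back to the client */
                         0,            /* remote_leader: the client leader's rank */
                         12345, &comm_aux);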
> ----- Original Message -----
> From: "Francisco Javier García Blas" <fjblas at arcos.inf.uc3m.es>
> To: jayesh at mcs.anl.gov
> Cc: mpich-discuss at mcs.anl.gov
> Sent: Tuesday, January 19, 2010 10:20:46 AM GMT -06:00 US/Canada Central
> Subject: Re: [mpich-discuss] New communicator from connect/accept primitives
>
> Hi Jayesh,
>
> I have no problem with MPI_Intercomm_merge. I tried merging in both
> directions successfully. I also checked the size of the new
> intracommunicator after merging, and it is correct (size 2).
>
> Additionally, yesterday I tried the MPI_Comm_spawn + MPI_Intercomm_create
> examples from the test suite without problems. In those cases all the
> processes in the same group have the same intercommunicators. However, in
> my case, I am doing something wrong when three processes call
> MPI_Intercomm_create over two remote groups (AB intra, C inter). An
> argument mistake, maybe?
>
> As Dave suggested, I tried my example with the latest stable version of
> MPICH2, with similar results.
>
> Thanks for all.
>
> Regards
>
> On 19/01/2010, at 16:22, jayesh at mcs.anl.gov wrote:
>
> Hi,
> I haven't looked at your code yet. You can look at the testcase, testconnect.c ( https://svn.mcs.anl.gov/repos/mpi/mpich2/trunk/test/mpi/manual/testconnect.c ), in the MPICH2 test suite for a simple example of how to use connect/accept and intercomm_merge to create an intracommunicator.
>
> -Jayesh
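
The pattern that testcase demonstrates is roughly the following (a sketch
based on the description above, not the actual contents of testconnect.c):

    /* Server: open a port, accept a connection, merge into an intracomm. */
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm inter, intra;
    MPI_Open_port(MPI_INFO_NULL, port);
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);
    MPI_Intercomm_merge(inter, 0 /* low */, &intra);

    /* Client: connect to the same port string, then merge. */
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);
    MPI_Intercomm_merge(inter, 1 /* high */, &intra);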
> ----- Original Message ----- 
> From: "Francisco Javier García Blas" < fjblas at arcos.inf.uc3m.es > 
> To: mpich-discuss at mcs.anl.gov 
> Sent: Monday, January 18, 2010 10:26:08 AM GMT -06:00 US/Canada Central 
> Subject: Re: [mpich-discuss] New communicator from connect/accept primitives 
>
> Hello again,
>
> First of all, thanks to Rajeev and Jayesh for their responses. Following
> Rajeev's instructions, I implemented a basic example using
> connect/accept and intercomm_create/merge primitives. I am doing
> something wrong, because when MPI_Intercomm_create is invoked, all the
> processes block. I can't find the error; maybe it is a bad numbering of
> the local and remote leaders, but I have tried all the combinations.
>
> I am using MPICH2 1.0.5.
>
> I attach the source code and a makefile.
>
> Best regards
>
> Rajeev Thakur wrote:
>
> You will need to use intercomm_merge, but you have to merge them one
> pair at a time. Example below from an old mail.
>
> Rajeev
>
> If you have 3 intercommunicators AB_inter, AC_inter, and AD_inter, you
> can merge them all into a single intracommunicator as follows:
>
> * Begin by doing an MPI_Intercomm_merge on AB_inter, resulting in an
> intracommunicator AB_intra.
>
> * Then create an intercommunicator between AB on one side and C on the
> other by using MPI_Intercomm_create. Pass AB_intra as the local_comm on
> A and B, MPI_COMM_WORLD as the intracomm on C, and AC_inter as the
> peer_comm. This results in the intercommunicator AB_C_inter.
>
> * Then call MPI_Intercomm_merge on it to create the intracommunicator
> ABC_intra.
>
> * Then call MPI_Intercomm_create to create an intercommunicator between
> ABC and D, just as you did with AB and C above.
>
> * Again do an intercomm_merge. This will give you an intracommunicator
> containing A, B, C, D.
>
> * If you want an intercommunicator with A in one group and B, C, D in
> the other, as you would get with a single spawn of 3 processes, you have
> to call MPI_Comm_split to split this single communicator into two
> intracommunicators, one containing A and the other containing B, C, D.
> Then call MPI_Intercomm_create to create the intercommunicator.
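
On the A/B side, the chain above might look like this (a sketch; the
communicator variables are illustrative, following the names in the text):

    MPI_Comm AB_intra, AB_C_inter, ABC_intra, ABC_D_inter, ABCD_intra;

    MPI_Intercomm_merge(AB_inter, 0, &AB_intra);        /* AB -> intracomm */
    MPI_Intercomm_create(AB_intra, 0, AC_inter, 0,      /* bridge AB and C; the  */
                         1001, &AB_C_inter);            /* leaders talk via AC_inter */
    MPI_Intercomm_merge(AB_C_inter, 0, &ABC_intra);     /* ABC -> intracomm */
    MPI_Intercomm_create(ABC_intra, 0, AD_inter, 0,     /* bridge ABC and D */
                         1002, &ABC_D_inter);
    MPI_Intercomm_merge(ABC_D_inter, 0, &ABCD_intra);   /* ABCD -> intracomm */

    /* On C, the first create is instead:
       MPI_Intercomm_create(MPI_COMM_WORLD, 0, AC_inter, 0, 1001, &AB_C_inter);
       and D makes the analogous call with AD_inter. */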
>
> ------------------------------------------------------------------------ 
>
> *From:* mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] *On Behalf Of* Francisco Javier García Blas
> *Sent:* Friday, January 15, 2010 11:09 AM
> *To:* mpich-discuss at mcs.anl.gov
> *Subject:* [mpich-discuss] New communicator from connect/accept primitives
>
> Hello all,
>
> I am wondering about the possibility of getting a new inter-communicator
> from N communicators, which result from different calls to
> MPI_Comm_connect or MPI_Comm_accept.
>
> My initial solution was, first, to get the group of each
> inter-communicator with MPI_Comm_group; second, to join all the groups
> into one bigger group; and finally, to create a new communicator from
> that group with the MPI_Comm_create primitive.
>
> Currently I am handling a pool of inter-communicators in order to keep
> the functionality. However, this idea is not suitable for collectives
> and MPI_ANY_SOURCE sends/recvs.
>
> Is there another way to join all the inter-communicators into one?
>
> Any suggestions?
>
> Best regards.
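
A sketch of that group-based attempt, and why it cannot work across
connected jobs (comm1 and comm2 stand for the intercommunicators returned
by accept/connect):

    MPI_Group g1, g2, gu;
    MPI_Comm newcomm;
    MPI_Comm_group(comm1, &g1);   /* for an intercommunicator this returns
                                     only the LOCAL group */
    MPI_Comm_group(comm2, &g2);
    MPI_Group_union(g1, g2, &gu);
    /* MPI_Comm_create requires gu to be a subgroup of the group of the
       communicator it is called on, and no single communicator spans
       all the connected processes yet, so this cannot succeed. */
    MPI_Comm_create(MPI_COMM_WORLD, gu, &newcomm);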
>
> _______________________________________________ 
> mpich-discuss mailing list 
> mpich-discuss at mcs.anl.gov 
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss 
>
> -------------------------------------------------- 
> Francisco Javier García Blas 
> Computer Architecture, Communications and Systems Area. 
> Computer Science Department. UNIVERSIDAD CARLOS III DE MADRID 
> Avda. de la Universidad, 30 
> 28911 Leganés (Madrid), SPAIN 
> e-mail: fjblas at arcos.inf.uc3m.es 
> fjblas at inf.uc3m.es 
> Phone:(+34) 916249118 
> FAX: (+34) 916249129 
> -------------------------------------------------- 
>   

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_inter.tgz
Type: application/octet-stream
Size: 1410 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20100120/05c0bb88/attachment.obj>

