[mpich-discuss] New communicator from connect/accept primitives

Rajeev Thakur thakur at mcs.anl.gov
Wed Jan 20 06:47:14 CST 2010


In my algorithm, server A was connected to 2 clients B and C. Since you have one client connected to 2 servers, I suggested you call
the client A and the servers B and C and follow the same algorithm. A is the common point that has connections to both B and C,
hence it is important to follow the algorithm as given. Also, in one of your files I saw MPI_COMM_NULL passed as a communicator to
MPI_Intercomm_create. Although I haven't studied the code in detail, I don't think you can pass MPI_COMM_NULL. Use MPI_COMM_WORLD as in my
algorithm.
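For concreteness, here is a sketch of how the bridging MPI_Intercomm_create looks on each side once the roles are renamed as above. The variable names (AB_intra, AC_inter, CA_inter) and the tag are illustrative, not taken from the attached files; the point is that every communicator argument must be a valid communicator, never MPI_COMM_NULL:

```c
#include <mpi.h>

/* Hypothetical names: AB_intra is the intracommunicator obtained by merging
 * the A-B intercommunicator; AC_inter / CA_inter are the two ends of the
 * A-C connect/accept. The tag (12345) is arbitrary but must match. */

/* On A and B (A is rank 0 of AB_intra): */
void create_on_AB(MPI_Comm AB_intra, MPI_Comm AC_inter, MPI_Comm *AB_C_inter)
{
    /* local comm must be a real intracommunicator, never MPI_COMM_NULL */
    MPI_Intercomm_create(AB_intra, 0, AC_inter, 0, 12345, AB_C_inter);
}

/* On C, which has no direct connection to B: */
void create_on_C(MPI_Comm CA_inter, MPI_Comm *AB_C_inter)
{
    /* C's local group is just itself; pass its MPI_COMM_WORLD, not COMM_NULL */
    MPI_Intercomm_create(MPI_COMM_WORLD, 0, CA_inter, 0, 12345, AB_C_inter);
}
```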

Rajeev
 

> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov 
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of 
> Francisco Javier García Blas
> Sent: Wednesday, January 20, 2010 4:40 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] New communicator from 
> connect/accept primitives
> 
> Hello again,
> 
> Rajeev, to clarify the code, I put signatures A, B, and C on each file.
> 
> Jayesh, in MPI_Intercomm_create( comm_agg, 0, pool_comm[1], 
> 1, 12345, &comm_aux ) the size of the peer communicator is 
> 1; therefore, passing 1 is incorrect, right?
> 
> I got the following error stack on server C when the last 
> MPI_Intercomm_create is invoked. The rest of the processes run fine:
> 
> No matching pg foung for id = 1024812961 Fatal error in 
> MPI_Intercomm_create: Internal MPI error!, error stack:
> MPI_Intercomm_create(580).: MPI_Intercomm_create(MPI_COMM_SELF,
> local_leader=0, comm=0x84000001, remote_leader=0, tag=12346,
> newintercomm=0xbfb0a790) failed
> MPID_GPID_ToLpidArray(382): Internal MPI error: Unknown gpid
> (1289156231)0[cli_0]: aborting job:
> Fatal error in MPI_Intercomm_create: Internal MPI error!, error stack:
> MPI_Intercomm_create(580).: MPI_Intercomm_create(MPI_COMM_SELF,
> local_leader=0, comm=0x84000001, remote_leader=0, tag=12346,
> newintercomm=0xbfb0a790) failed
> MPID_GPID_ToLpidArray(382): Internal MPI error: Unknown gpid 
> (1289156231)0
> rank 0 in job 9  compute-1-0_45339   caused collective abort 
> of all ranks
>   exit status of rank 0: return code 1
> 
> Thanks for all your time.
> 
> Best regards
> 
jayesh at mcs.anl.gov wrote:
> > Hi,
> >  Rajeev, correct me if I got it wrong...
> >  On the client side when creating the intercommunicator you 
> should specify the client_B intercommunicator with the 
> client_A intracommunicator (MPI_Intercomm_create( comm_agg, 
> 0,  pool_comm[1], 1, 12345  , &comm_aux ); ).
> >  Similarly on the server B side you should specify the client_B 
> > intercommunicator with the local communicator in B 
> (MPI_Intercomm_create( comm_world, 0, comm_inter, 0, 12345 , 
> &comm_aux ); ).
> >  Let us know if it works.
> >
> > Regards,
> > Jayesh
> > ----- Original Message -----
> > From: "Francisco Javier García Blas" <fjblas at arcos.inf.uc3m.es>
> > To: jayesh at mcs.anl.gov
> > Cc: mpich-discuss at mcs.anl.gov
> > Sent: Tuesday, January 19, 2010 10:20:46 AM GMT -06:00 US/Canada 
> > Central
> > Subject: Re: [mpich-discuss] New communicator from connect/accept 
> > primitives
> >
> > Hi Jayesh,
> >
> >
> > I have no problem with MPI_Intercomm_merge. I tried 
> merging in different directions successfully. I also checked 
> the size of the new intracommunicator after merging, and it is 
> correct (size 2). 
> >
> >
> > Additionally, yesterday I tried the MPI_Comm_spawn + 
> MPI_Intercomm_create examples in the testcase without problems. 
> In those cases all the processes in the same group have the same 
> intercommunicators. However, in my case, I am doing something 
> wrong when three processes call MPI_Intercomm_create over two 
> remote groups (AB intra, C inter). A mistake in the arguments, maybe? 
> >
> >
> > As Dave suggested, I tried my example with the latest 
> stable version of MPICH2, with similar results. 
> >
> >
> > Thanks for all your help.
> >
> >
> > Regards
> >
> >
> >
> > On 19/01/2010, at 16:22, jayesh at mcs.anl.gov wrote: 
> >
> >
> >
> > Hi,
> > I haven't looked at your code yet. You can look at the 
> testcase, testconnect.c ( 
> https://svn.mcs.anl.gov/repos/mpi/mpich2/trunk/test/mpi/manual
> /testconnect.c ), in the MPICH2 test suite for a simple 
> example on how to use connect/accept and intercomm_merge to 
> create an intracommunicator. 
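The connect/accept plus merge pattern that the testcase demonstrates looks roughly like the following. This is a condensed sketch under my own naming, not the literal contents of testconnect.c; the server instance prints its port name, and a client instance receives it as argv[1]:

```c
#include <mpi.h>
#include <stdio.h>
#include <string.h>

/* Condensed connect/accept sketch: run one instance with argv[1] == "server"
 * (it prints a port name), then a second instance with that port name as
 * argv[1]. Both sides merge the resulting intercommunicator into an
 * intracommunicator. Error handling is omitted for brevity. */
int main(int argc, char **argv)
{
    MPI_Comm inter, intra;
    char port[MPI_MAX_PORT_NAME];

    MPI_Init(&argc, &argv);
    if (argc < 2)
        MPI_Abort(MPI_COMM_WORLD, 1);   /* need "server" or a port name */

    if (strcmp(argv[1], "server") == 0) {
        MPI_Open_port(MPI_INFO_NULL, port);
        printf("port: %s\n", port);     /* hand this string to the client */
        fflush(stdout);
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);
        MPI_Intercomm_merge(inter, 0, &intra);   /* server group ordered low */
        MPI_Close_port(port);
    } else {
        strncpy(port, argv[1], MPI_MAX_PORT_NAME);  /* port name from server */
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);
        MPI_Intercomm_merge(inter, 1, &intra);   /* client group ordered high */
    }

    MPI_Comm_free(&intra);
    MPI_Comm_free(&inter);
    MPI_Finalize();
    return 0;
}
```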
> >
> > -Jayesh
> >
> > ----- Original Message -----
> > From: "Francisco Javier García Blas" < fjblas at arcos.inf.uc3m.es >
> > To: mpich-discuss at mcs.anl.gov
> > Sent: Monday, January 18, 2010 10:26:08 AM GMT -06:00 US/Canada 
> > Central
> > Subject: Re: [mpich-discuss] New communicator from connect/accept 
> > primitives
> >
> > Hello again,
> >
> > First of all, thanks to Rajeev and Jayesh for their responses. Following 
> > Rajeev's instructions, I implemented a basic example using 
> > connect/accept and intercomm_create/merge primitives. I am doing 
> > something wrong, because when MPI_Intercomm_create is invoked, all the 
> > processes become blocked. I cannot find the error; maybe it is a bad 
> > numbering of the local and remote communicators, but I have tried all 
> > the combinations.
> >
> > I am using mpich2 1.0.5. 
> >
> > I attach the source code and a makefile. 
> >
> > Best regards
> >
> > Rajeev Thakur wrote: 
> >
> > You will need to use intercomm_merge, but you have to merge them one 
> > pair at a time. Example below from an old mail. 
> >
> > Rajeev
> >
> > If you have 3 intercommunicators AB_inter, AC_inter, and AD_inter, you 
> > can merge them all into a single intracommunicator as follows: 
> >
> > * Begin by doing an MPI_Intercomm_merge on AB_inter, resulting in an 
> > intracommunicator AB_intra. 
> >
> > * Then create an intercommunicator between AB on one side and C on the 
> > other by using MPI_Intercomm_create. Pass AB_intra as the local_comm on 
> > A and B, MPI_COMM_WORLD as the intracomm on C, and AC_inter as the 
> > peer_comm. This results in the intercommunicator AB_C_inter. 
> >
> > * Then call MPI_Intercomm_merge on it to create the intracommunicator 
> > ABC_intra. 
> >
> > * Then call MPI_Intercomm_create to create an intercommunicator between 
> > ABC and D, just as you did with AB and C above. 
> >
> > * Again do an intercomm_merge. This will give you an intracommunicator 
> > containing A, B, C, D. 
> >
> > * If you want an intercommunicator with A in one group and B, C, D in 
> > the other, as you would get with a single spawn of 3 processes, you 
> > have to call MPI_Comm_split to split this single communicator into two 
> > intracommunicators, one containing A and the other containing B, C, D. 
> > Then call MPI_Intercomm_create to create the intercommunicator. 
> >
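The pairwise scheme Rajeev describes can be sketched from A's point of view as follows. This is a sketch only: the function and variable names and the tags are illustrative, and the matching MPI_Intercomm_create/MPI_Intercomm_merge calls that B, C and D must make with their own local communicators are omitted:

```c
#include <mpi.h>

/* Runs on process A after it holds AB_inter, AC_inter and AD_inter from
 * three connect/accept exchanges. Returns an intracommunicator spanning
 * {A,B,C,D}. B, C and D must make the matching calls on their side, with
 * their own local communicators and the same tags. */
MPI_Comm merge_all(MPI_Comm AB_inter, MPI_Comm AC_inter, MPI_Comm AD_inter)
{
    MPI_Comm AB_intra, AB_C_inter, ABC_intra, ABC_D_inter, ABCD_intra;

    /* Step 1: merge the A-B pair. */
    MPI_Intercomm_merge(AB_inter, 0, &AB_intra);                  /* {A,B}    */

    /* Step 2: bridge {A,B} and {C}. AB_intra is the local comm; AC_inter is
     * the peer comm through which the two leaders (A and C) can talk. */
    MPI_Intercomm_create(AB_intra, 0, AC_inter, 0, 12345, &AB_C_inter);
    MPI_Intercomm_merge(AB_C_inter, 0, &ABC_intra);               /* {A,B,C}  */

    /* Step 3: repeat for D. */
    MPI_Intercomm_create(ABC_intra, 0, AD_inter, 0, 12346, &ABC_D_inter);
    MPI_Intercomm_merge(ABC_D_inter, 0, &ABCD_intra);             /* {A,B,C,D} */

    return ABCD_intra;
}
```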
> > ------------------------------------------------------------------------
> > *From:* mpich-discuss-bounces at mcs.anl.gov 
> > [mailto:mpich-discuss-bounces at mcs.anl.gov] *On Behalf Of* 
> > Francisco Javier García Blas
> > *Sent:* Friday, January 15, 2010 11:09 AM
> > *To:* mpich-discuss at mcs.anl.gov
> > *Subject:* [mpich-discuss] New communicator from connect/accept 
> > primitives
> >
> > Hello all,
> >
> > I am wondering about the possibility of getting a new 
> > inter-communicator from N communicators, which result from different 
> > calls to MPI_Comm_connect or MPI_Comm_accept. 
> >
> > My initial solution was, first, to get the group of each 
> > inter-communicator with MPI_Comm_group; second, to join all the 
> > groups into one bigger group; and finally, to create a new communicator 
> > from that group with the MPI_Comm_create primitive. 
> >
> > Currently I am handling a pool of inter-communicators in order 
> > to keep the functionality. However, this idea is not suitable for 
> > collectives and MPI_ANY_SOURCE sends/recvs. 
> >
> > Is there another way to join all the inter-communicators into one? 
> >
> > Any suggestions? 
> >
> > Best regards. 
> >
> > --------------------------------------------------
> > Francisco Javier García Blas
> > Computer Architecture, Communications and Systems Area. 
> > Computer Science Department. UNIVERSIDAD CARLOS III DE MADRID
> > Avda. de la Universidad, 30
> > 28911 Leganés (Madrid), SPAIN
> > e-mail: fjblas at arcos.inf.uc3m.es
> > fjblas at inf.uc3m.es
> > Phone: (+34) 916249118
> > FAX: (+34) 916249129
> > --------------------------------------------------
> >
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > mpich-discuss mailing list
> > mpich-discuss at mcs.anl.gov
> > https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> >
> > --------------------------------------------------
> > Francisco Javier García Blas
> > Computer Architecture, Communications and Systems Area. 
> > Computer Science Department. UNIVERSIDAD CARLOS III DE 
> MADRID Avda. de 
> > la Universidad, 30
> > 28911 Leganés (Madrid), SPAIN
> > e-mail: fjblas at arcos.inf.uc3m.es
> > fjblas at inf.uc3m.es
> > Phone:(+34) 916249118
> > FAX: (+34) 916249129
> > --------------------------------------------------
> >   
> 
> 



More information about the mpich-discuss mailing list