[mpich-discuss] New communicator from connect/accept primitives
Rajeev Thakur
thakur at mcs.anl.gov
Wed Jan 20 06:47:14 CST 2010
In my algorithm, server A was connected to two clients, B and C. Since you have one client connected to two servers, I suggested you call
the client A and the servers B and C and follow the same algorithm. A is the common point that has connections to both B and C,
hence it is important to follow the algorithm as given. Also, in one of your files I saw MPI_COMM_NULL passed as a communicator to
MPI_Intercomm_create. Although I haven't studied the code in detail, I don't think you can pass MPI_COMM_NULL there. Use MPI_COMM_WORLD as in my
algorithm.
Rajeev
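[Editorial note: the pairwise-merge sequence Rajeev refers to might look like the following C sketch. All names (ab_intra, ac_inter, ca_inter, the functions, the tag 12345) are hypothetical illustrations, not code from the thread; ac_inter/ca_inter stand for the intercommunicator between A and C as seen from each side.]

```c
#include <mpi.h>

/* Hypothetical sketch: one client A connected to two servers B and C
 * via MPI_Comm_connect / MPI_Comm_accept. Step 1 (not shown): A and B
 * both call MPI_Intercomm_merge on their A-B intercommunicator to get
 * the intracommunicator ab_intra. The tag (12345 here) is arbitrary
 * but must match on both sides of MPI_Intercomm_create. */

/* Run on A and B. ac_inter is the A-C intercommunicator; the peer_comm
 * argument is significant only on the local leader (A, rank 0). */
void join_c(MPI_Comm ab_intra, MPI_Comm ac_inter, MPI_Comm *abc_intra)
{
    MPI_Comm abc_inter;
    MPI_Intercomm_create(ab_intra, 0,   /* local group AB, leader A   */
                         ac_inter, 0,   /* peer comm; remote leader C */
                         12345, &abc_inter);
    MPI_Intercomm_merge(abc_inter, 0, abc_intra);
    MPI_Comm_free(&abc_inter);
}

/* Run on C, which passes its own MPI_COMM_WORLD as the local
 * communicator and its intercommunicator to A as the peer comm. */
void join_ab(MPI_Comm ca_inter, MPI_Comm *abc_intra)
{
    MPI_Comm abc_inter;
    MPI_Intercomm_create(MPI_COMM_WORLD, 0, ca_inter, 0,
                         12345, &abc_inter);
    MPI_Intercomm_merge(abc_inter, 1, abc_intra);
    MPI_Comm_free(&abc_inter);
}
```

Note that every process passes a valid intracommunicator (never MPI_COMM_NULL) as the local communicator; only the two leaders need the peer communicator.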
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of
> Francisco Javier García Blas
> Sent: Wednesday, January 20, 2010 4:40 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] New communicator from
> connect/accept primitives
>
> Hello again,
>
> Rajeev, to clarify the code, I labeled the files A, B, and C.
>
> Jayesh, in MPI_Intercomm_create( comm_agg, 0, pool_comm[1], 1, 12345,
> &comm_aux ), the size of the peer communicator is 1; therefore,
> passing 1 as the remote leader is incorrect, right?
>
> I got the following error stack on server C when the last
> MPI_Intercomm_create is invoked. The rest of the processes run fine:
>
> No matching pg foung for id = 1024812961 Fatal error in
> MPI_Intercomm_create: Internal MPI error!, error stack:
> MPI_Intercomm_create(580).: MPI_Intercomm_create(MPI_COMM_SELF,
> local_leader=0, comm=0x84000001, remote_leader=0, tag=12346,
> newintercomm=0xbfb0a790) failed
> MPID_GPID_ToLpidArray(382): Internal MPI error: Unknown gpid
> (1289156231)0[cli_0]: aborting job:
> Fatal error in MPI_Intercomm_create: Internal MPI error!, error stack:
> MPI_Intercomm_create(580).: MPI_Intercomm_create(MPI_COMM_SELF,
> local_leader=0, comm=0x84000001, remote_leader=0, tag=12346,
> newintercomm=0xbfb0a790) failed
> MPID_GPID_ToLpidArray(382): Internal MPI error: Unknown gpid
> (1289156231)0
> rank 0 in job 9 compute-1-0_45339 caused collective abort
> of all ranks
> exit status of rank 0: return code 1
>
> Thanks for all your time.
>
> Best regards
>
> jayesh at mcs.anl.gov wrote:
> > Hi,
> > Rajeev, correct me if I got it wrong...
> > On the client side, when creating the intercommunicator, you should
> > pass the client_A intracommunicator as the local communicator and the
> > client_B intercommunicator as the peer communicator
> > (MPI_Intercomm_create( comm_agg, 0, pool_comm[1], 1, 12345, &comm_aux );).
> > Similarly, on the server B side you should pass the local communicator
> > in B along with the client_B intercommunicator
> > (MPI_Intercomm_create( comm_world, 0, comm_inter, 0, 12345, &comm_aux );).
> > Let us know if it works.
> >
> > Regards,
> > Jayesh
> > ----- Original Message -----
> > From: "Francisco Javier García Blas" <fjblas at arcos.inf.uc3m.es>
> > To: jayesh at mcs.anl.gov
> > Cc: mpich-discuss at mcs.anl.gov
> > Sent: Tuesday, January 19, 2010 10:20:46 AM GMT -06:00 US/Canada
> > Central
> > Subject: Re: [mpich-discuss] New communicator from connect/accept
> > primitives
> >
> > Hi Jayesh,
> >
> >
> > I have no problem with MPI_Intercomm_merge. I tried merging in both
> > directions successfully. I also checked the size of the new
> > intracommunicator after merging, and it is correct (size 2).
> >
> >
> > Additionally, yesterday I tried the MPI_Comm_spawn +
> > MPI_Intercomm_create examples in the test suite without problems. In
> > those cases, all the processes in the same group get the same
> > intercommunicators. However, in my case I am doing something wrong
> > when three processes call MPI_Intercomm_create over two remote groups
> > (AB intra, C inter). Perhaps a mistake in the arguments?
> >
> >
> > As Dave suggested, I tried my example with the latest stable version
> > of MPICH2, with similar results.
> >
> >
> > Thanks for all
> >
> >
> > Regards
> >
> >
> >
> > On 19/01/2010, at 16:22, jayesh at mcs.anl.gov wrote:
> >
> > Hi,
> > I haven't looked at your code yet. You can look at the
> testcase, testconnect.c (
> https://svn.mcs.anl.gov/repos/mpi/mpich2/trunk/test/mpi/manual
> /testconnect.c ), in the MPICH2 test suite for a simple
> example on how to use connect/accept and intercomm_merge to
> create an intracommunicator.
> >
> > -Jayesh
> >
> > ----- Original Message -----
> > From: "Francisco Javier García Blas" < fjblas at arcos.inf.uc3m.es >
> > To: mpich-discuss at mcs.anl.gov
> > Sent: Monday, January 18, 2010 10:26:08 AM GMT -06:00 US/Canada
> > Central
> > Subject: Re: [mpich-discuss] New communicator from connect/accept
> > primitives
> >
> > Hello again,
> >
> > First of all, thanks to Rajeev and Jayesh for their responses.
> > Following Rajeev's instructions, I implemented a basic example using
> > connect/accept and intercomm_create/merge primitives. I am doing
> > something wrong, because when MPI_Intercomm_create is invoked all the
> > processes block. I cannot find the error; maybe it is a bad numbering
> > of the local and remote communicators, but I have tried all the
> > combinations.
> >
> > I am using mpich2 1.0.5.
> >
> > I attach the source code and a makefile.
> >
> > Best regards
> >
> > Rajeev Thakur wrote:
> >
> > You will need to use intercomm_merge, but you have to merge them one
> > pair at a time. Example below from an old mail.
> >
> > Rajeev
> >
> > If you have 3 intercommunicators AB_inter, AC_inter, and AD_inter,
> > you can merge them all into a single intracommunicator as follows:
> >
> > * Begin by doing an MPI_Intercomm_merge on AB_inter, resulting in an
> > intracommunicator AB_intra.
> >
> > * Then create an intercommunicator between AB on one side and C on
> > the other by using MPI_Intercomm_create. Pass AB_intra as the
> > local_comm on A and B, MPI_COMM_WORLD as the intracomm on C, and
> > AC_inter as the peer_comm. This results in the intercommunicator
> > AB_C_inter.
> >
> > * Then call MPI_Intercomm_merge on it to create the intracommunicator
> > ABC_intra.
> >
> > * Then call MPI_Intercomm_create to create an intercommunicator
> > between ABC and D, just as you did with AB and C above.
> >
> > * Again do an intercomm_merge. This will give you an intracommunicator
> > containing A, B, C, D.
> >
> > * If you want an intercommunicator with A in one group and B, C, D in
> > the other, as you would get with a single spawn of 3 processes, you
> > have to call MPI_Comm_split to split this single intracommunicator
> > into two intracommunicators, one containing A and the other containing
> > B, C, D. Then call MPI_Intercomm_create to create the intercommunicator.
> >
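[Editorial note: the last step, splitting the merged intracommunicator back into an A | B,C,D intercommunicator, could be sketched as below. All names (abcd_intra, is_a, the function, the ranks) are hypothetical illustrations of the step described above.]

```c
#include <mpi.h>

/* Hypothetical sketch: split abcd_intra (containing A, B, C, D) into an
 * intercommunicator with A in one group and B, C, D in the other.
 * is_a is nonzero only on process A. bcd_leader is the rank, in
 * abcd_intra, of the lowest-ranked process among B, C, D (with key 0,
 * MPI_Comm_split makes the lowest original rank the leader of each
 * part); a_rank is A's rank in abcd_intra. */
void split_to_intercomm(MPI_Comm abcd_intra, int is_a,
                        int a_rank, int bcd_leader,
                        MPI_Comm *final_inter)
{
    MPI_Comm part;                                /* {A} or {B,C,D}  */
    MPI_Comm_split(abcd_intra, is_a ? 0 : 1, 0, &part);

    /* The remote leader is named by its rank in the shared peer
     * communicator, abcd_intra. */
    int remote_leader = is_a ? bcd_leader : a_rank;
    MPI_Intercomm_create(part, 0, abcd_intra, remote_leader,
                         /* tag = */ 0, final_inter);
    MPI_Comm_free(&part);
}
```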
> > ------------------------------------------------------------------------
> >
> > *From:* mpich-discuss-bounces at mcs.anl.gov
> > [mailto:mpich-discuss-bounces at mcs.anl.gov] *On Behalf Of*
> > Francisco Javier García Blas
> > *Sent:* Friday, January 15, 2010 11:09 AM
> > *To:* mpich-discuss at mcs.anl.gov
> > *Subject:* [mpich-discuss] New communicator from connect/accept
> > primitives
> >
> > Hello all,
> >
> > I am wondering about the possibility of getting a new
> > inter-communicator from N communicators, which result from different
> > calls to MPI_Comm_connect or MPI_Comm_accept.
> >
> > My initial solution was, first, to get the group of each
> > inter-communicator with MPI_Comm_group; second, to join all the
> > groups into one bigger group; and finally, to create a new
> > communicator from that group with the MPI_Comm_create primitive.
> >
> > Currently I am handling a pool of inter-communicators in order to
> > keep the functionality. However, this approach is not suitable for
> > collectives and MPI_ANY_SOURCE sends/recvs.
> >
> > Is there another way to join all the inter-communicators into one?
> >
> > Any suggestions?
> >
> > Best regards.
> >
> > --------------------------------------------------
> > Francisco Javier García Blas
> > Computer Architecture, Communications and Systems Area.
> > Computer Science Department. UNIVERSIDAD CARLOS III DE MADRID
> > Avda. de la Universidad, 30
> > 28911 Leganés (Madrid), SPAIN
> > e-mail: fjblas at arcos.inf.uc3m.es
> > fjblas at inf.uc3m.es
> > Phone: (+34) 916249118
> > FAX: (+34) 916249129
> > --------------------------------------------------
> >
> >
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > mpich-discuss mailing list
> > mpich-discuss at mcs.anl.gov
> > https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> >
> >
>
>
More information about the mpich-discuss mailing list