[mpich-discuss] New communicator from connect/accept primitives
Francisco Javier García Blas
fjblas at arcos.inf.uc3m.es
Thu Jan 21 05:25:11 CST 2010
Hi Jayesh,
The trick consisted of connecting both servers with another intercommunicator; I thought that this was not necessary when using MPI_Intercomm_create.
This example represents a good scenario for using intercommunicators with isolated servers. It could be interesting to include it in the MPICH2 examples :D.
Thanks again Rajeev and Jayesh for your helpful tips and time.
Best regards
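For readers of the archive: the "trick" above is that the two servers (B and C) establish their own intercommunicator via connect/accept, so that a peer communicator between them exists for later MPI_Intercomm_create calls. A minimal hedged sketch, with hypothetical names; the port exchange between the servers (e.g. via MPI_Publish_name or a shared file) is elided:

```c
/* Hedged sketch: servers B and C connect to each other so that an
 * intercommunicator exists between them. Function and variable names
 * are illustrative, not from the original code. */
#include <mpi.h>

MPI_Comm connect_servers(int i_am_acceptor, const char *port)
{
    MPI_Comm server_inter;  /* intercommunicator B <-> C */

    if (i_am_acceptor) {
        /* Server B: accept a connection on a previously opened port. */
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &server_inter);
    } else {
        /* Server C: connect to B's published port string. */
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &server_inter);
    }
    return server_inter;    /* later usable as a peer_comm */
}
```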
On 20/01/2010, at 18:34, Jayesh Krishna wrote:
> Hi,
> I have modified your code (You can download the same at http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/nightly/temp/test_inter_modif.tar.gz) and it works for me (The code is now closer to Rajeev's suggestions).
> Let me know if it works for you.
>
> Regards,
> Jayesh
> ----- Original Message -----
> From: "Rajeev Thakur" <thakur at mcs.anl.gov>
> To: mpich-discuss at mcs.anl.gov
> Sent: Wednesday, January 20, 2010 6:47:14 AM GMT -06:00 US/Canada Central
> Subject: Re: [mpich-discuss] New communicator from connect/accept primitives
>
> In my algorithm, server A was connected to 2 clients, B and C. Since you have one client connected to 2 servers, I suggested you call
> the client A and the servers B and C and follow the same algorithm. A is the common point that has connections to both B and C,
> hence it is important to follow the algorithm as provided. Also, in one of your files I saw MPI_COMM_NULL as a communicator to
> MPI_Intercomm_create. Although I haven't studied the code in detail, I don't think you can pass COMM_NULL. Use COMM_WORLD as in my
> algorithm.
>
> Rajeev
>
>
>> -----Original Message-----
>> From: mpich-discuss-bounces at mcs.anl.gov
>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of
>> Francisco Javier García Blas
>> Sent: Wednesday, January 20, 2010 4:40 AM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] New communicator from
>> connect/accept primitives
>>
>> Hello again,
>>
>> Rajeev, to clarify the code, I put the labels A, B, and C on each file.
>>
>> Jayesh, in MPI_Intercomm_create( comm_agg, 0, pool_comm[1],
>> 1, 12345 , &comm_aux ) the size of the peer communicator is
>> 1; therefore, passing 1 as the remote leader is incorrect, right?
>>
>> I got the following error stack on server C when the last
>> MPI_Intercomm_create is invoked. The rest of the processes run fine:
>>
>> No matching pg foung for id = 1024812961
>> Fatal error in MPI_Intercomm_create: Internal MPI error!, error stack:
>> MPI_Intercomm_create(580).: MPI_Intercomm_create(MPI_COMM_SELF,
>> local_leader=0, comm=0x84000001, remote_leader=0, tag=12346,
>> newintercomm=0xbfb0a790) failed
>> MPID_GPID_ToLpidArray(382): Internal MPI error: Unknown gpid
>> (1289156231)0[cli_0]: aborting job:
>> Fatal error in MPI_Intercomm_create: Internal MPI error!, error stack:
>> MPI_Intercomm_create(580).: MPI_Intercomm_create(MPI_COMM_SELF,
>> local_leader=0, comm=0x84000001, remote_leader=0, tag=12346,
>> newintercomm=0xbfb0a790) failed
>> MPID_GPID_ToLpidArray(382): Internal MPI error: Unknown gpid
>> (1289156231)0
>> rank 0 in job 9 compute-1-0_45339 caused collective abort
>> of all ranks
>> exit status of rank 0: return code 1
>>
>> Thanks for all your time.
>>
>> Best regards
>>
jayesh at mcs.anl.gov wrote:
>>> Hi,
>>> Rajeev, correct me if I got it wrong...
>>> On the client side when creating the intercommunicator you
>> should specify the client_B intercommunicator with the
>> client_A intracommunicator (MPI_Intercomm_create( comm_agg,
>> 0, pool_comm[1], 1, 12345 , &comm_aux ); ).
>>> Similarly on the server B side you should specify the client_B
>>> intercommunicator with the local communicator in B
>> (MPI_Intercomm_create( comm_world, 0, comm_inter, 0, 12345 ,
>> &comm_aux ); ).
>>> Let us know if it works.
>>>
>>> Regards,
>>> Jayesh
>>> ----- Original Message -----
>>> From: "Francisco Javier García Blas" <fjblas at arcos.inf.uc3m.es>
>>> To: jayesh at mcs.anl.gov
>>> Cc: mpich-discuss at mcs.anl.gov
>>> Sent: Tuesday, January 19, 2010 10:20:46 AM GMT -06:00 US/Canada
>>> Central
>>> Subject: Re: [mpich-discuss] New communicator from connect/accept
>>> primitives
>>>
>>> Hi Jayesh,
>>>
>>>
>>> I have no problems with MPI_Intercomm_merge. I tried
>> merging in different directions successfully. I also checked
>> the size of the new intracommunicator after merging, and it is
>> correct too (size 2).
>>>
>>>
>>> Additionally, yesterday I tried the MPI_Comm_spawn +
>> MPI_Intercomm_create examples from the test suite without problems.
>> In those cases, all the processes in the same group have the same
>> intercommunicators. However, in my case, I am doing something
>> wrong when three processes call MPI_Intercomm_create over two
>> remote groups (AB intra, C inter). A mistake in the arguments, maybe?
>>>
>>>
>>> As Dave suggested, I tried my example with the latest
>> stable version of MPICH2, with similar results.
>>>
>>>
>>> Thanks for everything.
>>>
>>>
>>> Regards
>>>
>>>
>>>
>>> On 19/01/2010, at 16:22, jayesh at mcs.anl.gov wrote:
>>>
>>> Hi,
>>> I haven't looked at your code yet. You can look at the
>> testcase, testconnect.c (
>> https://svn.mcs.anl.gov/repos/mpi/mpich2/trunk/test/mpi/manual
>> /testconnect.c ), in the MPICH2 test suite for a simple
>> example on how to use connect/accept and intercomm_merge to
>> create an intracommunicator.
>>>
>>> -Jayesh
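For reference, the connect/accept + MPI_Intercomm_merge pattern used by the testcase mentioned above can be sketched roughly as follows. This is an illustrative reconstruction, not the testcase itself; it assumes a single process on each side and a port string passed out of band:

```c
/* Hedged sketch: build an intracommunicator from connect/accept.
 * Server (no argument): opens a port and accepts a connection.
 * Client (port string as argv[1]): connects to that port. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm inter, intra;
    char port[MPI_MAX_PORT_NAME];
    int size;

    MPI_Init(&argc, &argv);
    if (argc == 1) {                        /* server role */
        MPI_Open_port(MPI_INFO_NULL, port);
        printf("run the client with: ./a.out '%s'\n", port);
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
        MPI_Intercomm_merge(inter, 0, &intra);  /* server group ordered low */
        MPI_Close_port(port);
    } else {                                /* client role */
        MPI_Comm_connect(argv[1], MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
        MPI_Intercomm_merge(inter, 1, &intra);  /* client group ordered high */
    }
    MPI_Comm_size(intra, &size);            /* expect 2, as reported above */
    MPI_Comm_free(&intra);
    MPI_Comm_free(&inter);
    MPI_Finalize();
    return 0;
}
```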
>>>
>>> ----- Original Message -----
>>> From: "Francisco Javier García Blas" < fjblas at arcos.inf.uc3m.es >
>>> To: mpich-discuss at mcs.anl.gov
>>> Sent: Monday, January 18, 2010 10:26:08 AM GMT -06:00 US/Canada
>>> Central
>>> Subject: Re: [mpich-discuss] New communicator from connect/accept
>>> primitives
>>>
>>> Hello again,
>>>
>>> First of all, thanks to Rajeev and Jayesh for their responses.
>>> Following Rajeev's instructions, I implemented a basic example using
>>> connect/accept and intercomm_create/merge primitives. I am doing
>>> something wrong, because when MPI_Intercomm_create is
>>> invoked, all the processes become blocked. I cannot find the error;
>>> maybe it is a bad numbering of the local and remote communicators,
>>> but I have tried all the combinations.
>>>
>>> I am using mpich2 1.0.5.
>>>
>>> I attach the source code and a makefile.
>>>
>>> Best regards
>>>
>>> Rajeev Thakur wrote:
>>>
>>> You will need to use intercomm_merge, but you have to merge them one
>>> pair at a time. Example below from an old mail.
>>>
>>> Rajeev
>>>
>>> If you have 3 intercommunicators AB_inter, AC_inter, and AD_inter, you
>>> can merge them all into a single intercommunicator as follows:
>>>
>>> * Begin by doing an MPI_Intercomm_merge on AB_inter, resulting in an
>>> intracommunicator AB_intra.
>>>
>>> * Then create an intercommunicator between AB on one side and C on the
>>> other by using MPI_Intercomm_create. Pass AB_intra as the local_comm on
>>> A and B, MPI_COMM_WORLD as the intracomm on C, and AC_inter as the
>>> peer_comm. This results in the intercommunicator AB_C_inter.
>>>
>>> * Then call MPI_Intercomm_merge on it to create the intracommunicator
>>> ABC_intra.
>>>
>>> * Then call MPI_Intercomm_create to create an intercommunicator between
>>> ABC and D, just as you did with AB and C above.
>>>
>>> * Again do an intercomm_merge. This will give you an intracommunicator
>>> containing A, B, C, D.
>>>
>>> * If you want an intercommunicator with A in one group and B, C, D in
>>> the other, as you would get with a single spawn of 3 processes, you
>>> have to call MPI_Comm_split to split this single communicator into two
>>> intracommunicators, one containing A and the other containing B, C, D.
>>> Then call MPI_Intercomm_create to create the intercommunicator.
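A hedged C sketch of this pairwise algorithm, specialized to the one-client/two-server (A, B, C) case discussed in this thread. The function name, role enum, and tag value are illustrative, not from the original code; AB_inter exists on A and B, AC_inter on A and C, from earlier connect/accept calls:

```c
/* Pairwise merge sketch for three processes A, B, C.
 * On B, AC_inter is not available; B may pass MPI_COMM_NULL for
 * peer_comm, since peer_comm is significant only at the local leader.
 * On C, AB_inter is MPI_COMM_NULL and is not used. */
#include <mpi.h>

enum role { ROLE_A, ROLE_B, ROLE_C };

MPI_Comm merge_abc(enum role role, MPI_Comm AB_inter, MPI_Comm AC_inter)
{
    MPI_Comm AB_intra, AB_C_inter, ABC_intra;
    const int tag = 12345;              /* any agreed-upon tag */

    if (role == ROLE_A || role == ROLE_B) {
        /* Step 1: A and B merge their intercommunicator. */
        MPI_Intercomm_merge(AB_inter, 0, &AB_intra);
        /* Step 2: A (rank 0 in AB_intra) is the local leader; C is rank 0
         * of the remote group of the peer communicator AC_inter. */
        MPI_Intercomm_create(AB_intra, 0, AC_inter, 0, tag, &AB_C_inter);
    } else {
        /* Step 2 on C: the local group is C's own MPI_COMM_WORLD. */
        MPI_Intercomm_create(MPI_COMM_WORLD, 0, AC_inter, 0, tag,
                             &AB_C_inter);
    }
    /* Step 3: everyone merges again, yielding the intracommunicator
     * containing A, B, and C. */
    MPI_Intercomm_merge(AB_C_inter, 0, &ABC_intra);
    MPI_Comm_free(&AB_C_inter);
    return ABC_intra;
}
```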
>>>
>>> ------------------------------------------------------------------------
>>> *From:* mpich-discuss-bounces at mcs.anl.gov
>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] *On Behalf Of* Francisco
>>> Javier García Blas
>>> *Sent:* Friday, January 15, 2010 11:09 AM
>>> *To:* mpich-discuss at mcs.anl.gov
>>> *Subject:* [mpich-discuss] New communicator from connect/accept
>>> primitives
>>>
>>> Hello all,
>>>
>>> I am wondering about the possibility of getting a new inter-communicator
>>> from N communicators, which are the results of different calls to
>>> MPI_Comm_connect or MPI_Comm_accept.
>>>
>>> My initial solution was, first, to get the group of each
>>> inter-communicator with MPI_Comm_group; second, to join all the groups
>>> into one bigger group; and finally, to create a new communicator from
>>> that group with the MPI_Comm_create primitive.
>>>
>>> Currently I am handling a pool of inter-communicators in order to keep
>>> the functionality. However, this approach is not suitable for
>>> collectives or MPI_ANY_SOURCE sends/recvs.
>>>
>>> Is there another way to join all the inter-communicators into one?
>>>
>>> Any suggestions?
>>>
>>> Best regards.
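The group-union idea described above could be sketched as follows (illustrative names). Note that MPI_Comm_create requires its group argument to be a subgroup of the group of the communicator passed to it, so a union spanning processes known only through connect/accept intercommunicators is not valid in general; this is why the pairwise merge approach was recommended instead:

```c
/* Hedged sketch of the group-union attempt. This illustrates the idea
 * from the question; the final MPI_Comm_create call is NOT valid in
 * general, because 'all' is not a subgroup of MPI_COMM_WORLD's group. */
#include <mpi.h>

MPI_Comm try_group_union(MPI_Comm inter[], int n)
{
    MPI_Group all, merged;
    MPI_Comm newcomm;
    int i;

    MPI_Comm_group(MPI_COMM_WORLD, &all);   /* start from the local group */
    for (i = 0; i < n; i++) {
        MPI_Group remote;
        /* Remote group of each intercommunicator from connect/accept. */
        MPI_Comm_remote_group(inter[i], &remote);
        MPI_Group_union(all, remote, &merged);
        MPI_Group_free(&all);
        MPI_Group_free(&remote);
        all = merged;
    }
    /* Invalid across dynamically connected worlds: 'all' contains
     * processes outside MPI_COMM_WORLD's group. */
    MPI_Comm_create(MPI_COMM_WORLD, all, &newcomm);
    MPI_Group_free(&all);
    return newcomm;
}
```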
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
--------------------------------------------------
Francisco Javier García Blas
Computer Architecture, Communications and Systems Area.
Computer Science Department. UNIVERSIDAD CARLOS III DE MADRID
Avda. de la Universidad, 30
28911 Leganés (Madrid), SPAIN
e-mail: fjblas at arcos.inf.uc3m.es
fjblas at inf.uc3m.es
Phone:(+34) 916249118
FAX: (+34) 916249129
--------------------------------------------------