[mpich-discuss] Processor hangs on MPI_Comm_accept

Alberto Giannetti albertogiannetti at gmail.com
Wed May 7 20:32:20 CDT 2008


Understood. But why doesn't the accept fail?

On May 7, 2008, at 7:26 PM, Rajeev Thakur wrote:

> That's an invalid program. There is no rank 1 in MPI_COMM_SELF. See
> the definition of MPI_Comm_accept:
>
> IN root	rank in comm of root node
> IN comm	intracommunicator over which call is collective
>
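> A minimal valid call would pass root 0, the only rank that exists in
> MPI_COMM_SELF. A sketch:
>
>     char port[MPI_MAX_PORT_NAME];
>     MPI_Comm intercomm;
>     MPI_Open_port(MPI_INFO_NULL, port);
>     /* MPI_COMM_SELF contains exactly one process, rank 0, so 0 is
>        the only valid root */
>     MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);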
>
>> -----Original Message-----
>> From: owner-mpich-discuss at mcs.anl.gov
>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Alberto
>> Giannetti
>> Sent: Wednesday, May 07, 2008 6:11 PM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] Processor hangs on MPI_Comm_accept
>>
>> I run the server program with this accept call:
>>
>>    if( MPI_Comm_accept(myport, MPI_INFO_NULL, 1, MPI_COMM_SELF,
>> &intercomm) != MPI_SUCCESS ) {
>>      printf("Processor %d, Error on MPI_Comm_accept\n", rank);
>>      MPI_Finalize();
>>      exit(-1);
>>    }
>>
>> I run mpiexec with 1 processor:
>>
>> $ mpiexec -np 1 ./server
>> Processor 0 (2670, Receiver) initialized
>> Processor 0 opened port tag#0$port#56897$description#192.168.0.10
>> $ifname#192.168.0.10$
>> Publishing port tag#0$port#56897$description#192.168.0.10
>> $ifname#192.168.0.10$ as service myfriend-0
>> Processor 0 waiting for connections on tag#0$port#56897
>> $description#192.168.0.10$ifname#192.168.0.10$, service myfriend-0...
>> Processor 0 new connection on port tag#0$port#56897
>> $description#192.168.0.10$ifname#192.168.0.10$
>> Processor 0 closing port tag#0$port#56897$description#192.168.0.10
>> $ifname#192.168.0.10$
>> Processor 0 unpublishing service myfriend-0
>> Processor 0 remote comm size is 0
>> Processor 0 waiting for data from new intercomm...
>>
>> No client program is running. How does the server receive and
>> accept an incoming connection?
>>
>>
>> On May 7, 2008, at 6:30 PM, Rajeev Thakur wrote:
>>
>>> The rank is always relative to the communicator. If the
>>> communicator is comm_self, the root rank cannot be other than 0
>>> because comm_self has no rank 1.
>>>
>>> See http://www.mpi-forum.org/docs/mpi-20-html/node103.htm#Node103 for
>>> information about the routines.
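>>>
>>> A quick check makes this concrete (sketch):
>>>
>>>     int size, rank;
>>>     MPI_Comm_size(MPI_COMM_SELF, &size);  /* always 1 */
>>>     MPI_Comm_rank(MPI_COMM_SELF, &rank);  /* always 0 */
>>>     /* so 0 is the only valid root for an accept over comm_self */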
>>>
>>>> -----Original Message-----
>>>> From: owner-mpich-discuss at mcs.anl.gov
>>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Alberto
>>>> Giannetti
>>>> Sent: Wednesday, May 07, 2008 5:20 PM
>>>> To: mpich-discuss at mcs.anl.gov
>>>> Subject: Re: [mpich-discuss] Processor hangs on MPI_Comm_accept
>>>>
>>>> I changed the Accept root parameter from 0 to the rank of the
>>>> processor (the 'rank' variable), with communicator MPI_COMM_SELF.
>>>> Right after I run the server, processor 1 receives an incoming
>>>> connection, although the client is not running. Why is a
>>>> connection established?
>>>>
>>>> I am confused about the root parameter in Connect/Accept. I would
>>>> appreciate some more detailed information.
>>>>
>>>> On May 7, 2008, at 5:07 PM, Rajeev Thakur wrote:
>>>>
>>>>> MPI_Comm_connect with comm_world succeeds on both clients
>>>>> because it is matched by the MPI_Comm_accept with comm_self on
>>>>> rank 0 of the servers. There is nothing to match the Accept with
>>>>> comm_self on the server with rank 1, so it hangs. If you used
>>>>> comm_world on the servers as well, everything would work again.
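>>>>>
>>>>> A sketch of that server-side change, with a single port opened
>>>>> and published by rank 0 (the unsuffixed service name "myfriend"
>>>>> here is illustrative, and 'rank' is the rank in comm_world):
>>>>>
>>>>>     char port[MPI_MAX_PORT_NAME];
>>>>>     MPI_Comm intercomm;
>>>>>     if (rank == 0) {
>>>>>         MPI_Open_port(MPI_INFO_NULL, port);
>>>>>         MPI_Publish_name("myfriend", MPI_INFO_NULL, port);
>>>>>     }
>>>>>     /* collective over comm_world; the port name is significant
>>>>>        only at root 0 */
>>>>>     MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD,
>>>>>                     &intercomm);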
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: owner-mpich-discuss at mcs.anl.gov
>>>>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Alberto
>>>>>> Giannetti
>>>>>> Sent: Wednesday, May 07, 2008 3:34 PM
>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>> Subject: Re: [mpich-discuss] Processor hangs on MPI_Comm_accept
>>>>>>
>>>>>>
>>>>>> On May 7, 2008, at 3:26 PM, Rajeev Thakur wrote:
>>>>>>
>>>>>>> If you use MPI_COMM_SELF on the server side and MPI_COMM_WORLD
>>>>>>> on the client side, it won't work. Using MPI_COMM_WORLD in
>>>>>>> MPI_Comm_connect makes it one collective connect. The port name
>>>>>>> passed on the non-root rank is ignored.
>>>>>>
>>>>>> The MPI_Comm_connect succeeds for both processors.
>>>>>> Why would a client collective connect not work?
>>>>>>
>>>>>>> If you use MPI_COMM_SELF on both sides, it becomes two
>>>>>>> separate connects that match the two separate accepts.
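>>>>>>>
>>>>>>> In the comm_self case each client rank does its own lookup and
>>>>>>> connect. A sketch, assuming 'rank' holds the process's rank in
>>>>>>> MPI_COMM_WORLD:
>>>>>>>
>>>>>>>     char port[MPI_MAX_PORT_NAME], service[64];
>>>>>>>     MPI_Comm intercomm;
>>>>>>>     snprintf(service, sizeof(service), "myfriend-%d", rank);
>>>>>>>     MPI_Lookup_name(service, MPI_INFO_NULL, port);
>>>>>>>     /* an independent connect per rank, matching one accept */
>>>>>>>     MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF,
>>>>>>>                     &intercomm);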
>>>>>>>
>>>>>>> Rajeev
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: owner-mpich-discuss at mcs.anl.gov
>>>>>>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Alberto
>>>>>>>> Giannetti
>>>>>>>> Sent: Wednesday, May 07, 2008 1:41 PM
>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>> Subject: [mpich-discuss] Processor hangs on MPI_Comm_accept
>>>>>>>>
>>>>>>>> Resending with regular file attachments.
>>>>>>>>
>>>>>>>> I have two simple programs that connect through an
>>>>>>>> MPI_Comm_connect/MPI_Comm_accept scheme.
>>>>>>>> client.c looks for a service published by server.c, connects and
>>>>>>>> creates an intercomm to send a message, then waits for a message
>>>>>>>> from the server, and finally disconnects from the intercomm. I run
>>>>>>>> both client and server with 2 processors:
>>>>>>>>
>>>>>>>> mpiexec -n 2 server
>>>>>>>> mpiexec -n 2 client myfriend
>>>>>>>>
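>>>>>>>> In outline, after connecting each client rank does the
>>>>>>>> following (sketch; the element type and tags are inferred from
>>>>>>>> the traces below):
>>>>>>>>
>>>>>>>>     float data[100];   /* filled with 0.0, 1.0, 2.0, ... */
>>>>>>>>     char msg[64];
>>>>>>>>     MPI_Status status;
>>>>>>>>     MPI_Send(data, 100, MPI_FLOAT, 0, 1, intercomm);
>>>>>>>>     MPI_Recv(msg, sizeof(msg), MPI_CHAR, 0, 0, intercomm,
>>>>>>>>              &status);
>>>>>>>>     MPI_Comm_disconnect(&intercomm);
>>>>>>>>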
>>>>>>>> Everything works if I call both MPI_Comm_connect and
>>>>>>>> MPI_Comm_accept using the MPI_COMM_SELF group. However, if I
>>>>>>>> use MPI_COMM_WORLD on the MPI_Comm_connect() client call, one
>>>>>>>> of the two server processors hangs. My trace shows that the
>>>>>>>> client connects to the server, but the server never leaves
>>>>>>>> MPI_Comm_accept():
>>>>>>>>
>>>>>>>> Client trace:
>>>>>>>> Processor 0 (1463, Sender) initialized
>>>>>>>> Processor 0 looking for service myfriend-0
>>>>>>>> Processor 1 (1462, Sender) initialized
>>>>>>>> Processor 1 looking for service myfriend-1
>>>>>>>> Processor 0 found port tag#0$port#53996$description#192.168.0.10
>>>>>>>> $ifname#192.168.0.10$ looking for service myfriend-0
>>>>>>>> Processor 0 connecting to 'tag#0$port#53996$description#192.168.0.10
>>>>>>>> $ifname#192.168.0.10$'
>>>>>>>> Processor 1 found port tag#0$port#53995$description#192.168.0.10
>>>>>>>> $ifname#192.168.0.10$ looking for service myfriend-1
>>>>>>>> Processor 1 connecting to 'tag#0$port#53995$description#192.168.0.10
>>>>>>>> $ifname#192.168.0.10$'
>>>>>>>> Processor 1 connected
>>>>>>>> Processor 1 remote comm size is 1
>>>>>>>> Processor 1 sending data through intercomm to rank 0...
>>>>>>>> Processor 0 connected
>>>>>>>> Processor 0 remote comm size is 1
>>>>>>>> Processor 0 sending data through intercomm to rank 0...
>>>>>>>> Processor 1 data sent!
>>>>>>>> Processor 0 data sent!
>>>>>>>> Processor 0 received string data 'ciao client' from rank 0, tag 0
>>>>>>>> Processor 0 disconnecting communicator
>>>>>>>> Processor 0 finalizing
>>>>>>>>
>>>>>>>> Server trace:
>>>>>>>> Processor 0 (1456, Receiver) initialized
>>>>>>>> Processor 0 opened port tag#0$port#53996$description#192.168.0.10
>>>>>>>> $ifname#192.168.0.10$
>>>>>>>> Publishing port tag#0$port#53996$description#192.168.0.10
>>>>>>>> $ifname#192.168.0.10$ as service myfriend-0
>>>>>>>> Processor 1 (1455, Receiver) initialized
>>>>>>>> Processor 1 opened port tag#0$port#53995$description#192.168.0.10
>>>>>>>> $ifname#192.168.0.10$
>>>>>>>> Publishing port tag#0$port#53995$description#192.168.0.10
>>>>>>>> $ifname#192.168.0.10$ as service myfriend-1
>>>>>>>> Processor 1 waiting for connections on tag#0$port#53995
>>>>>>>> $description#192.168.0.10$ifname#192.168.0.10$, service myfriend-1...
>>>>>>>> Processor 0 waiting for connections on tag#0$port#53996
>>>>>>>> $description#192.168.0.10$ifname#192.168.0.10$, service myfriend-0...
>>>>>>>> Processor 0 new connection on port tag#0$port#53996
>>>>>>>> $description#192.168.0.10$ifname#192.168.0.10$
>>>>>>>> Processor 0 closing port tag#0$port#53996$description#192.168.0.10
>>>>>>>> $ifname#192.168.0.10$
>>>>>>>> Processor 0 unpublishing service myfriend-0
>>>>>>>> Processor 0 remote comm size is 2
>>>>>>>> Processor 0 waiting for data from new intercomm...
>>>>>>>> Processor 0 got 100 elements from rank 1, tag 1: 0.000000, 1.000000,
>>>>>>>> 2.000000...
>>>>>>>> Processor 0 sending string back...
>>>>>>>> Processor 0 data sent
>>>>>>>> Processor 0 disconnecting communicator