[mpich-discuss] Processor hangs on MPI_Comm_accept

Alberto Giannetti albertogiannetti at gmail.com
Wed May 7 18:11:14 CDT 2008


I run the server program with this accept call:

   if( MPI_Comm_accept(myport, MPI_INFO_NULL, 1, MPI_COMM_SELF,
                       &intercomm) != MPI_SUCCESS ) {
     printf("Processor %d, Error on MPI_Comm_accept\n", rank);
     MPI_Finalize();
     exit(-1);
   }
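
As written, the call names root 1, which does not exist in MPI_COMM_SELF; per Rajeev's note below, a corrected call (reusing the same variables) would look roughly like:

   if( MPI_Comm_accept(myport, MPI_INFO_NULL, 0 /* only rank in MPI_COMM_SELF */,
                       MPI_COMM_SELF, &intercomm) != MPI_SUCCESS ) {
     printf("Processor %d, Error on MPI_Comm_accept\n", rank);
     MPI_Finalize();
     exit(-1);
   }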

Run mpiexec with 1 processor:

$ mpiexec -np 1 ./server
Processor 0 (2670, Receiver) initialized
Processor 0 opened port tag#0$port#56897$description#192.168.0.10$ifname#192.168.0.10$
Publishing port tag#0$port#56897$description#192.168.0.10$ifname#192.168.0.10$ as service myfriend-0
Processor 0 waiting for connections on tag#0$port#56897$description#192.168.0.10$ifname#192.168.0.10$, service myfriend-0...
Processor 0 new connection on port tag#0$port#56897$description#192.168.0.10$ifname#192.168.0.10$
Processor 0 closing port tag#0$port#56897$description#192.168.0.10$ifname#192.168.0.10$
Processor 0 unpublishing service myfriend-0
Processor 0 remote comm size is 0
Processor 0 waiting for data from new intercomm...

No client program is running. How does the server receive and accept
an incoming connection?
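
For context, the server-side sequence behind the trace above is roughly the following sketch (variable names assumed, error checks omitted; the accept is the call shown at the top):

   char myport[MPI_MAX_PORT_NAME];
   MPI_Comm intercomm;
   int remote_size;

   MPI_Open_port(MPI_INFO_NULL, myport);                     /* "opened port ..."                */
   MPI_Publish_name("myfriend-0", MPI_INFO_NULL, myport);    /* "Publishing port ... myfriend-0" */
   MPI_Comm_accept(myport, MPI_INFO_NULL, 1, MPI_COMM_SELF,  /* the call shown at the top        */
                   &intercomm);
   MPI_Close_port(myport);                                   /* "closing port ..."               */
   MPI_Unpublish_name("myfriend-0", MPI_INFO_NULL, myport);  /* "unpublishing service"           */
   MPI_Comm_remote_size(intercomm, &remote_size);            /* printed 0 in the trace           */
   /* ... then wait for data on the new intercomm */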


On May 7, 2008, at 6:30 PM, Rajeev Thakur wrote:

> The rank is always relative to the communicator. If the communicator is
> comm_self, the root rank cannot be other than 0 because comm_self has no
> rank 1.
>
> See http://www.mpi-forum.org/docs/mpi-20-html/node103.htm#Node103 for
> information about the routines.
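
A small illustration of that point (a sketch with assumed variable names): the root argument is interpreted within the communicator that is passed, so rank 1 is only meaningful if that communicator actually contains a rank 1.

   MPI_Comm_accept(port, MPI_INFO_NULL, 1, MPI_COMM_WORLD, &intercomm); /* valid if comm_world size >= 2 */
   MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF,  &intercomm); /* comm_self only has rank 0     */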
>
>> -----Original Message-----
>> From: owner-mpich-discuss at mcs.anl.gov
>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Alberto
>> Giannetti
>> Sent: Wednesday, May 07, 2008 5:20 PM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] Processor hangs on MPI_Comm_accept
>>
>> I changed the Accept root parameter from 0 to the rank of the
>> processor (the 'rank' variable), using group MPI_COMM_SELF. Right after
>> I run the server, processor 1 receives an incoming connection, although
>> the client is not running. Why is a connection established?
>>
>> I am confused about the root parameters in Connect/Accept. I would
>> appreciate some more detailed information.
>>
>> On May 7, 2008, at 5:07 PM, Rajeev Thakur wrote:
>>
>>> MPI_Comm_connect with comm_world succeeds on both clients because it is
>>> matched by the MPI_Comm_accept with comm_self on rank 0 of the servers.
>>> There is nothing to match the Accept with comm_self on the server with
>>> rank 1, so it hangs. If you used comm_world on the servers as well,
>>> everything would work again.
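
For illustration, the comm_world-on-both-sides arrangement described above might look roughly like this sketch (assumed names; a single port, opened and published only by server rank 0):

   /* Server side, executed by every rank of MPI_COMM_WORLD (sketch). */
   char port[MPI_MAX_PORT_NAME] = "";
   MPI_Comm intercomm;
   if (rank == 0) {
       MPI_Open_port(MPI_INFO_NULL, port);
       MPI_Publish_name("myfriend", MPI_INFO_NULL, port);  /* assumed service name */
   }
   /* Collective accept; the port name is significant only at root 0. */
   MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm);

   /* Client side, executed by every rank of MPI_COMM_WORLD (sketch). */
   if (rank == 0)
       MPI_Lookup_name("myfriend", MPI_INFO_NULL, port);
   MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm);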
>>>
>>>
>>>> -----Original Message-----
>>>> From: owner-mpich-discuss at mcs.anl.gov
>>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Alberto
>>>> Giannetti
>>>> Sent: Wednesday, May 07, 2008 3:34 PM
>>>> To: mpich-discuss at mcs.anl.gov
>>>> Subject: Re: [mpich-discuss] Processor hangs on MPI_Comm_accept
>>>>
>>>>
>>>> On May 7, 2008, at 3:26 PM, Rajeev Thakur wrote:
>>>>
>>>>> If you use MPI_COMM_SELF on the server side and MPI_COMM_WORLD on the
>>>>> client side, it won't work. Using MPI_COMM_WORLD in MPI_Comm_connect
>>>>> makes it one collective connect. The port name passed on the non-root
>>>>> rank is ignored.
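
In code terms, the collective connect described above is a single call made by both client ranks; only the port name supplied at root (rank 0) is used, so both ranks end up in the one connection matched by server rank 0's accept (sketch, assumed variables):

   /* Collective over the client's MPI_COMM_WORLD. The port looked up by
      rank 1 ("myfriend-1") is ignored, so server rank 1's accept is left
      with nothing to match. */
   MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm);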
>>>>
>>>> The MPI_Comm_connect succeeds for both processors.
>>>> Why would a client collective connect not work?
>>>>
>>>>> If you use MPI_COMM_SELF on both sides, it becomes two separate
>>>>> connects that match the two separate accepts.
>>>>>
>>>>> Rajeev
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: owner-mpich-discuss at mcs.anl.gov
>>>>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Alberto
>>>>>> Giannetti
>>>>>> Sent: Wednesday, May 07, 2008 1:41 PM
>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>> Subject: [mpich-discuss] Processor hangs on MPI_Comm_accept
>>>>>>
>>>>>> Resending with regular file attachments.
>>>>>>
>>>>>> I have two simple programs that connect through an MPI_Comm_connect/
>>>>>> MPI_Comm_accept scheme.
>>>>>> client.c looks for a service published by server.c, connects and
>>>>>> creates an intercomm to send a message, then waits for a message from
>>>>>> the server and finally disconnects from the intercomm. I run both
>>>>>> client and server with 2 processors:
>>>>>>
>>>>>> mpiexec -n 2 server
>>>>>> mpiexec -n 2 client myfriend
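
For reference, the per-rank client flow in the working case (MPI_COMM_SELF on both sides, so each connect matches exactly one accept) is roughly this sketch (assumed names, error checks omitted):

   /* Each client rank looks up its own service and makes its own connect. */
   char port[MPI_MAX_PORT_NAME];
   char service[64];
   float data[100];
   MPI_Comm intercomm;
   int i;

   for (i = 0; i < 100; i++)
       data[i] = (float)i;                                   /* 0.0, 1.0, 2.0, ...      */
   snprintf(service, sizeof(service), "myfriend-%d", rank);  /* myfriend-0, myfriend-1  */
   MPI_Lookup_name(service, MPI_INFO_NULL, port);
   MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
   MPI_Send(data, 100, MPI_FLOAT, 0, 1, intercomm);          /* to remote rank 0, tag 1 */
   /* ... MPI_Recv the reply string, then MPI_Comm_disconnect(&intercomm) */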
>>>>>>
>>>>>> Everything works if I call both MPI_Comm_connect and MPI_Comm_accept
>>>>>> using the MPI_COMM_SELF group. However, if I use MPI_COMM_WORLD on
>>>>>> the MPI_Comm_connect() client call, one of the two server processors
>>>>>> hangs. My trace shows that the client connects to the server, but the
>>>>>> server never leaves MPI_Comm_accept():
>>>>>>
>>>>>> Client trace:
>>>>>> Processor 0 (1463, Sender) initialized
>>>>>> Processor 0 looking for service myfriend-0
>>>>>> Processor 1 (1462, Sender) initialized
>>>>>> Processor 1 looking for service myfriend-1
>>>>>> Processor 0 found port tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$ looking for service myfriend-0
>>>>>> Processor 0 connecting to 'tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$'
>>>>>> Processor 1 found port tag#0$port#53995$description#192.168.0.10$ifname#192.168.0.10$ looking for service myfriend-1
>>>>>> Processor 1 connecting to 'tag#0$port#53995$description#192.168.0.10$ifname#192.168.0.10$'
>>>>>> Processor 1 connected
>>>>>> Processor 1 remote comm size is 1
>>>>>> Processor 1 sending data through intercomm to rank 0...
>>>>>> Processor 0 connected
>>>>>> Processor 0 remote comm size is 1
>>>>>> Processor 0 sending data through intercomm to rank 0...
>>>>>> Processor 1 data sent!
>>>>>> Processor 0 data sent!
>>>>>> Processor 0 received string data 'ciao client' from rank 0, tag 0
>>>>>> Processor 0 disconnecting communicator
>>>>>> Processor 0 finalizing
>>>>>>
>>>>>> Server trace:
>>>>>> Processor 0 (1456, Receiver) initialized
>>>>>> Processor 0 opened port tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$
>>>>>> Publishing port tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$ as service myfriend-0
>>>>>> Processor 1 (1455, Receiver) initialized
>>>>>> Processor 1 opened port tag#0$port#53995$description#192.168.0.10$ifname#192.168.0.10$
>>>>>> Publishing port tag#0$port#53995$description#192.168.0.10$ifname#192.168.0.10$ as service myfriend-1
>>>>>> Processor 1 waiting for connections on tag#0$port#53995$description#192.168.0.10$ifname#192.168.0.10$, service myfriend-1...
>>>>>> Processor 0 waiting for connections on tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$, service myfriend-0...
>>>>>> Processor 0 new connection on port tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$
>>>>>> Processor 0 closing port tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$
>>>>>> Processor 0 unpublishing service myfriend-0
>>>>>> Processor 0 remote comm size is 2
>>>>>> Processor 0 waiting for data from new intercomm...
>>>>>> Processor 0 got 100 elements from rank 1, tag 1: 0.000000, 1.000000, 2.000000...
>>>>>> Processor 0 sending string back...
>>>>>> Processor 0 data sent
>>>>>> Processor 0 disconnecting communicator
>>>>>>