[mpich-discuss] Processor hangs on MPI_Comm_accept
Alberto Giannetti
albertogiannetti at gmail.com
Wed May 7 17:19:39 CDT 2008
I changed the Accept root parameter from 0 to the rank of the
processor (the 'rank' variable), keeping the MPI_COMM_SELF group.
Right after I start the server, processor 1 reports an incoming
connection even though the client is not running. Why is a
connection established?
I am confused about the root parameter in Connect/Accept and would
appreciate some more detailed information.
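
For reference, here is a minimal sketch of the two calls as I
understand them (port_name and intercomm are placeholders, not my
exact code):

  /* server side: MPI_COMM_SELF contains only the calling process,
     so its rank inside that communicator is always 0; the root
     argument is a rank in the communicator passed to the call,
     not a rank in MPI_COMM_WORLD */
  MPI_Comm_accept(port_name, MPI_INFO_NULL, 0 /* root */,
                  MPI_COMM_SELF, &intercomm);

  /* client side: with MPI_COMM_WORLD the connect is collective over
     all client ranks and produces a single intercommunicator; the
     port name is only significant at the root rank */
  MPI_Comm_connect(port_name, MPI_INFO_NULL, 0 /* root */,
                   MPI_COMM_WORLD, &intercomm);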
On May 7, 2008, at 5:07 PM, Rajeev Thakur wrote:
> MPI_Comm_connect with comm_world succeeds on both clients because it
> is matched by the MPI_Comm_accept with comm_self on rank 0 of the
> servers. There is nothing to match the Accept with comm_self on the
> server with rank 1, so it hangs. If you used comm_world on the
> servers as well, everything would work again.
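
A rough sketch of the comm_world-on-both-sides variant suggested above
(assuming only rank 0 opens and publishes a single port; the service
name "myfriend" and the variables world_rank/intercomm are
placeholders for my per-rank naming scheme):

  char port[MPI_MAX_PORT_NAME] = "";
  if (world_rank == 0) {
      MPI_Open_port(MPI_INFO_NULL, port);
      MPI_Publish_name("myfriend", MPI_INFO_NULL, port);
  }
  /* collective over all server ranks; the port string is only
     significant at the root rank (0 here) */
  MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm);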
>
>
>> -----Original Message-----
>> From: owner-mpich-discuss at mcs.anl.gov
>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Alberto
>> Giannetti
>> Sent: Wednesday, May 07, 2008 3:34 PM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] Processor hangs on MPI_Comm_accept
>>
>>
>> On May 7, 2008, at 3:26 PM, Rajeev Thakur wrote:
>>
>>> If you use MPI_COMM_SELF on the server side and MPI_COMM_WORLD on
>>> the client side, it won't work. Using MPI_COMM_WORLD in
>>> MPI_Comm_connect makes it one collective connect. The port name
>>> passed on the non-root rank is ignored.
>>
>> The MPI_Comm_connect succeeds for both processors.
>> Why would a collective connect on the client side not work?
>>
>>> If
>>> you use MPI_COMM_SELF on both sides, it becomes two separate
>>> connects that
>>> match the two separate accepts.
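
A rough sketch of that self-on-both-sides pairing, per rank r (the
myfriend-<rank> service names are from my test programs; port,
service, world_rank and intercomm are placeholders):

  /* server rank r: open its own port and publish it under a
     per-rank service name, then do its own accept */
  MPI_Open_port(MPI_INFO_NULL, port);
  snprintf(service, sizeof(service), "myfriend-%d", world_rank);
  MPI_Publish_name(service, MPI_INFO_NULL, port);
  MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);

  /* client rank r: look up the matching per-rank service and make
     its own (non-collective) connect */
  snprintf(service, sizeof(service), "myfriend-%d", world_rank);
  MPI_Lookup_name(service, MPI_INFO_NULL, port);
  MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);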
>>>
>>> Rajeev
>>>
>>>
>>>> -----Original Message-----
>>>> From: owner-mpich-discuss at mcs.anl.gov
>>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Alberto
>>>> Giannetti
>>>> Sent: Wednesday, May 07, 2008 1:41 PM
>>>> To: mpich-discuss at mcs.anl.gov
>>>> Subject: [mpich-discuss] Processor hangs on MPI_Comm_accept
>>>>
>>>> Resending with regular file attachments.
>>>>
>>>> I have two simple programs that connect through an
>>>> MPI_Comm_connect/MPI_Comm_accept scheme.
>>>> client.c looks up a service published by server.c, connects and
>>>> creates an intercomm to send a message, then waits for a message
>>>> from the server and finally disconnects from the intercomm. I run
>>>> both client and server with 2 processors:
>>>>
>>>> mpiexec -n 2 server
>>>> mpiexec -n 2 client myfriend
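
In outline, each server rank does something like the following (a
sketch reconstructed from the trace below, not the exact source;
element type, buffer sizes and tags are my assumptions):

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char *argv[])
  {
      char port[MPI_MAX_PORT_NAME], service[64];
      float data[100];
      int rank;
      MPI_Comm intercomm;
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* open a per-rank port and publish it as myfriend-<rank> */
      MPI_Open_port(MPI_INFO_NULL, port);
      snprintf(service, sizeof(service), "myfriend-%d", rank);
      MPI_Publish_name(service, MPI_INFO_NULL, port);

      /* in the failing case, rank 1 never returns from this accept */
      MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
      MPI_Close_port(port);
      MPI_Unpublish_name(service, MPI_INFO_NULL, port);

      /* receive the client's data, send a string back, disconnect */
      MPI_Recv(data, 100, MPI_FLOAT, MPI_ANY_SOURCE, MPI_ANY_TAG,
               intercomm, &status);
      MPI_Send("ciao client", 12, MPI_CHAR, 0, 0, intercomm);
      MPI_Comm_disconnect(&intercomm);

      MPI_Finalize();
      return 0;
  }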
>>>>
>>>> Everything works if I call both MPI_Comm_connect and
>>>> MPI_Comm_accept using the MPI_COMM_SELF group. However, if I use
>>>> MPI_COMM_WORLD in the client's MPI_Comm_connect() call, one of the
>>>> two server processors hangs. My trace shows that the client
>>>> connects to the server, but the server never leaves
>>>> MPI_Comm_accept():
>>>>
>>>> Client trace:
>>>> Processor 0 (1463, Sender) initialized
>>>> Processor 0 looking for service myfriend-0
>>>> Processor 1 (1462, Sender) initialized
>>>> Processor 1 looking for service myfriend-1
>>>> Processor 0 found port tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$ looking for service myfriend-0
>>>> Processor 0 connecting to 'tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$'
>>>> Processor 1 found port tag#0$port#53995$description#192.168.0.10$ifname#192.168.0.10$ looking for service myfriend-1
>>>> Processor 1 connecting to 'tag#0$port#53995$description#192.168.0.10$ifname#192.168.0.10$'
>>>> Processor 1 connected
>>>> Processor 1 remote comm size is 1
>>>> Processor 1 sending data through intercomm to rank 0...
>>>> Processor 0 connected
>>>> Processor 0 remote comm size is 1
>>>> Processor 0 sending data through intercomm to rank 0...
>>>> Processor 1 data sent!
>>>> Processor 0 data sent!
>>>> Processor 0 received string data 'ciao client' from rank 0, tag 0
>>>> Processor 0 disconnecting communicator
>>>> Processor 0 finalizing
>>>>
>>>> Server trace:
>>>> Processor 0 (1456, Receiver) initialized
>>>> Processor 0 opened port tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$
>>>> Publishing port tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$ as service myfriend-0
>>>> Processor 1 (1455, Receiver) initialized
>>>> Processor 1 opened port tag#0$port#53995$description#192.168.0.10$ifname#192.168.0.10$
>>>> Publishing port tag#0$port#53995$description#192.168.0.10$ifname#192.168.0.10$ as service myfriend-1
>>>> Processor 1 waiting for connections on tag#0$port#53995$description#192.168.0.10$ifname#192.168.0.10$, service myfriend-1...
>>>> Processor 0 waiting for connections on tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$, service myfriend-0...
>>>> Processor 0 new connection on port tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$
>>>> Processor 0 closing port tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$
>>>> Processor 0 unpublishing service myfriend-0
>>>> Processor 0 remote comm size is 2
>>>> Processor 0 waiting for data from new intercomm...
>>>> Processor 0 got 100 elements from rank 1, tag 1: 0.000000, 1.000000, 2.000000...
>>>> Processor 0 sending string back...
>>>> Processor 0 data sent
>>>> Processor 0 disconnecting communicator
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>