[mpich-discuss] Processor hangs on MPI_Comm_accept
Rajeev Thakur
thakur at mcs.anl.gov
Wed May 7 14:26:50 CDT 2008
If you use MPI_COMM_SELF on the server side and MPI_COMM_WORLD on the client
side, it won't work. Using MPI_COMM_WORLD in MPI_Comm_connect makes it a
single collective connect over all client ranks, and the port name passed on
non-root ranks is ignored: only the root's port is used, so the server
process accepting on the other port never sees a connection and stays
blocked in MPI_Comm_accept. If you use MPI_COMM_SELF on both sides, it
becomes two separate connects that match the two separate accepts.
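For reference, here is a minimal sketch of the working pattern, with
MPI_COMM_SELF on both sides. The myfriend-<rank> service names follow the
post below; error checking is omitted and the snippets are illustrative,
not a complete program:

    #include <stdio.h>   /* snprintf */
    #include <mpi.h>

    /* server side: each rank accepts on its own port */
    int rank;
    char port[MPI_MAX_PORT_NAME], service[64];
    MPI_Comm intercomm;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    snprintf(service, sizeof(service), "myfriend-%d", rank);
    MPI_Open_port(MPI_INFO_NULL, port);
    MPI_Publish_name(service, MPI_INFO_NULL, port);
    /* MPI_COMM_SELF: the accept involves only the calling rank */
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);

    /* client side: each rank connects to its own server rank */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    snprintf(service, sizeof(service), "myfriend-%d", rank);
    MPI_Lookup_name(service, MPI_INFO_NULL, port);
    /* MPI_COMM_SELF here too: two independent connects that match the
       two independent accepts; with MPI_COMM_WORLD this call would be
       collective over both client ranks and use only the root's port */
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);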
Rajeev
> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Alberto
> Giannetti
> Sent: Wednesday, May 07, 2008 1:41 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: [mpich-discuss] Processor hangs on MPI_Comm_accept
>
> Resending with regular file attachments.
>
> I have two simple programs that connect through an MPI_Comm_connect/
> MPI_Comm_accept scheme.
> client.c looks for a service published by server.c, connects, and
> creates an intercommunicator to send a message, then waits for a
> message from the server and finally disconnects the intercomm. I run
> both client and server with 2 processes:
>
> mpiexec -n 2 server
> mpiexec -n 2 client myfriend
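>
> (Simplified sketch of the lookup/connect step in client.c; variable
> names are illustrative:)
>
>     char port[MPI_MAX_PORT_NAME];
>     MPI_Lookup_name(service, MPI_INFO_NULL, port);
>     MPI_Comm_connect(port, MPI_INFO_NULL, 0, comm, &intercomm);
>
> where comm is MPI_COMM_SELF in the working case and MPI_COMM_WORLD in
> the hanging one.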
>
> Everything works if I call both MPI_Comm_connect and MPI_Comm_accept
> with the MPI_COMM_SELF communicator. However, if I use MPI_COMM_WORLD
> in the client's MPI_Comm_connect() call, one of the two server
> processes hangs. My trace shows that the client connects to the
> server, but the server never leaves MPI_Comm_accept():
>
> Client trace:
> Processor 0 (1463, Sender) initialized
> Processor 0 looking for service myfriend-0
> Processor 1 (1462, Sender) initialized
> Processor 1 looking for service myfriend-1
> Processor 0 found port tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$ looking for service myfriend-0
> Processor 0 connecting to 'tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$'
> Processor 1 found port tag#0$port#53995$description#192.168.0.10$ifname#192.168.0.10$ looking for service myfriend-1
> Processor 1 connecting to 'tag#0$port#53995$description#192.168.0.10$ifname#192.168.0.10$'
> Processor 1 connected
> Processor 1 remote comm size is 1
> Processor 1 sending data through intercomm to rank 0...
> Processor 0 connected
> Processor 0 remote comm size is 1
> Processor 0 sending data through intercomm to rank 0...
> Processor 1 data sent!
> Processor 0 data sent!
> Processor 0 received string data 'ciao client' from rank 0, tag 0
> Processor 0 disconnecting communicator
> Processor 0 finalizing
>
> Server trace:
> Processor 0 (1456, Receiver) initialized
> Processor 0 opened port tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$
> Publishing port tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$ as service myfriend-0
> Processor 1 (1455, Receiver) initialized
> Processor 1 opened port tag#0$port#53995$description#192.168.0.10$ifname#192.168.0.10$
> Publishing port tag#0$port#53995$description#192.168.0.10$ifname#192.168.0.10$ as service myfriend-1
> Processor 1 waiting for connections on tag#0$port#53995$description#192.168.0.10$ifname#192.168.0.10$, service myfriend-1...
> Processor 0 waiting for connections on tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$, service myfriend-0...
> Processor 0 new connection on port tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$
> Processor 0 closing port tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$
> Processor 0 unpublishing service myfriend-0
> Processor 0 remote comm size is 2
> Processor 0 waiting for data from new intercomm...
> Processor 0 got 100 elements from rank 1, tag 1: 0.000000, 1.000000, 2.000000...
> Processor 0 sending string back...
> Processor 0 data sent
> Processor 0 disconnecting communicator
>