[mpich-discuss] Processor hangs on MPI_Comm_accept

Rajeev Thakur thakur at mcs.anl.gov
Wed May 7 14:26:50 CDT 2008


If you use MPI_COMM_SELF on the server side and MPI_COMM_WORLD on the client
side, it won't work. MPI_Comm_connect is collective over the communicator you
pass it, so using MPI_COMM_WORLD makes the two client ranks perform a single
collective connect, and the port name passed on the non-root rank is ignored.
The second accept on the server then has no connect to match, which is why one
server process stays blocked in MPI_Comm_accept. If you use MPI_COMM_SELF on
both sides, you get two separate connects that match the two separate accepts.
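
For example, a minimal sketch of the client side with MPI_COMM_SELF
(the "myfriend-<rank>" service names match the traces below; the rest
is hypothetical and error checking is omitted):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        char service[64], port[MPI_MAX_PORT_NAME];
        MPI_Comm intercomm;
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        /* each client rank looks up its own per-rank service */
        snprintf(service, sizeof(service), "myfriend-%d", rank);
        MPI_Lookup_name(service, MPI_INFO_NULL, port);
        /* MPI_COMM_SELF: a separate single-process connect per rank,
           each one matching a per-rank accept on the server */
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
        MPI_Comm_disconnect(&intercomm);
        MPI_Finalize();
        return 0;
    }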

Rajeev
 

> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov 
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Alberto 
> Giannetti
> Sent: Wednesday, May 07, 2008 1:41 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: [mpich-discuss] Processor hangs on MPI_Comm_accept
> 
> Resending with regular file attachments.
> 
> I have two simple programs that connect through an MPI_Comm_connect/ 
> MPI_Comm_accept scheme.
> client.c looks for a service published by server.c, connects and  
> creates an intercomm to send a message, then waits for a message from  
> the server and finally disconnects from the intercomm. I run both  
> client and server with 2 processes:
> 
> mpiexec -n 2 server
> mpiexec -n 2 client myfriend
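> 
> In outline, each server rank in this scheme does the following (a
> minimal sketch with per-rank service names as in the traces below;
> error handling omitted):
> 
>     #include <mpi.h>
>     #include <stdio.h>
> 
>     int main(int argc, char **argv)
>     {
>         char service[64], port[MPI_MAX_PORT_NAME];
>         MPI_Comm intercomm;
>         int rank;
> 
>         MPI_Init(&argc, &argv);
>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>         /* open a port and advertise it under a per-rank name */
>         MPI_Open_port(MPI_INFO_NULL, port);
>         snprintf(service, sizeof(service), "myfriend-%d", rank);
>         MPI_Publish_name(service, MPI_INFO_NULL, port);
>         /* single-process accept, one per server rank */
>         MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
>         MPI_Unpublish_name(service, MPI_INFO_NULL, port);
>         MPI_Close_port(port);
>         /* ... exchange messages over the intercomm ... */
>         MPI_Comm_disconnect(&intercomm);
>         MPI_Finalize();
>         return 0;
>     }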
> 
> Everything works if I call both MPI_Comm_connect and MPI_Comm_accept  
> over MPI_COMM_SELF. However, if I use MPI_COMM_WORLD in the client's  
> MPI_Comm_connect() call, one of the two server processes hangs. My  
> trace shows that the client connects to the server, but the server  
> never leaves MPI_Comm_accept():
> 
> Client trace:
> Processor 0 (1463, Sender) initialized
> Processor 0 looking for service myfriend-0
> Processor 1 (1462, Sender) initialized
> Processor 1 looking for service myfriend-1
> Processor 0 found port tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$ looking for service myfriend-0
> Processor 0 connecting to 'tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$'
> Processor 1 found port tag#0$port#53995$description#192.168.0.10$ifname#192.168.0.10$ looking for service myfriend-1
> Processor 1 connecting to 'tag#0$port#53995$description#192.168.0.10$ifname#192.168.0.10$'
> Processor 1 connected
> Processor 1 remote comm size is 1
> Processor 1 sending data through intercomm to rank 0...
> Processor 0 connected
> Processor 0 remote comm size is 1
> Processor 0 sending data through intercomm to rank 0...
> Processor 1 data sent!
> Processor 0 data sent!
> Processor 0 received string data 'ciao client' from rank 0, tag 0
> Processor 0 disconnecting communicator
> Processor 0 finalizing
> 
> Server trace:
> Processor 0 (1456, Receiver) initialized
> Processor 0 opened port tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$
> Publishing port tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$ as service myfriend-0
> Processor 1 (1455, Receiver) initialized
> Processor 1 opened port tag#0$port#53995$description#192.168.0.10$ifname#192.168.0.10$
> Publishing port tag#0$port#53995$description#192.168.0.10$ifname#192.168.0.10$ as service myfriend-1
> Processor 1 waiting for connections on tag#0$port#53995$description#192.168.0.10$ifname#192.168.0.10$, service myfriend-1...
> Processor 0 waiting for connections on tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$, service myfriend-0...
> Processor 0 new connection on port tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$
> Processor 0 closing port tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$
> Processor 0 unpublishing service myfriend-0
> Processor 0 remote comm size is 2
> Processor 0 waiting for data from new intercomm...
> Processor 0 got 100 elements from rank 1, tag 1: 0.000000, 1.000000, 2.000000...
> Processor 0 sending string back...
> Processor 0 data sent
> Processor 0 disconnecting communicator
> 