[mpich-discuss] Processor hangs on MPI_Comm_accept

Rajeev Thakur thakur at mcs.anl.gov
Wed May 7 16:07:03 CDT 2008


The MPI_Comm_connect with comm_world succeeds on both client ranks
because the single collective connect is matched by the MPI_Comm_accept
with comm_self on rank 0 of the servers. Nothing matches the accept
with comm_self on server rank 1, so that rank hangs. If you used
comm_world on the servers as well, everything would work again.
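
Roughly, the collective form on the server side would look like the
sketch below (error checking omitted; the single service name
"myfriend-0" is just borrowed from your trace for illustration, this is
not your actual server.c):

/* server sketch: every server rank calls MPI_Comm_accept collectively
   on MPI_COMM_WORLD; the port name only matters on the root rank. */
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm intercomm;
    char port[MPI_MAX_PORT_NAME] = "";
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* only the root opens and publishes a single port */
        MPI_Open_port(MPI_INFO_NULL, port);
        MPI_Publish_name("myfriend-0", MPI_INFO_NULL, port);
    }

    /* one collective accept over comm_world, matched by one
       collective connect over comm_world on the client side */
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm);

    /* ... exchange messages over intercomm here ... */

    MPI_Comm_disconnect(&intercomm);
    if (rank == 0) {
        MPI_Unpublish_name("myfriend-0", MPI_INFO_NULL, port);
        MPI_Close_port(port);
    }
    MPI_Finalize();
    return 0;
}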
 
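And the client side, again only a sketch under the same assumptions
(one published service, "myfriend-0", looked up by the root rank only):

/* client sketch: every client rank calls MPI_Comm_connect collectively
   on MPI_COMM_WORLD; only the root rank needs the real port name. */
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm intercomm;
    char port[MPI_MAX_PORT_NAME] = "";
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        MPI_Lookup_name("myfriend-0", MPI_INFO_NULL, port);

    /* one collective connect; the port argument is ignored on
       non-root ranks */
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm);

    /* ... exchange messages over intercomm here ... */

    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}

With this pairing the two client ranks and the two server ranks form a
single intercommunicator with remote size 2 on each side, instead of
the per-rank pairs that the comm_self version creates.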

> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov 
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Alberto 
> Giannetti
> Sent: Wednesday, May 07, 2008 3:34 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] Processor hangs on MPI_Comm_accept
> 
> 
> On May 7, 2008, at 3:26 PM, Rajeev Thakur wrote:
> 
> > If you use MPI_COMM_SELF on the server side and MPI_COMM_WORLD on
> > the client side, it won't work. Using MPI_COMM_WORLD in
> > MPI_Comm_connect makes it one collective connect. The port name
> > passed on the non-root rank is ignored.
> 
> The MPI_Comm_connect succeeds for both processors.
> Why would a collective connect on the client side not work?
> 
> > If you use MPI_COMM_SELF on both sides, it becomes two separate
> > connects that match the two separate accepts.
> >
> > Rajeev
> >
> >
> >> -----Original Message-----
> >> From: owner-mpich-discuss at mcs.anl.gov
> >> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Alberto
> >> Giannetti
> >> Sent: Wednesday, May 07, 2008 1:41 PM
> >> To: mpich-discuss at mcs.anl.gov
> >> Subject: [mpich-discuss] Processor hangs on MPI_Comm_accept
> >>
> >> Resending with regular file attachments.
> >>
> >> I have two simple programs that connect through an MPI_Comm_connect/
> >> MPI_Comm_accept scheme.
> >> client.c looks for a service published by server.c, connects and
> >> creates an intercomm to send a message, then waits for a message from
> >> the server, and finally disconnects from the intercomm. I run both
> >> client and server with 2 processors:
> >>
> >> mpiexec -n 2 server
> >> mpiexec -n 2 client myfriend
> >>
> >> Everything works if I call both MPI_Comm_connect and MPI_Comm_accept
> >> using the MPI_COMM_SELF group. However, if I use MPI_COMM_WORLD on
> >> the MPI_Comm_connect() client call, one of the two server processors
> >> hangs. My trace shows that the client connects to the server, but the
> >> server never leaves MPI_Comm_accept():
> >>
> >> Client trace:
> >> Processor 0 (1463, Sender) initialized
> >> Processor 0 looking for service myfriend-0
> >> Processor 1 (1462, Sender) initialized
> >> Processor 1 looking for service myfriend-1
> >> Processor 0 found port tag#0$port#53996$description#192.168.0.10
> >> $ifname#192.168.0.10$ looking for service myfriend-0
> >> Processor 0 connecting to 'tag#0$port#53996$description#192.168.0.10
> >> $ifname#192.168.0.10$'
> >> Processor 1 found port tag#0$port#53995$description#192.168.0.10
> >> $ifname#192.168.0.10$ looking for service myfriend-1
> >> Processor 1 connecting to 'tag#0$port#53995$description#192.168.0.10
> >> $ifname#192.168.0.10$'
> >> Processor 1 connected
> >> Processor 1 remote comm size is 1
> >> Processor 1 sending data through intercomm to rank 0...
> >> Processor 0 connected
> >> Processor 0 remote comm size is 1
> >> Processor 0 sending data through intercomm to rank 0...
> >> Processor 1 data sent!
> >> Processor 0 data sent!
> >> Processor 0 received string data 'ciao client' from rank 0, tag 0
> >> Processor 0 disconnecting communicator
> >> Processor 0 finalizing
> >>
> >> Server trace:
> >> Processor 0 (1456, Receiver) initialized
> >> Processor 0 opened port tag#0$port#53996$description#192.168.0.10
> >> $ifname#192.168.0.10$
> >> Publishing port tag#0$port#53996$description#192.168.0.10
> >> $ifname#192.168.0.10$ as service myfriend-0
> >> Processor 1 (1455, Receiver) initialized
> >> Processor 1 opened port tag#0$port#53995$description#192.168.0.10
> >> $ifname#192.168.0.10$
> >> Publishing port tag#0$port#53995$description#192.168.0.10
> >> $ifname#192.168.0.10$ as service myfriend-1
> >> Processor 1 waiting for connections on tag#0$port#53995
> >> $description#192.168.0.10$ifname#192.168.0.10$, service myfriend-1...
> >> Processor 0 waiting for connections on tag#0$port#53996
> >> $description#192.168.0.10$ifname#192.168.0.10$, service myfriend-0...
> >> Processor 0 new connection on port tag#0$port#53996
> >> $description#192.168.0.10$ifname#192.168.0.10$
> >> Processor 0 closing port tag#0$port#53996$description#192.168.0.10
> >> $ifname#192.168.0.10$
> >> Processor 0 unpublishing service myfriend-0
> >> Processor 0 remote comm size is 2
> >> Processor 0 waiting for data from new intercomm...
> >> Processor 0 got 100 elements from rank 1, tag 1: 0.000000, 1.000000,
> >> 2.000000...
> >> Processor 0 sending string back...
> >> Processor 0 data sent
> >> Processor 0 disconnecting communicator
> >>
> >>
> >>
> >
> 
> 
> 



