[mpich-discuss] Processor hangs on MPI_Comm_accept

Rajeev Thakur thakur at mcs.anl.gov
Wed May 7 17:30:57 CDT 2008


The rank is always relative to the communicator. If the communicator is
comm_self, the root rank cannot be anything other than 0, because comm_self
contains only the calling process, whose rank in it is 0.

See http://www.mpi-forum.org/docs/mpi-20-html/node103.htm#Node103 for
information about the routines. 
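
For example, a minimal accept on comm_self (just a sketch; port_name comes
from MPI_Open_port) has to pass 0 as the root:

    char port_name[MPI_MAX_PORT_NAME];
    MPI_Comm intercomm;

    MPI_Open_port(MPI_INFO_NULL, port_name);
    /* MPI_COMM_SELF contains only the calling process, and its rank in
       that communicator is 0, so 0 is the only valid root argument. */
    MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);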

> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov 
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Alberto 
> Giannetti
> Sent: Wednesday, May 07, 2008 5:20 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] Processor hangs on MPI_Comm_accept
> 
> I changed the Accept root parameter from 0 to the rank of the
> processor (the 'rank' variable), still using the MPI_COMM_SELF group.
> Right after I start the server, processor 1 reports an incoming
> connection even though the client is not running. Why is a connection
> established?
> 
> I am confused about the root parameter in Connect/Accept. I would
> appreciate some more detailed information.
> 
> On May 7, 2008, at 5:07 PM, Rajeev Thakur wrote:
> 
> > MPI_Comm_connect with comm_world succeeds on both clients because it is
> > matched by the MPI_Comm_accept with comm_self on rank 0 of the servers.
> > There is nothing to match the Accept with comm_self on the server with
> > rank 1, so it hangs. If you used comm_world on the servers as well,
> > everything would work again.
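> >
> > A rough, untested sketch of that comm_world variant on the server side
> > (the single service name "myfriend" is only illustrative):
> >
> >     char port_name[MPI_MAX_PORT_NAME];
> >     MPI_Comm intercomm;
> >     int rank;
> >
> >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >     if (rank == 0) {
> >         MPI_Open_port(MPI_INFO_NULL, port_name);
> >         MPI_Publish_name("myfriend", MPI_INFO_NULL, port_name);
> >     }
> >     /* Collective over comm_world: both server ranks take part in the
> >        same accept; only the port_name passed at root 0 is significant. */
> >     MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm);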
> >
> >
> >> -----Original Message-----
> >> From: owner-mpich-discuss at mcs.anl.gov
> >> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Alberto
> >> Giannetti
> >> Sent: Wednesday, May 07, 2008 3:34 PM
> >> To: mpich-discuss at mcs.anl.gov
> >> Subject: Re: [mpich-discuss] Processor hangs on MPI_Comm_accept
> >>
> >>
> >> On May 7, 2008, at 3:26 PM, Rajeev Thakur wrote:
> >>
> >>> If you use MPI_COMM_SELF on the server side and MPI_COMM_WORLD on the
> >>> client side, it won't work. Using MPI_COMM_WORLD in MPI_Comm_connect
> >>> makes it one collective connect. The port name passed on the non-root
> >>> rank is ignored.
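> >>>
> >>> Roughly, the collective connect on the client side looks like this
> >>> (untested sketch; "myfriend" is just an illustrative service name):
> >>>
> >>>     char port_name[MPI_MAX_PORT_NAME] = "";
> >>>     MPI_Comm intercomm;
> >>>     int rank;
> >>>
> >>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >>>     if (rank == 0)
> >>>         MPI_Lookup_name("myfriend", MPI_INFO_NULL, port_name);
> >>>     /* One collective connect over comm_world; whatever port name the
> >>>        non-root ranks pass is ignored. */
> >>>     MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm);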
> >>
> >> MPI_Comm_connect succeeds for both processors.
> >> Why would a collective connect on the client side not work?
> >>
> >>> If you use MPI_COMM_SELF on both sides, it becomes two separate
> >>> connects that match the two separate accepts.
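> >>>
> >>> Schematically (just a sketch; variable names are illustrative):
> >>>
> >>>     /* each server rank, on its own: */
> >>>     MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
> >>>
> >>>     /* each client rank, on its own, with the matching port name: */
> >>>     MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);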
> >>>
> >>> Rajeev
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: owner-mpich-discuss at mcs.anl.gov
> >>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Alberto
> >>>> Giannetti
> >>>> Sent: Wednesday, May 07, 2008 1:41 PM
> >>>> To: mpich-discuss at mcs.anl.gov
> >>>> Subject: [mpich-discuss] Processor hangs on MPI_Comm_accept
> >>>>
> >>>> Resending with regular file attachments.
> >>>>
> >>>> I have two simple programs that connect through an MPI_Comm_connect/
> >>>> MPI_Comm_accept scheme.
> >>>> client.c looks up a service published by server.c, connects and
> >>>> creates an intercomm, sends a message, then waits for a message from
> >>>> the server and finally disconnects from the intercomm (a rough
> >>>> outline is sketched after the commands below). I run both the client
> >>>> and the server with 2 processes:
> >>>>
> >>>> mpiexec -n 2 server
> >>>> mpiexec -n 2 client myfriend
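> >>>>
> >>>> In outline, each client rank does roughly the following (element type,
> >>>> counts and tags are guessed from the traces; the attached client.c has
> >>>> the real code):
> >>>>
> >>>>     char service[64], port_name[MPI_MAX_PORT_NAME], msg[64];
> >>>>     float data[100];                  /* filled in elsewhere */
> >>>>     int rank;
> >>>>     MPI_Comm intercomm;
> >>>>     MPI_Status status;
> >>>>
> >>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >>>>     snprintf(service, sizeof(service), "%s-%d", argv[1], rank);  /* e.g. myfriend-0 */
> >>>>     MPI_Lookup_name(service, MPI_INFO_NULL, port_name);
> >>>>     MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
> >>>>     MPI_Send(data, 100, MPI_FLOAT, 0, 1, intercomm);   /* to server rank 0, tag 1 */
> >>>>     MPI_Recv(msg, sizeof(msg), MPI_CHAR, 0, 0, intercomm, &status);
> >>>>     MPI_Comm_disconnect(&intercomm);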
> >>>>
> >>>> Everything works if I call both MPI_Comm_connect and MPI_Comm_accept
> >>>> using the MPI_COMM_SELF group. However, if I use MPI_COMM_WORLD in
> >>>> the client's MPI_Comm_connect() call, one of the two server processors
> >>>> hangs. My trace shows that the client connects to the server, but the
> >>>> server never leaves MPI_Comm_accept():
> >>>>
> >>>> Client trace:
> >>>> Processor 0 (1463, Sender) initialized
> >>>> Processor 0 looking for service myfriend-0
> >>>> Processor 1 (1462, Sender) initialized
> >>>> Processor 1 looking for service myfriend-1
> >>>> Processor 0 found port tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$ looking for service myfriend-0
> >>>> Processor 0 connecting to 'tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$'
> >>>> Processor 1 found port tag#0$port#53995$description#192.168.0.10$ifname#192.168.0.10$ looking for service myfriend-1
> >>>> Processor 1 connecting to 'tag#0$port#53995$description#192.168.0.10$ifname#192.168.0.10$'
> >>>> Processor 1 connected
> >>>> Processor 1 remote comm size is 1
> >>>> Processor 1 sending data through intercomm to rank 0...
> >>>> Processor 0 connected
> >>>> Processor 0 remote comm size is 1
> >>>> Processor 0 sending data through intercomm to rank 0...
> >>>> Processor 1 data sent!
> >>>> Processor 0 data sent!
> >>>> Processor 0 received string data 'ciao client' from rank 0, tag 0
> >>>> Processor 0 disconnecting communicator
> >>>> Processor 0 finalizing
> >>>>
> >>>> Server trace:
> >>>> Processor 0 (1456, Receiver) initialized
> >>>> Processor 0 opened port tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$
> >>>> Publishing port tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$ as service myfriend-0
> >>>> Processor 1 (1455, Receiver) initialized
> >>>> Processor 1 opened port tag#0$port#53995$description#192.168.0.10$ifname#192.168.0.10$
> >>>> Publishing port tag#0$port#53995$description#192.168.0.10$ifname#192.168.0.10$ as service myfriend-1
> >>>> Processor 1 waiting for connections on tag#0$port#53995$description#192.168.0.10$ifname#192.168.0.10$, service myfriend-1...
> >>>> Processor 0 waiting for connections on tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$, service myfriend-0...
> >>>> Processor 0 new connection on port tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$
> >>>> Processor 0 closing port tag#0$port#53996$description#192.168.0.10$ifname#192.168.0.10$
> >>>> Processor 0 unpublishing service myfriend-0
> >>>> Processor 0 remote comm size is 2
> >>>> Processor 0 waiting for data from new intercomm...
> >>>> Processor 0 got 100 elements from rank 1, tag 1: 0.000000, 1.000000, 2.000000...
> >>>> Processor 0 sending string back...
> >>>> Processor 0 data sent
> >>>> Processor 0 disconnecting communicator
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >>
> >
> 
> 
> 



