[mpich-discuss] Processor hangs on MPI_Comm_accept

Rajeev Thakur thakur at mcs.anl.gov
Wed May 7 18:26:57 CDT 2008


That's an invalid program. There is no rank 1 in MPI_COMM_SELF. See the
definition of MPI_Comm_accept:

IN root	rank in comm of root node 
IN comm	intracommunicator over which call is collective
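
For reference, a correct call over MPI_COMM_SELF uses root 0. A minimal
sketch, reusing the myport, rank, and intercomm variables from your server:

   /* MPI_COMM_SELF contains only the calling process, so root must be 0 */
   if (MPI_Comm_accept(myport, MPI_INFO_NULL, 0, MPI_COMM_SELF,
                       &intercomm) != MPI_SUCCESS) {
     printf("Processor %d, Error on MPI_Comm_accept\n", rank);
     MPI_Finalize();
     exit(-1);
   }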


> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov 
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Alberto 
> Giannetti
> Sent: Wednesday, May 07, 2008 6:11 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] Processor hangs on MPI_Comm_accept
> 
> I run the server program with this accept call:
> 
>    if( MPI_Comm_accept(myport, MPI_INFO_NULL, 1, MPI_COMM_SELF,  
> &intercomm) != MPI_SUCCESS ) {
>      printf("Processor %d, Error on MPI_Comm_accept\n", rank);
>      MPI_Finalize();
>      exit(-1);
>    }
> 
> Run mpiexec with 1 processor:
> 
> $ mpiexec -np 1 ./server
> Processor 0 (2670, Receiver) initialized
> Processor 0 opened port tag#0$port#56897$description#192.168.0.10 
> $ifname#192.168.0.10$
> Publishing port tag#0$port#56897$description#192.168.0.10 
> $ifname#192.168.0.10$ as service myfriend-0
> Processor 0 waiting for connections on tag#0$port#56897 
> $description#192.168.0.10$ifname#192.168.0.10$, service myfriend-0...
> Processor 0 new connection on port tag#0$port#56897 
> $description#192.168.0.10$ifname#192.168.0.10$
> Processor 0 closing port tag#0$port#56897$description#192.168.0.10 
> $ifname#192.168.0.10$
> Processor 0 unpublishing service myfriend-0
> Processor 0 remote comm size is 0
> Processor 0 waiting for data from new intercomm...
> 
> No client program is running. How does the server receive and accept
> an incoming connection?
> 
> 
> On May 7, 2008, at 6:30 PM, Rajeev Thakur wrote:
> 
> > The rank is always relative to the communicator. If the  
> > communicator is
> > comm_self, the root rank cannot be other than 0 because comm_self  
> > has no
> > rank 1.
> >
> > See 
> http://www.mpi-forum.org/docs/mpi-20-html/node103.htm#Node103 for
> > information about the routines.
> >
> >> -----Original Message-----
> >> From: owner-mpich-discuss at mcs.anl.gov
> >> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Alberto
> >> Giannetti
> >> Sent: Wednesday, May 07, 2008 5:20 PM
> >> To: mpich-discuss at mcs.anl.gov
> >> Subject: Re: [mpich-discuss] Processor hangs on MPI_Comm_accept
> >>
> >> I changed the Accept root parameter from 0 to the rank of the
> >> processor ('rank' variable), group MPI_COMM_SELF. Right after I run
> >> the server, processor 1 receives an incoming connection,
> >> although the
> >> client is not running. Why is a connection established?
> >>
> >> I am confused about the root parameters in Connect/Accept. Would
> >> appreciate some more detailed information.
> >>
> >> On May 7, 2008, at 5:07 PM, Rajeev Thakur wrote:
> >>
> >>> MPI_Comm_connect with comm_world succeeds on both clients because
> >>> it is
> >>> matched by the MPI_Comm_accept with comm_self on rank 0 of the
> >>> servers.
> >>> There is nothing to match the Accept with comm_self on the server
> >>> with rank
> >>> 1, so it hangs. If you used comm_world on the servers as well,
> >>> everything
> >>> would work again.
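> >>>
> >>> A rough sketch of that comm_world/comm_world pattern (illustrative
> >>> only, with a single made-up service name and error checking omitted):
> >>>
> >>>    /* server: rank 0 opens and publishes one port; all ranks then
> >>>       call accept collectively over MPI_COMM_WORLD with root 0 */
> >>>    char port[MPI_MAX_PORT_NAME] = "";
> >>>    MPI_Comm intercomm;
> >>>    if (rank == 0) {
> >>>      MPI_Open_port(MPI_INFO_NULL, port);
> >>>      MPI_Publish_name("myfriend", MPI_INFO_NULL, port);
> >>>    }
> >>>    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm);
> >>>
> >>>    /* client: rank 0 looks the port up; the port name passed on the
> >>>       non-root rank is ignored, so this is one collective connect */
> >>>    if (rank == 0)
> >>>      MPI_Lookup_name("myfriend", MPI_INFO_NULL, port);
> >>>    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm);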
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: owner-mpich-discuss at mcs.anl.gov
> >>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Alberto
> >>>> Giannetti
> >>>> Sent: Wednesday, May 07, 2008 3:34 PM
> >>>> To: mpich-discuss at mcs.anl.gov
> >>>> Subject: Re: [mpich-discuss] Processor hangs on MPI_Comm_accept
> >>>>
> >>>>
> >>>> On May 7, 2008, at 3:26 PM, Rajeev Thakur wrote:
> >>>>
> >>>>> If you use MPI_COMM_SELF on the server side and 
> MPI_COMM_WORLD on
> >>>>> the client
> >>>>> side, it won't work. Using MPI_COMM_WORLD in
> >>>> MPI_Comm_connect makes
> >>>>> it one
> >>>>> collective connect. The port name passed on the non-root rank is
> >>>>> ignored.
> >>>>
> >>>> The MPI_Comm_connect succeeds for both processors.
> >>>> Why would a client collective connect not work?
> >>>>
> >>>>> If
> >>>>> you use MPI_COMM_SELF on both sides, it becomes two separate
> >>>>> connects that
> >>>>> match the two separate accepts.
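> >>>>>
> >>>>> Per rank, that comm_self/comm_self pairing looks roughly like this
> >>>>> (sketch only, using the per-rank service names from your trace and
> >>>>> omitting error checking):
> >>>>>
> >>>>>    /* server rank i, independently of the other server ranks */
> >>>>>    char port[MPI_MAX_PORT_NAME], service[64];
> >>>>>    MPI_Comm intercomm;
> >>>>>    sprintf(service, "myfriend-%d", rank);
> >>>>>    MPI_Open_port(MPI_INFO_NULL, port);
> >>>>>    MPI_Publish_name(service, MPI_INFO_NULL, port);
> >>>>>    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
> >>>>>
> >>>>>    /* client rank i, matching exactly that one accept */
> >>>>>    MPI_Lookup_name(service, MPI_INFO_NULL, port);
> >>>>>    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);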
> >>>>>
> >>>>> Rajeev
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: owner-mpich-discuss at mcs.anl.gov
> >>>>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Alberto
> >>>>>> Giannetti
> >>>>>> Sent: Wednesday, May 07, 2008 1:41 PM
> >>>>>> To: mpich-discuss at mcs.anl.gov
> >>>>>> Subject: [mpich-discuss] Processor hangs on MPI_Comm_accept
> >>>>>>
> >>>>>> Resending with regular file attachments.
> >>>>>>
> >>>>>> I have two simple programs that connect through an MPI_Connect/
> >>>>>> MPI_Accept scheme.
> >>>>>> client.c looks for a service published by server.c, connects and
> >>>>>> creates an intercomm to send a message, then waits for a message
> >>>>>> from the server and finally disconnects from the intercomm. I run
> >>>>>> both client and server with 2 processors:
> >>>>>>
> >>>>>> mpiexec -n 2 server
> >>>>>> mpiexec -n 2 client myfriend
> >>>>>>
> >>>>>> Everything works if I call both MPI_Comm_connect and
> >>>> MPI_Comm_accept
> >>>>>> using the MPI_COMM_SELF group. However, if I use
> >> MPI_COMM_WORLD on
> >>>>>> the MPI_Connect() client call, one of the two server processors
> >>>>>> hangs. My trace shows that the client connects to the server,
> >>>>>> but the
> >>>>>> server never leaves MPI_Comm_accept():
> >>>>>>
> >>>>>> Client trace:
> >>>>>> Processor 0 (1463, Sender) initialized
> >>>>>> Processor 0 looking for service myfriend-0
> >>>>>> Processor 1 (1462, Sender) initialized
> >>>>>> Processor 1 looking for service myfriend-1
> >>>>>> Processor 0 found port 
> tag#0$port#53996$description#192.168.0.10
> >>>>>> $ifname#192.168.0.10$ looking for service myfriend-0
> >>>>>> Processor 0 connecting to
> >>>> 'tag#0$port#53996$description#192.168.0.10
> >>>>>> $ifname#192.168.0.10$'
> >>>>>> Processor 1 found port 
> tag#0$port#53995$description#192.168.0.10
> >>>>>> $ifname#192.168.0.10$ looking for service myfriend-1
> >>>>>> Processor 1 connecting to
> >>>> 'tag#0$port#53995$description#192.168.0.10
> >>>>>> $ifname#192.168.0.10$'
> >>>>>> Processor 1 connected
> >>>>>> Processor 1 remote comm size is 1
> >>>>>> Processor 1 sending data through intercomm to rank 0...
> >>>>>> Processor 0 connected
> >>>>>> Processor 0 remote comm size is 1
> >>>>>> Processor 0 sending data through intercomm to rank 0...
> >>>>>> Processor 1 data sent!
> >>>>>> Processor 0 data sent!
> >>>>>> Processor 0 received string data 'ciao client' from 
> rank 0, tag 0
> >>>>>> Processor 0 disconnecting communicator
> >>>>>> Processor 0 finalizing
> >>>>>>
> >>>>>> Server trace:
> >>>>>> Processor 0 (1456, Receiver) initialized
> >>>>>> Processor 0 opened port 
> tag#0$port#53996$description#192.168.0.10
> >>>>>> $ifname#192.168.0.10$
> >>>>>> Publishing port tag#0$port#53996$description#192.168.0.10
> >>>>>> $ifname#192.168.0.10$ as service myfriend-0
> >>>>>> Processor 1 (1455, Receiver) initialized
> >>>>>> Processor 1 opened port 
> tag#0$port#53995$description#192.168.0.10
> >>>>>> $ifname#192.168.0.10$
> >>>>>> Publishing port tag#0$port#53995$description#192.168.0.10
> >>>>>> $ifname#192.168.0.10$ as service myfriend-1
> >>>>>> Processor 1 waiting for connections on tag#0$port#53995
> >>>>>> $description#192.168.0.10$ifname#192.168.0.10$, service
> >>>> myfriend-1...
> >>>>>> Processor 0 waiting for connections on tag#0$port#53996
> >>>>>> $description#192.168.0.10$ifname#192.168.0.10$, service
> >>>> myfriend-0...
> >>>>>> Processor 0 new connection on port tag#0$port#53996
> >>>>>> $description#192.168.0.10$ifname#192.168.0.10$
> >>>>>> Processor 0 closing port
> >> tag#0$port#53996$description#192.168.0.10
> >>>>>> $ifname#192.168.0.10$
> >>>>>> Processor 0 unpublishing service myfriend-0
> >>>>>> Processor 0 remote comm size is 2
> >>>>>> Processor 0 waiting for data from new intercomm...
> >>>>>> Processor 0 got 100 elements from rank 1, tag 1: 0.000000,
> >>>> 1.000000,
> >>>>>> 2.000000...
> >>>>>> Processor 0 sending string back...
> >>>>>> Processor 0 data sent
> >>>>>> Processor 0 disconnecting communicator
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >>
> >
> 
> 
> 



