[MPICH] Accept failure from forked process

John Robinson jr at vertica.com
Thu Nov 3 09:22:21 CST 2005


Okay, fixed the subject line.

Also, see additional comment below about the forked processes...

John Robinson wrote:
> Good morning everyone,
> 
> New to this list, so please forgive if this is not quite the right forum 
> (and redirect me) - thanks!
> 
> I have a setup where the cluster is running a long-lived server process 
> that uses MPI_Comm_accept to receive new client connections, and 
> single-process clients that call MPI_Comm_connect to submit work.
> 
> The problem happens after the first client process has completed and a 
> new one tries to connect.  I get the following fatal error from the 
> server processes:
> 
> MPI_Comm_accept(116): MPI_Comm_accept(port="port#35267$description#jr$", 
> MPI_INFO_NULL, root=0, comm=0x84000001, newcomm=0xbf89b370) failed
> MPID_Comm_accept(29):
> MPIDI_CH3_Comm_accept(598):
> MPIDI_CH3I_Add_to_bizcard_cache(58): business card in cache: 
> port#35268$description#jr$, business card passed: 
> port#35269$description#jr$
> 
> The accept code looks like:
> 
>    impl->acceptComm = MPI::COMM_WORLD.Dup();   // Intracomm for new connections
>    MPI::Intercomm clientComm;                  // result of Accept() call
>    clientComm = impl->acceptComm.Accept(impl->serverPort, MPI_INFO_NULL, 0);
> 
> The connect is:
> 
>        MPI::Intercomm clientComm;   // result of Connect() call
>        clientComm = MPI::COMM_SELF.Connect(impl->serverPort, MPI_INFO_NULL, 0);
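
For reference, both sides use a port string that the server got from 
MPI::Open_port.  Roughly, the server side looks like the sketch below; 
the names are illustrative (the real code keeps the string in 
impl->serverPort and gets it to the clients out of band), but the 
Open_port/Accept pattern is the same:

    #include <mpi.h>

    // Sketch of the server side: open a port once, then loop on Accept.
    // Illustrative only, not the exact code from our server.
    int main(int argc, char* argv[])
    {
        MPI::Init(argc, argv);

        char portName[MPI_MAX_PORT_NAME];
        MPI::Open_port(MPI::INFO_NULL, portName);   // e.g. "port#35267$description#jr$"
        // ... publish portName to the clients out of band ...

        MPI::Intracomm acceptComm = MPI::COMM_WORLD.Dup();
        for (int i = 0; i < 2; ++i) {               // accept two clients in a row
            MPI::Intercomm clientComm =
                acceptComm.Accept(portName, MPI::INFO_NULL, 0);
            // ... service the client ...
            clientComm.Disconnect();
        }

        MPI::Close_port(portName);
        MPI::Finalize();
        return 0;
    }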
> 
> One detail that may be relevant is that the client process is started on 
> demand: a master process forks the process that eventually does the 
> Connect().  So the original mpd job (the master process) is long-lived, 
> but more than one forked process ends up calling MPI_Init/Finalize over 
> its lifetime.  Does each of these need to be a separate MPI job?

If I stop and restart the master process between connects, the problem 
does not happen.
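
To make the structure concrete, here is a stripped-down sketch of the 
master/client side (the names are made up; in this sketch the master 
never calls MPI itself, only the forked children do, which is how I 
read our setup):

    #include <mpi.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    // One forked child per request, each with its own MPI::Init/Finalize.
    static void submitWork(const char* serverPort)
    {
        MPI::Init();
        MPI::Intercomm clientComm =
            MPI::COMM_SELF.Connect(serverPort, MPI::INFO_NULL, 0);
        // ... send the work over clientComm ...
        clientComm.Disconnect();
        MPI::Finalize();
    }

    int main(int argc, char* argv[])
    {
        const char* serverPort = argv[1];   // port string published by the server

        // Long-lived master (the mpd job): fork a short-lived client per request.
        for (int request = 0; request < 2; ++request) {
            pid_t pid = fork();
            if (pid == 0) {                 // child: connect, do the work, exit
                submitWork(serverPort);
                _exit(0);
            }
            waitpid(pid, 0, 0);
        }
        return 0;
    }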

/jr
---
> MPICH version: mpich2-1.0.2p1
> 
> thanks,
> /jr
> 