[MPICH] MPICH2 does not work over Windows XP network with ib card

John Robinson jr at vertica.com
Thu Nov 3 08:52:32 CST 2005


Good morning everyone,

New to this list, so please forgive me if this is not quite the right forum 
(and redirect me if so) - thanks!

I have a setup where the cluster is running a long-lived server process 
that uses MPI_Comm_accept to receive new client connections, and 
single-process clients that call MPI_Comm_connect to submit work.

The problem happens after the first client process has completed and a 
new one tries to connect.  I get the following fatal error from the 
server processes:

MPI_Comm_accept(116): MPI_Comm_accept(port="port#35267$description#jr$", 
MPI_INFO_NULL, root=0, comm=0x84000001, newcomm=0xbf89b370) failed
MPID_Comm_accept(29):
MPIDI_CH3_Comm_accept(598):
MPIDI_CH3I_Add_to_bizcard_cache(58): business card in cache: 
port#35268$description#jr$, business card passed: port#35269$description#jr$

The accept code looks like:

    impl->acceptComm = MPI::COMM_WORLD.Dup();  // intracomm for new connections
    MPI::Intercomm clientComm;                 // result of Accept() call
    clientComm = impl->acceptComm.Accept(impl->serverPort, MPI_INFO_NULL, 0);

The connect is:

        MPI::Intercomm clientComm;             // result of Connect() call
        clientComm = MPI::COMM_SELF.Connect(impl->serverPort, MPI_INFO_NULL, 0);
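
For context, the overall pattern I am trying to follow is the standard 
MPI-2 dynamic-connection one. Here is a minimal sketch using the C API 
(this is illustrative, not my actual code; it assumes the port string is 
passed to the client out of band, e.g. via a file, and omits error 
handling):

```cpp
#include <mpi.h>
#include <cstdio>

// Server: open one port, then accept several clients in turn on it.
void run_server(int nclients) {
    char port[MPI_MAX_PORT_NAME];
    MPI_Open_port(MPI_INFO_NULL, port);   // one port for the server's lifetime
    std::printf("server port: %s\n", port);  // advertised out of band
    for (int i = 0; i < nclients; ++i) {
        MPI_Comm client;
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client);
        // ... service this client's work submission ...
        MPI_Comm_disconnect(&client);     // drop connection before next accept
    }
    MPI_Close_port(port);
}

// Client: connect to the advertised port, submit work, disconnect.
void run_client(const char *port) {
    MPI_Comm server;
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &server);
    // ... submit work over the intercommunicator ...
    MPI_Comm_disconnect(&server);
}
```

My code above does essentially this (via the C++ bindings), and the 
first client works; it is only the second connect that triggers the 
business-card error.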

One detail that may be relevant is that the client process is started on 
demand by a master process that forks the process that eventually does 
the connect().  So the original mpd job (the master process) is 
long-lived, but more than one separately forked process calls 
MPI_Init/MPI_Finalize.  Does each of these need to be a separate MPI job?

MPICH version: mpich2-1.0.2p1

thanks,
/jr



