[MPICH] MPICH2 does not work over Windows XP network with ib card
John Robinson
jr at vertica.com
Thu Nov 3 08:52:32 CST 2005
Good morning everyone,
I'm new to this list, so please forgive me if this is not quite the right
forum (and redirect me) - thanks!
I have a setup where the cluster is running a long-lived server process
that uses MPI_Comm_accept to receive new client connections, and
single-process clients that call MPI_Comm_connect to submit work.
The problem happens after the first client process has completed and a
new one tries to connect. I get the following fatal error from the
server processes:
MPI_Comm_accept(116): MPI_Comm_accept(port="port#35267$description#jr$",
MPI_INFO_NULL, root=0, comm=0x84000001, newcomm=0xbf89b370) failed
MPID_Comm_accept(29):
MPIDI_CH3_Comm_accept(598):
MPIDI_CH3I_Add_to_bizcard_cache(58): business card in cache:
port#35268$description#jr$, business card passed: port#35269$description#jr$
The accept code looks like:
impl->acceptComm = MPI::COMM_WORLD.Dup();  // Intracomm for new connections
MPI::Intercomm clientComm;                 // result of Accept() call
clientComm = impl->acceptComm.Accept(impl->serverPort, MPI_INFO_NULL, 0);
The connect is:
MPI::Intercomm clientComm;                 // result of Connect() call
clientComm = MPI::COMM_SELF.Connect(impl->serverPort, MPI_INFO_NULL, 0);
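For context, here is roughly what the two sides look like end to end (a minimal sketch, not our actual code: the out-of-band port exchange, the work loop, and the "server"/"client" argument handling are all illustrative placeholders):

```cpp
// Sketch of the server/client rendezvous pattern using MPICH2's
// MPI C++ bindings, as in the fragments above.
#include <mpi.h>
#include <cstring>

int main(int argc, char** argv)
{
    MPI::Init(argc, argv);

    if (argc > 1 && std::strcmp(argv[1], "server") == 0) {
        char port[MPI_MAX_PORT_NAME];
        MPI::Open_port(MPI::INFO_NULL, port);   // port string must reach
                                                // clients out of band
        for (;;) {                               // one client at a time
            MPI::Intercomm client =
                MPI::COMM_WORLD.Accept(port, MPI::INFO_NULL, 0);
            // ... service the client's work request ...
            client.Disconnect();                 // detach before next Accept
        }
        MPI::Close_port(port);
    } else {
        const char* port = argv[2];              // port string from server
        MPI::Intercomm server =
            MPI::COMM_SELF.Connect(port, MPI::INFO_NULL, 0);
        // ... submit work ...
        server.Disconnect();                     // clean detach from server
    }

    MPI::Finalize();
    return 0;
}
```

In our real setup the Disconnect() calls are there on both sides; the failure appears on the server's second Accept().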
One detail that may be relevant is that the client process is started on
demand by a master process, which forks the process that eventually does
the Connect(). So the original mpd job (the master process) is
long-lived, but multiple separately forked processes each call
MPI_Init/MPI_Finalize. Does each of these need to be a separate MPI job?
MPICH version: mpich2-1.0.2p1
thanks,
/jr