[MPICH] MPICH2 does not work over Windows XP network with ib card
Rajeev Thakur
thakur at mcs.anl.gov
Thu Nov 3 12:44:20 CST 2005
John,
In our upcoming release (in a couple of weeks) we do things quite
differently in this part of the code, so hopefully this problem will no
longer occur. In the meantime, if you can send us a small test program,
we would be happy to test it with the new code.
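For concreteness, the kind of small test program we have in mind would
look roughly like the sketch below. This is only an illustration of the
accept/connect pattern from your report, not a statement of what your
code does; the command-line handling of the port name and the
two-client loop are assumptions:

    #include <mpi.h>
    #include <cstring>
    #include <iostream>

    // Sketch: run with "server" as the argument first; it prints a port
    // name. Then run the client twice in a row, passing that port name
    // (quoted, since it contains '$') as the argument. The second
    // connect is where the reported failure occurs.
    int main(int argc, char* argv[])
    {
        MPI::Init(argc, argv);
        char port[MPI::MAX_PORT_NAME];

        if (argc > 1 && std::strcmp(argv[1], "server") == 0) {
            MPI::Open_port(MPI::INFO_NULL, port);
            std::cout << "port: " << port << std::endl;

            // Accept two clients in sequence, as in the reported scenario.
            for (int i = 0; i < 2; ++i) {
                MPI::Intracomm acceptComm = MPI::COMM_WORLD.Dup();
                MPI::Intercomm client =
                    acceptComm.Accept(port, MPI::INFO_NULL, 0);
                client.Disconnect();
                acceptComm.Free();
            }
            MPI::Close_port(port);
        } else if (argc > 1) {
            // Client: argv[1] is the port name printed by the server.
            std::strncpy(port, argv[1], MPI::MAX_PORT_NAME);
            MPI::Intercomm server =
                MPI::COMM_SELF.Connect(port, MPI::INFO_NULL, 0);
            server.Disconnect();
        }

        MPI::Finalize();
        return 0;
    }
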
Rajeev
> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of John Robinson
> Sent: Thursday, November 03, 2005 8:53 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: [MPICH] MPICH2 does not work over Windows XP network
> with ib card
>
> Good morning everyone,
>
> I'm new to this list, so please forgive me if this is not quite the
> right forum (and redirect me) - thanks!
>
> I have a setup where the cluster is running a long-lived server
> process that uses mpi_comm_accept to receive new client connections,
> and single-process client processes that call mpi_comm_connect to
> submit work.
>
> The problem happens after the first client process has completed and
> a new one tries to connect. I get the following fatal error from the
> server processes:
>
> MPI_Comm_accept(116):
> MPI_Comm_accept(port="port#35267$description#jr$",
> MPI_INFO_NULL, root=0, comm=0x84000001, newcomm=0xbf89b370) failed
> MPID_Comm_accept(29):
> MPIDI_CH3_Comm_accept(598):
> MPIDI_CH3I_Add_to_bizcard_cache(58): business card in cache:
> port#35268$description#jr$, business card passed:
> port#35269$description#jr$
>
> The accept code looks like:
>
>     impl->acceptComm = MPI::COMM_WORLD.Dup();  // Intracomm for new connections
>     MPI::Intercomm clientComm;                 // result of Accept() call
>     clientComm = impl->acceptComm.Accept(impl->serverPort, MPI_INFO_NULL, 0);
>
> The connect is:
>
>     MPI::Intercomm clientComm;  // result of Connect() call
>     clientComm = MPI::COMM_SELF.Connect(impl->serverPort, MPI_INFO_NULL, 0);
>
> One detail that may be relevant is that the client process is started
> on demand from a master process that forks the process that
> eventually does the connect(). So the original mpd job (the master
> process) is long-lived, but several separate forked processes call
> MPI_Init/Finalize over time. Does each of these need to be a separate
> MPI job?
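>
> Roughly, the startup flow is like the sketch below (simplified; the
> function and variable names here, e.g. handle_request and portName,
> are placeholders rather than the real code):
>
>     #include <mpi.h>
>     #include <unistd.h>
>
>     // The long-lived master forks a child per request; only the child
>     // initializes MPI, connects to the server, and finalizes before
>     // exiting. The parent returns immediately to keep serving requests.
>     void handle_request(const char* portName)
>     {
>         pid_t pid = fork();
>         if (pid == 0) {
>             MPI::Init();
>             MPI::Intercomm server =
>                 MPI::COMM_SELF.Connect(portName, MPI::INFO_NULL, 0);
>             // ... submit the work over 'server' ...
>             server.Disconnect();
>             MPI::Finalize();
>             _exit(0);
>         }
>     }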
>
> MPICH version: mpich2-1.0.2p1
>
> thanks,
> /jr
>
>