[MPICH] MPICH2 does not work over Windows XP network with ib card

Rajeev Thakur thakur at mcs.anl.gov
Fri Nov 4 14:54:27 CST 2005


Yes, your test runs successfully to completion with our latest source. 

Rajeev

PS: BTW, the preferred method of running is "mpiexec -n 1" instead of
"mpdrun -np 1"
 

> -----Original Message-----
> From: John Robinson [mailto:jr at vertica.com] 
> Sent: Thursday, November 03, 2005 3:51 PM
> To: Rajeev Thakur
> Cc: mpich-discuss at mcs.anl.gov
> Subject: Re: [MPICH] MPICH2 does not work over Windows XP 
> network with ib card
> 
> Hi Rajeev,
> 
> Thanks for the quick offer.
> 
> I was able to induce the failure with the attached test program pair. 
> Unzip/untar and read README.  Should be pretty transparent.
> 
> By the way, the system I am running on is:
> 
>    Linux 2.6.13-1.1532_FC4 #1
> 
> Thanks for your help!
> /jr
> --
> Rajeev Thakur wrote:
> > John,
> >      In our upcoming release (in a couple of weeks) we do things quite
> > differently in this part of the code, and hopefully this problem will not be
> > there. In the meanwhile, if you can send us a small test program, we would
> > be happy to test it with the new code.
> > 
> > Rajeev 
> > 
> > 
> >>-----Original Message-----
> >>From: owner-mpich-discuss at mcs.anl.gov 
> >>[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of John Robinson
> >>Sent: Thursday, November 03, 2005 8:53 AM
> >>To: mpich-discuss at mcs.anl.gov
> >>Subject: [MPICH] MPICH2 does not work over Windows XP network 
> >>with ib card
> >>
> >>Good morning everyone,
> >>
> >>New to this list, so please forgive me if this is not quite the right
> >>forum (and redirect me) - thanks!
> >>
> >>I have a setup where the cluster is running a long-lived server process
> >>that uses MPI_Comm_accept to receive new client connections, and
> >>single-process client processes that call MPI_Comm_connect to submit work.
> >>
> >>The problem happens after the first client process has completed and a
> >>new one tries to connect.  I get the following fatal error from the
> >>server processes:
> >>
> >>MPI_Comm_accept(116): MPI_Comm_accept(port="port#35267$description#jr$", MPI_INFO_NULL, root=0, comm=0x84000001, newcomm=0xbf89b370) failed
> >>MPID_Comm_accept(29):
> >>MPIDI_CH3_Comm_accept(598):
> >>MPIDI_CH3I_Add_to_bizcard_cache(58): business card in cache: port#35268$description#jr$, business card passed: port#35269$description#jr$
> >>
> >>The accept code looks like:
> >>
> >>    impl->acceptComm = MPI::COMM_WORLD.Dup();  // Intracomm for new connections
> >>    MPI::Intercomm clientComm;                 // result of Accept() call
> >>    clientComm = impl->acceptComm.Accept(impl->serverPort, MPI_INFO_NULL, 0);
> >>
> >>The connect is:
> >>
> >>        MPI::Intercomm clientComm;        // result of Connect() call
> >>        clientComm = MPI::COMM_SELF.Connect(impl->serverPort, MPI_INFO_NULL, 0);
> >>
> >>One detail that may be relevant is that the client process is started on
> >>demand from a master process that forks the process that eventually does
> >>the connect().  So the original mpd job (the master process) is
> >>long-lived, but several separately forked processes each call
> >>MPI_Init/Finalize.  Does each of these need to be a separate MPI job?
> >>
> >>MPICH version: mpich2-1.0.2p1
> >>
> >>thanks,
> >>/jr
> >>
> >>
> > 
> > 
> 



