[MPICH] MPICH2 does not work over Windows XP network with ib card

John Robinson jr at vertica.com
Thu Nov 3 15:51:12 CST 2005


Hi Rajeev,

Thanks for the quick offer.

I was able to reproduce the failure with the attached pair of test programs.
Unpack the tarball and read the README; it should be self-explanatory.

By the way, the system I am running on is:

   Linux 2.6.13-1.1532_FC4 #1

Thanks for your help!
/jr
--
Rajeev Thakur wrote:
> John,
>      In our upcoming release (in a couple of weeks) we do things quite
> differently in this part of the code, and hopefully this problem will not be
> there. In the meantime, if you can send us a small test program, we would
> be happy to test it with the new code.
> 
> Rajeev 
> 
> 
>>-----Original Message-----
>>From: owner-mpich-discuss at mcs.anl.gov 
>>[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of John Robinson
>>Sent: Thursday, November 03, 2005 8:53 AM
>>To: mpich-discuss at mcs.anl.gov
>>Subject: [MPICH] MPICH2 does not work over Windows XP network with ib card
>>
>>Good morning everyone,
>>
>>New to this list, so please forgive me if this is not quite the right
>>forum (and redirect me). Thanks!
>>
>>I have a setup where the cluster is running a long-lived server
>>process that uses MPI_Comm_accept to receive new client connections,
>>and single-process client processes that call MPI_Comm_connect to
>>submit work.
>>
>>The problem happens after the first client process has completed and
>>a new one tries to connect. I get the following fatal error from the
>>server processes:
>>
>>MPI_Comm_accept(116): MPI_Comm_accept(port="port#35267$description#jr$", MPI_INFO_NULL, root=0, comm=0x84000001, newcomm=0xbf89b370) failed
>>MPID_Comm_accept(29):
>>MPIDI_CH3_Comm_accept(598):
>>MPIDI_CH3I_Add_to_bizcard_cache(58): business card in cache: port#35268$description#jr$, business card passed: port#35269$description#jr$
>>
>>The accept code looks like:
>>
>>    impl->acceptComm = MPI::COMM_WORLD.Dup( ); // Intracomm for new connections
>>    MPI::Intercomm clientComm;        // result of Accept() call
>>    clientComm = impl->acceptComm.Accept(impl->serverPort, MPI_INFO_NULL, 0);
>>
>>The connect is:
>>
>>        MPI::Intercomm clientComm;        // result of Connect() call
>>        clientComm = MPI::COMM_SELF.Connect( impl->serverPort, MPI_INFO_NULL, 0);
>>
>>One detail that may be relevant: the client process is started on
>>demand by a master process that forks the process that eventually
>>calls connect(). So the original mpd job (the master process) is
>>long-lived, but several separate forked processes each call
>>MPI_Init/MPI_Finalize. Does each of these need to be a separate MPI job?
>>
>>MPICH version: mpich2-1.0.2p1
>>
>>thanks,
>>/jr
>>
>>
> 
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.tgz
Type: application/x-compressed-tar
Size: 2019 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20051103/e354d0af/attachment.bin>

