[mpich2-dev] Problem making a TCP Connection

Pavan Balaji balaji at mcs.anl.gov
Tue Nov 29 19:04:02 CST 2011


Can you try running mpiexec with the option 
-disable-hostname-propagation to see if it helps?
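
One situation in which this option can matter is when a host name resolves to a loopback address on the node itself. This is an assumption, not something established in this thread. The sketch below is a hypothetical diagnostic in Python, not part of MPICH2 or Hydra. Run on each host, it prints what the local host name resolves to. The helper names resolved_addresses and is_loopback are made up for illustration.

```python
import ipaddress
import socket


def resolved_addresses(hostname):
    """Return the sorted, de-duplicated addresses `hostname` resolves to on this machine."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


def is_loopback(address):
    """Return True if `address` (a string) is a loopback address such as 127.0.0.1 or ::1."""
    # Drop an IPv6 scope suffix (e.g. "%eth0") before parsing, if one is present.
    return ipaddress.ip_address(address.split("%")[0]).is_loopback


if __name__ == "__main__":
    # Run this on each host listed in -hosts; a loopback result means other
    # machines cannot reach a listener advertised under that address.
    name = socket.gethostname()
    for addr in resolved_addresses(name):
        note = "loopback, not reachable from other hosts" if is_loopback(addr) else "ok"
        print(f"{name} -> {addr}: {note}")
```

If a host reports a loopback address for its own name, the name-to-address mapping on that host (for example in /etc/hosts) is worth checking. Combined with the command from the original report, the suggested run would be:

    mpiexec -disable-hostname-propagation -n 2 -hosts host1,host2 ./networld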

  -- Pavan

On 11/30/2011 08:04 AM, Cody R. Brown wrote:
> Hello;
>
> I am trying to install MPICH2 on our department machines. I can run a
> simple helloworld example (no MPI_Send). However, when I run an MPI
> program that requires an MPI_Send (or another TCP connection), it errors
> out with the following. The example is a simple helloworld program using
> an MPI_Send:
>
> cody$ mpiexec -n 2 -hosts host1,host2 ./networld
> Hello world (Rank: 0 / Host: host1)
> Hello world (Rank: 1 / Host: host2)
> Fatal error in MPI_Send: Other MPI error, error stack:
> MPI_Send(173)..............: MPI_Send(buf=0x7fff26b4cb80, count=50,
> MPI_CHARACTER, dest=0, tag=1, MPI_COMM_WORLD) failed
> MPID_nem_tcp_connpoll(1826): Communication error with rank 0: Connection
> refused
>
>
> We have determined there is no firewall between the machines, and
> passwordless ssh is set up, etc. I can telnet into the Hydra daemon from
> the 2nd host. Interestingly, I can install OpenMPI, and it works fine.
> MPICH2 itself runs fine on a single host (even if I run it purely on the
> remote host2 from the local host1, it works). It fails only when we use
> 2+ hosts, so that it needs to make the TCP connection.
>
> For some reason MPICH2 can't seem to get the TCP connection info to make
> the TCP connection between the machines.
>
> I am not too sure whether there is much info you can give. I was just
> curious whether you have seen or heard of this before. The system is
> "openSUSE 11.4 (x86_64)". The MPICH2 version is 1.4.1p1.
>
> --
> Cody R. Brown
>    UBC Department of Computer Science
>    201-2366 Main Mall, Vancouver, BC, V6T 1Z4
>    Office: ICCS x409

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

