[mpich-discuss] networking problems with mpich2-1.3

Manhui Wang wangm9 at cardiff.ac.uk
Thu Nov 11 07:00:51 CST 2010


Saurav,

I ran into a similar problem before, and it was resolved by changing
/etc/hosts. See the previous discussion:

http://lists.mcs.anl.gov/pipermail/mpich-discuss/2010-July/007469.html
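
The linked thread is not reproduced here, so the exact edit is an
assumption on my part. One common cause on Ubuntu is a line in /etc/hosts
that maps the machine's own hostname to a loopback address such as
127.0.1.1, so the other machine is told to connect back to an address it
cannot reach. A minimal sketch for spotting such a line follows; the
function name loopback_entries is hypothetical and is not part of MPICH.

```python
import ipaddress


def loopback_entries(hosts_text, hostname):
    """Return the loopback addresses that hosts_text maps hostname to.

    hosts_text is the content of an /etc/hosts style file; hostname is the
    name to look for (for example "comp1").
    """
    found = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if hostname not in names:
            continue
        try:
            if ipaddress.ip_address(address).is_loopback:
                found.append(address)
        except ValueError:
            continue  # first field is not a plain IP address; skip it
    return found
```

To try it on each machine, pass the content of /etc/hosts and the output of
"hostname". If the returned list is not empty, that line is a candidate
for the change; it is usually replaced by the machine's real LAN address,
though this may differ for your setup.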

Best wishes,
Manhui

Saurav Pathak wrote:
> Hi,
> 
> I am trying to set up two computers for running MPI.  I have compiled
> mpich2-1.3 from source, and have run the following without any incident
> on both machines (Ubuntu 9.04):
> 
> mpiexec -n 2 ./cpi
> 
> But when I try to run it on two computers (comp1 and comp2), then I run
> into the following problems.
> 
> On comp1 when I run "mpiexec -f hosts -n 2 ./cpi", I get the following
> error.
> ----
> [proxy:0:1@comp2] HYDU_sock_connect
> (/home/saurav/local/src/mpich2-1.3/src/pm/hydra/utils/sock/sock.c:151):
> connect error (Connection timed out)
> [proxy:0:1@comp2] main
> (/home/saurav/local/src/mpich2-1.3/src/pm/hydra/pm/pmiserv/pmip.c:204):
> unable to connect to server comp1 at port 38203 (check for firewalls!)
> ----
> 
> I have checked for firewalls via "sudo  /sbin/iptables -L" on both
> computers, and there are no firewalls.
> 
> When I execute "mpiexec -f hosts -n 2 ./cpi" on comp2, I get the
> following output:
> ----
> Process 1 of 2 is on comp2
> Process 0 of 2 is on comp1
> Fatal error in PMPI_Bcast: Other MPI error, error stack:
> PMPI_Bcast(1306)..................: MPI_Bcast(buf=0x7ffff6b7465c,
> count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
> MPIR_Bcast_impl(1150).............:
> MPIR_Bcast_intra(1021)............:
> MPIR_Bcast_binomial(187)..........:
> MPIC_Send(66).....................:
> MPIC_Wait(528)....................:
> MPIDI_CH3I_Progress(333)..........:
> MPID_nem_mpich2_blocking_recv(906):
> MPID_nem_tcp_connpoll(1861).......: Communication error with rank 1:
> APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
> ----
> 
> It looks like a networking issue, but I can't figure out what it is.  I
> can ssh and rsh from one machine to the other without the need for a
> password and run commands (e.g., I can run "ssh comp2 hostname" on comp1
> and vice versa).
> 
> I seem to be at a dead end.  Any help on this issue will be greatly
> appreciated.
> 
> Saurav
