[mpich-discuss] networking problems with mpich2-1.3
Saurav Pathak
saurav at sas.upenn.edu
Thu Nov 11 02:37:25 CST 2010
Hi,
I am trying to set up two computers to run MPI. I compiled
mpich2-1.3 from source and ran the following without incident
on each machine (Ubuntu 9.04):
mpiexec -n 2 ./cpi
But when I try to run it across both computers (comp1 and comp2), I run
into the following problems.
On comp1 when I run "mpiexec -f hosts -n 2 ./cpi", I get the following
error.
----
[proxy:0:1 at comp2] HYDU_sock_connect (/home/saurav/local/src/mpich2-1.3/src/pm/hydra/utils/sock/sock.c:151): connect error (Connection timed out)
[proxy:0:1 at comp2] main (/home/saurav/local/src/mpich2-1.3/src/pm/hydra/pm/pmiserv/pmip.c:204): unable to connect to server comp1 at port 38203 (check for firewalls!)
----
I have checked for firewalls with "sudo /sbin/iptables -L" on both
computers, and it shows no firewall rules on either machine.
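The proxy on comp2 reports that a plain TCP connection to comp1 on port 38203 timed out. A plain TCP check between the two machines, independent of MPICH, can show whether that kind of connection gets through at all. This is a minimal sketch, assuming Python 3 is installed on both machines (portcheck.py is a hypothetical file name). Start `python3 portcheck.py listen 38203` on comp1, then run `python3 portcheck.py connect comp1 38203` on comp2. Hydra seems to pick a new port on each run, so 38203 is only an example, and any unused port should do.

```python
import socket
import sys


def open_listener(port):
    """Listen for TCP connections on all interfaces; port 0 lets the OS pick one."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", port))  # "" = every local IPv4 address, not just loopback
    srv.listen(5)
    return srv


def try_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:  # timeouts, refusals and name-lookup failures all land here
        print(f"connect to {host}:{port} failed: {exc}", file=sys.stderr)
        return False


def main(argv):
    if len(argv) == 2 and argv[0] == "listen":
        with open_listener(int(argv[1])) as srv:
            print(f"listening on port {srv.getsockname()[1]}")
            conn, addr = srv.accept()  # blocks until the other machine connects
            conn.close()
            print(f"accepted a connection from {addr[0]}")
        return 0
    if len(argv) == 3 and argv[0] == "connect":
        ok = try_connect(argv[1], int(argv[2]))
        print("reachable" if ok else "NOT reachable")
        return 0 if ok else 1
    print("usage: portcheck.py listen PORT | portcheck.py connect HOST PORT", file=sys.stderr)
    return 2


if __name__ == "__main__" and len(sys.argv) > 1:  # only act when given arguments
    sys.exit(main(sys.argv[1:]))
```

If the connect side also times out while the listener is running, that would point to the network setup rather than to MPICH itself.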
When I execute "mpiexec -f hosts -n 2 ./cpi" on comp2, I get the
following output:
----
Process 1 of 2 is on comp2
Process 0 of 2 is on comp1
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(1306)..................: MPI_Bcast(buf=0x7ffff6b7465c, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1150).............:
MPIR_Bcast_intra(1021)............:
MPIR_Bcast_binomial(187)..........:
MPIC_Send(66).....................:
MPIC_Wait(528)....................:
MPIDI_CH3I_Progress(333)..........:
MPID_nem_mpich2_blocking_recv(906):
MPID_nem_tcp_connpoll(1861).......: Communication error with rank 1:
APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
----
It looks like a networking issue, but I can't figure out what it is. I
can ssh and rsh from each machine to the other without a password and
run commands (e.g., "ssh comp2 hostname" works on comp1, and vice
versa).
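Working ssh shows that each machine can reach the other, but not which address each one actually resolves a given hostname to. Comparing the two sides could help narrow things down. This is a minimal sketch, assuming Python 3 on both machines (resolvecheck.py is a hypothetical file name). It prints the IPv4 addresses the local resolver returns for each name passed on the command line. Running `python3 resolvecheck.py comp1 comp2` on comp1 and then on comp2 gives two lists that can be compared directly.

```python
import socket
import sys


def resolve_all(names):
    """Map each hostname to the sorted IPv4 addresses *this* machine resolves it to."""
    result = {}
    for name in names:
        try:
            infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
            # info[4] is the sockaddr tuple (ip, port); keep only the ip, deduplicated
            result[name] = sorted({info[4][0] for info in infos})
        except socket.gaierror as exc:  # lookup failed, e.g. an unknown name
            result[name] = [f"unresolved ({exc})"]
    return result


if __name__ == "__main__" and len(sys.argv) > 1:  # e.g. resolvecheck.py comp1 comp2
    print("local hostname:", socket.gethostname())
    for name, addrs in resolve_all(sys.argv[1:]).items():
        print(f"{name} -> {', '.join(addrs)}")
```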
I seem to be at a dead end. Any help with this issue would be greatly
appreciated.
Saurav