[mpich-discuss] networking problems with mpich2-1.3

Saurav Pathak saurav at sas.upenn.edu
Thu Nov 11 10:45:49 CST 2010


Manhui,

Thank you for the solution, which partially solved my problem.  From comp2 
I can now run cpi cleanly, but from comp1 I still get this message:

------------
[proxy:0:1@comp2] HYDU_sock_connect 
(/home/saurav/local/src/mpich2-1.3/src/pm/hydra/utils/sock/sock.c:151): 
connect error (Connection timed out)
[proxy:0:1@comp2] main 
(/home/saurav/local/src/mpich2-1.3/src/pm/hydra/pm/pmiserv/pmip.c:204): 
unable to connect to server comp1 at port 56740 (check for firewalls!)
-----------

I have checked the output of "iptables -L" and the contents of hosts.deny 
on comp1, but I can't find the problem.
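
Since the /etc/hosts change fixed one direction, one thing that could be
checked on each machine is whether the machine's own hostname is mapped to
a loopback address (stock Ubuntu installs often have a "127.0.1.1
<hostname>" line).  Whether that is what is happening on comp1 is only a
guess.  A minimal Python sketch, not part of MPICH; the helper names
loopback_names and check_local_hosts are made up for illustration:

```python
import ipaddress
import socket


def loopback_names(hosts_text):
    """Return the hostnames that /etc/hosts-style text maps to loopback addresses."""
    names = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not a plain IP address; skip the line
        if addr.is_loopback:  # 127.0.0.0/8 or ::1
            names.extend(fields[1:])
    return names


def check_local_hosts(path="/etc/hosts"):
    """Return (hostname, True/False): is this machine's name on a loopback line?"""
    me = socket.gethostname()
    with open(path) as f:
        mapped = loopback_names(f.read())
    return me, me in mapped
```

Running print(check_local_hosts()) on comp1 and on comp2 would show whether
either name sits on a loopback line.  If it does, other machines may be
handed an address they cannot reach, which is one possible explanation for
a one-directional failure.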

When on comp2 I try "nmap comp1" I get an error message:
Note: Host seems down. If it is really up, but blocking our ping probes, 
try -PN

When I do use the -PN option, I get the same result as when I run 
"nmap comp2" on comp1.  So I am guessing there is some network 
configuration on comp1 that I haven't been able to put my finger on.
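
To separate MPI from plain networking, the connection Hydra attempts can be
imitated with an ordinary TCP socket: listen on a high port on comp1, then
try to connect to it from comp2.  The log above shows the proxy on comp2
connecting back to comp1 on port 56740, and the earlier run used a
different port, so one open port (such as 22 for ssh) does not prove that
the others are reachable.  A minimal Python sketch, not part of MPICH;
can_connect, open_listener and the choice of test port are illustrative:

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refusals, and name-resolution failures
        return False


def open_listener(port=0):
    """Listen on all interfaces; port 0 lets the OS pick a free port.

    Returns the listening socket and the port number actually bound.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", port))
    srv.listen(1)
    return srv, srv.getsockname()[1]
```

On comp1, srv, port = open_listener(56740) with the interpreter left
running; on comp2, print(can_connect("comp1", 56740)).  A False result
while the listener is up would point at filtering, routing, or name
resolution between the machines rather than at MPICH itself.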

Saurav


Manhui Wang wrote:
> Saurav,
>
> I ran into a similar problem before, and it was resolved by changing
> /etc/hosts. See the previous discussion:
>
> http://lists.mcs.anl.gov/pipermail/mpich-discuss/2010-July/007469.html
>
> Best wishes,
> Manhui
>
> Saurav Pathak wrote:
>   
>> Hi,
>>
>> I am trying to set up two computers for running MPI.  I have compiled
>> mpich2-1.3 from source, and have run the following without any incident
>> on both machines (Ubuntu 9.04):
>>
>> mpiexec -n 2 ./cpi
>>
>> But when I try to run it on two computers (comp1 and comp2), then I run
>> into the following problems.
>>
>> On comp1 when I run "mpiexec -f hosts -n 2 ./cpi", I get the following
>> error.
>> ----
>> [proxy:0:1@comp2] HYDU_sock_connect
>> (/home/saurav/local/src/mpich2-1.3/src/pm/hydra/utils/sock/sock.c:151):
>> connect error (Connection timed out)
>> [proxy:0:1@comp2] main
>> (/home/saurav/local/src/mpich2-1.3/src/pm/hydra/pm/pmiserv/pmip.c:204):
>> unable to connect to server comp1 at port 38203 (check for firewalls!)
>> ----
>>
>> I have checked for firewalls via "sudo  /sbin/iptables -L" on both
>> computers, and there are no firewalls.
>>
>> When I execute "mpiexec -f hosts -n 2 ./cpi" on comp2, I get the
>> following output:
>> ----
>> Process 1 of 2 is on comp2
>> Process 0 of 2 is on comp1
>> Fatal error in PMPI_Bcast: Other MPI error, error stack:
>> PMPI_Bcast(1306)..................: MPI_Bcast(buf=0x7ffff6b7465c,
>> count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
>> MPIR_Bcast_impl(1150).............:
>> MPIR_Bcast_intra(1021)............:
>> MPIR_Bcast_binomial(187)..........:
>> MPIC_Send(66).....................:
>> MPIC_Wait(528)....................:
>> MPIDI_CH3I_Progress(333)..........:
>> MPID_nem_mpich2_blocking_recv(906):
>> MPID_nem_tcp_connpoll(1861).......: Communication error with rank 1:
>> APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
>> ----
>>
>> It looks like a networking issue, but I can't figure out what it is.  I
>> can ssh and rsh from one machine to the other without the need for a
>> password and run commands (e.g., I can run "ssh comp2 hostname" on comp1
>> and vice versa).
>>
>> I seem to be at a dead end.  Any help on this issue will be greatly
>> appreciated.
>>
>> Saurav
>>
>>
>> _______________________________________________
>> mpich-discuss mailing list
>> mpich-discuss at mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>     


