[MPICH] MPICH Connection Problem

Natarajan, Senthil senthil at pitt.edu
Thu Mar 15 12:50:30 CDT 2007


Hi Jan,
Thanks for the info.
I tried configuring with -rsh=RSHCOMMAND, then with -rsh=ssh, and also
with the environment variables P4_RSHCOMMAND and RSHCOMMAND set.

I tried all the combinations you suggested, but nothing seems to work;
I still get the connection refused problem.

mpirun -v -np 2 -machinefile machines tspRunOneBranch randomOut10.txt
running /home/condor-nobody/teststuff/tspRunOneBranch on 2 LINUX ch_p4 processors
Created /home/condor-nobody/teststuff/PI20605
connect to address xxx.xx.xxx.95: Connection refused
Trying krb4 rsh...
connect to address xxx.xx.xxx.95: Connection refused
trying normal rsh (/usr/bin/rsh)
machine2: Connection refused
p0_20689:  p4_error: Timeout in making connection to remote process on machine2: 0

The problem is that it does not even seem to contact the other machine
(I am watching the network activity on the other machine), yet it
reports connection refused for that machine. I am not sure why it is not
using ssh, even though I configured with that option, then compiled and
installed. It also does not use ssh when I set the environment variables
above.

I have iptables on, but I do not see any dropped connections between
the two machines in the system log.

Thanks,
Senthil





-----Original Message-----
From: Jan Wagner [mailto:jwagner at kurp.hut.fi] 
Sent: Thursday, March 15, 2007 12:37 PM
To: Natarajan, Senthil
Cc: mpich-discuss at mcs.anl.gov; ashton at mcs.anl.gov
Subject: Re: [MPICH] MPICH Connection Problem

Hi,

On Thu, 15 Mar 2007, Natarajan, Senthil wrote:
> I am using MPICH1.2.4 on Linux. I installed with the option -rsh=ssh.
>
> After installing successfully, I am trying to run a simple mpi job
> with the two machines.
>
> I have generated the key pair (ssh-keygen) and copied it to the other
> machine, and I can ssh between the machines without a password.
>
> I am trying to run a simple mpi job, but without trying to connect to
> the other machine, it complains about connection refused.

Just a thought, but if you check with e.g. 'ps ax' what processes are
started, you could see with what ssh parameters mpich tries to execute
the remote programs.
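
A minimal sketch of that 'ps ax' check, assuming a POSIX shell and grep.
The function name filter_remote_shell is made up for illustration. It
keeps only the process lines whose command is rsh or ssh, so you can see
which one mpirun actually launches and with what arguments.

```shell
# Keep only process lines whose command is rsh or ssh (with or without a
# leading path). The sshd daemon is not matched.
filter_remote_shell() {
    grep -E '(^|[[:space:]/])(rsh|ssh)([[:space:]]|$)'
}

# Run in a second terminal while mpirun is starting:
#   ps ax | filter_remote_shell
```

If only /usr/bin/rsh lines show up, the build is still using rsh.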

But ok, at least from your mpich output it looks like it is still trying
to use old rsh instead of ssh.

Try setting
$ export P4_RSHCOMMAND=ssh
$ export RSHCOMMAND=ssh

and do the mpirun again. Then it should use ssh. If not, try configuring
and compiling again, this time with -rsh=RSHCOMMAND.
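
Before rerunning, it may help to confirm that both variables are really
set in the shell that starts mpirun. A small sketch, assuming a POSIX
shell; the function name show_rsh_env is hypothetical, and it only
prints the two variables named above. It does not check whether mpich
honors them.

```shell
# Print the current value of each remote-shell variable, or a marker
# when the variable is unset or empty.
show_rsh_env() {
    for name in P4_RSHCOMMAND RSHCOMMAND; do
        # Indirect lookup: copy the variable called $name into value.
        eval "value=\${$name-}"
        printf '%s=%s\n' "$name" "${value:-<unset or empty>}"
    done
}

# Usage, in the same shell that will run mpirun:
#   show_rsh_env
```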

(The 1.2.5 and 1.2.7 configure/compile is a bit strange: when I too
compiled with -rsh=ssh a few days ago, it did not want to use ssh.
Compiling with -rsh=RSHCOMMAND complained near the end of the compile
that I should not use this "old" option, but use -rsh=ssh instead. But
the complained-about RSHCOMMAND works, in contrast to the "new" option!
Odd. Well go figure... ;-) )

Oh and also note that you'd probably need to stop iptables/ipchains, or
configure them properly, as with ssh mpich tries to set up some ssh port
forwarding / port tunneling. Otherwise connections will time out.

  - Jan



