[MPICH] MPICH Connection Problem

Anthony Chan chan at mcs.anl.gov
Thu Mar 15 11:40:36 CDT 2007



On Thu, 15 Mar 2007, Natarajan, Senthil wrote:

> Hi,
>
> I am using MPICH1.2.4 on Linux. I installed with the option -rsh=ssh.

MPICH-1.2.4 is very old.  I strongly recommend switching to the latest
MPICH2 if possible (i.e. if you are not using Condor).  If you are using
Condor to run MPI jobs, you may also want to consult the Condor mailing
list.

>
> After a successful install, I am trying to run a simple MPI job on
> the two machines.
>
> I have generated the key pair (ssh-keygen) and copied it to the other
> machine, and I can ssh between the machines without a password.
>
> mpirun -v -np 2 -machinefile machines tspRunOneBranch randomOut10.txt
>
> running /home/condor-nobody/teststuff/tspRunOneBranch on 2 LINUX ch_p4
> processors
>
> Created /home/condor-nobody/teststuff/PI24892
>
> connect to address xxx.xx.xxx.95: Connection refused
>
> Trying krb4 rsh...
>
> connect to address xxx.xx.xxx.95: Connection refused
>
> trying normal rsh (/usr/bin/rsh)
>
> machine2: Connection refused
>
> p0_24976:  p4_error: Timeout in making connection to remote process on
> machine2: 0
>

Your machine may have a firewall running that blocks MPI jobs (if you are
root and your machine uses iptables, you can check with "iptables -L"),
or your network may not be set up correctly.  If you are using MPICH2
with the default process manager, mpd, you can use mpdcheck as described
in the MPICH2 Installer's Guide and User's Guide.
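
If you cannot run iptables as root, a rough way to see whether the other
machine accepts connections at all is to probe a few TCP ports from the
machine where you start mpirun.  The script below is only a minimal
sketch under some assumptions: it needs bash (for /dev/tcp) and the
coreutils "timeout" command, the function name check_port and the file
name portcheck.sh are made up for illustration, and ports 22 and 514 are
simply the standard ssh and rsh ports.  The "trying normal rsh" lines in
your output suggest rsh was being attempted, so it may be worth checking
both.

```shell
#!/usr/bin/env bash
# Sketch: test whether TCP ports on a remote host accept connections.
# Hypothetical usage: ./portcheck.sh machine2 22 514

# Return 0 if HOST:PORT accepts a TCP connection within 3 seconds,
# non-zero if the connection is refused, filtered, or times out.
check_port() {
    local host=$1 port=$2
    # bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT;
    # inside "bash -c", $0 is the host and $1 is the port.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Only probe when a host and at least one port are given.
if [ "$#" -ge 2 ]; then
    host=$1
    shift
    for port in "$@"; do
        if check_port "$host" "$port"; then
            echo "$host:$port accepts connections"
        else
            echo "$host:$port refused or unreachable"
        fi
    done
fi
```

A refused or unreachable port only tells you that nothing answered there;
it does not say whether a firewall or a missing service is the cause, so
"iptables -L" on the remote machine is still the more direct check.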

A.Chan



