[mpich-discuss] MPID_nem_tcp_connpoll(1826): Communication error with rank 0: Connection refused

Miguel Angel Fernández mafga74 at hotmail.com
Fri Nov 4 10:42:04 CDT 2011


Dear Pavan,


All machines can communicate with each other using ssh.


The reason for doing the configuration as root instead of any other user was the problem I found with the "mpi" user. I thought it could be caused by a permission error or something similar. I normally use the "mpi" user.

> Date: Fri, 4 Nov 2011 09:57:12 -0500
> From: balaji at mcs.anl.gov
> To: mafga74 at hotmail.com
> CC: mpich-discuss at mcs.anl.gov
> Subject: Re: FW: [mpich-discuss] MPID_nem_tcp_connpoll(1826): Communication error with rank 0: Connection refused
> 
> 
> On 11/04/2011 03:57 AM, Miguel Angel Fernández wrote:
> > I thought I answered your email... anyway, I'm doing too many things at
> > the same time ;-)
> 
> If it was sent to me directly instead of the mpich-discuss mailing list, 
> it was probably ignored. Please don't do that.
> 
> > Yes, all machines can communicate with each other.
> 
> They can communicate as in ssh to each other, or communicate over any port?
> 
> > Attached, you have the output of the commands "configure", "make" and
> > "make install" for users "mpi" and "root".
> 
> It doesn't matter which user you are doing this as, i.e., "mpi" or 
> "root". Let's just pick one to avoid confusion. The build seems to have 
> gone through fine.
> 
> So far, as I understand it, the following works correctly:
> 
> % mpiexec -f machinefile hostname
> 
> But the following does not:
> 
> % mpiexec -f machinefile ./mpi_application
> 
> Assuming the above is true, my guess is that there is a firewall issue 
> between the nodes. Note that many firewalls allow port 22 to pass 
> through, which is used for ssh. So you won't notice this with ssh.
> 

Your assumption is right.


I checked for a firewall on each machine; in fact, I disabled "ufw" on all of them (one of your colleagues told me about that), and it is the only firewall that is installed automatically on Ubuntu/Debian.

At this moment the status of ufw is "disabled" on all machines, and the problem is still there.
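Since ssh (port 22) works but the MPI application does not, a quick way to test Pavan's firewall theory is to check whether an arbitrary TCP port between two nodes is actually reachable. A minimal sketch (the hostname and port below are placeholders, not values from this thread):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds,
    False if it is refused or filtered (times out)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example usage: run on one node against another. Port 22 should be
# open (ssh works), so try a high port like the ones MPICH picks:
# print(port_open("node2", 22))     # expect True on this cluster
# print(port_open("node2", 55555))  # False would confirm a firewall/filter
```

If port 22 is reachable but high ports are not, something between the nodes is still filtering traffic even with ufw disabled.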
> > I sent to your personal email "balaji at mcs.anl.gov" one document with the
> > configuration I am using. Maybe you can find the thing I am doing wrong.
> 

I don't want to bother you, but installing and configuring an MPI cluster is difficult enough without wrong information in the document. Once the installation and configuration procedure is correct and proven, I will share it with the list. That is why I sent the document to you directly; I apologise for that.

Please, can you help me find the problem? I have to admit that I tried every idea I had, with no better results.
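For reference, if the nodes do need a firewall rather than having it disabled, MPICH2 can reportedly be pinned to a fixed port range via the MPICH_PORT_RANGE environment variable, which can then be allowed through ufw. This is a hedged sketch, not from the thread: the range 50000:50100 and the subnet 10.0.0.0/24 are placeholder values, and the variable's availability depends on the MPICH2 build and channel.

```shell
# Assumption: this MPICH2 build honors MPICH_PORT_RANGE ("low:high").
# 50000:50100 and 10.0.0.0/24 are example values only.
export MPICH_PORT_RANGE=50000:50100

# Allow that range between cluster nodes (run on each node).
sudo ufw allow proto tcp from 10.0.0.0/24 to any port 50000:50100

# Then run the application as before.
mpiexec -f machinefile ./mpi_application
```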
> Please send all emails to the mailing list.
> 
> -- 
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji

Best regards,
Miguel Angel Fernandez (PhD student)
Polytechnic University of Madrid (Spain)


