<html>
<head>
<style><!--
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
font-size: 10pt;
font-family:Tahoma
}
--></style>
</head>
<body class='hmmessage'><div dir='ltr'>
<div><p class="MsoNormal" style="font-size: 10pt; "><span class="apple-style-span"><span lang="EN-GB" style="font-size: 11.5pt; ">Dear
Pavan</span></span><span lang="EN-GB" style="font-size: 11.5pt; "><br>
<br>
<span class="apple-style-span"><span style="float: none; ">All machines can communicate with each other using
ssh.</span></span></span><span lang="EN-GB"><o:p></o:p></span></p>
<p class="MsoNormal" style="font-size: 10pt; "><span lang="EN-GB" style="font-size: 11.5pt; "><o:p> </o:p></span></p>
<p class="MsoNormal" style="font-size: 10pt; "><span lang="EN-GB" style="font-size: 11.5pt; ">The reason for doing the configuration
as root instead of another user was the problem I ran into with the
"mpi" user. I thought that it could be caused by a permission error or
something similar. I normally use the "mpi" user.<o:p></o:p></span></p><p class="MsoNormal" style="font-size: 10pt; "><span lang="EN-GB" style="font-size: 11.5pt; "><br></span></p><br><div style="font-size: 10pt; ">> Date: Fri, 4 Nov 2011 09:57:12 -0500<br>> From: balaji@mcs.anl.gov<br>> To: mafga74@hotmail.com<br>> CC: mpich-discuss@mcs.anl.gov<br>> Subject: Re: FW: [mpich-discuss] MPID_nem_tcp_connpoll(1826): Communication error with rank 0: Connection refused<br>> <br>> <br>> On 11/04/2011 03:57 AM, Miguel Angel Fernández wrote:<br>> > I thought I answered your email,... anyway, I'm doing to much things at<br>> > the same time ;-)<br>> <br>> If it was sent to me directly instead of the mpich-discuss mailing list, <br>> it was probably ignored. Please don't do that.<br>> <br>> > Yes, all machines can communicate to each others.<br>> <br>> They can communicate as in ssh to each other, or communicate over any port?<br>> <br>> > Attached, you have the output of the commands "configure", "make" and<br>> > "make install" for users "mpi" and "root".<br>> <br>> It doesn't matter which user you are doing this as, i.e., "mpi" or <br>> "root". Let's just pick one to avoid confusion. The build seems to have <br>> gone through fine.<br>> <br>> So far, as I understand it, the following works correctly:<br>> <br>> % mpiexec -f machinefile hostname<br>> <br>> But the following does not:<br>> <br>> % mpiexec -f machinefile ./mpi_application<br>> </div><div style="font-size: 10pt; ">> Assuming the above is true, my guess is that there is a firewall issue </div><div style="font-size: 10pt; ">> between the nodes. Note that many firewalls allow port 22 to pass <br>> through which is used for ssh. So you won't notice this with ssh.<br>> <br><br></div><div style="font-size: 10pt; "><span class="Apple-style-span" style="font-size: 15px; ">Your assumption is right.</span></div><div style="font-size: 10pt; "><span class="Apple-style-span" style="font-size: 15px; "><br></span></div><div><div>
<p class="MsoNormal" style="font-size: 10pt; "><span lang="EN-GB" style="font-size: 11.5pt; ">I looked for any firewall on each
machine. In fact, I disabled "ufw" on all machines (one of your colleagues
told me about that), and this is the only firewall that is installed automatically
on Ubuntu/Debian.<o:p></o:p></span></p>
<p class="MsoNormal" style="font-size: 10pt; "><span lang="EN-GB" style="font-size: 11.5pt; ">At the moment, ufw is
disabled on all machines and the problem is still there.<o:p></o:p></span></p><p class="MsoNormal" style="font-size: 10pt; "><span lang="EN-GB" style="font-size: 11.5pt; "><br></span></p></div></div><div style="font-size: 10pt; ">> > I sent to your personal email "balaji@mcs.anl.gov"one document with the<br>> > configuration I am using. Maybe you can find the thing I am doing wrong.<br>> <br><br></div><div style="font-size: 10pt; "><p class="MsoNormal"><span lang="EN-GB" style="font-size: 10pt; ">I don't want to bother you, but it is difficult
enough to install and configure an MPI cluster without wrong information in
the document. Once the installation and configuration procedure is correct and tested, I
will share it with the list. That is why I sent this document to you
directly. I apologise for it.<o:p></o:p></span></p><p class="MsoNormal"><br></p><p class="MsoNormal">Could you please help me find the problem? I have to admit that I have tried every idea I had, with no better results.</p><p class="MsoNormal"><br></p></div><div style="font-size: 10pt; ">> Please send all emails to the mailing list.<br>> <br>> -- <br>> Pavan Balaji<br>> http://www.mcs.anl.gov/~balaji<br></div></div><div style="font-size: 10pt; "><br></div><div style="font-size: 10pt; ">Best regards</div><div style="font-size: 10pt; ">Miguel Angel Fernandez (PhD student)</div><div style="font-size: 10pt; ">Polytechnic University of Madrid (Spain)</div><div style="font-size: 10pt; "><br></div><div style="font-size: 10pt; "><br></div>                                            </div></body>
</html>