[mpich-discuss] FW: MPID_nem_tcp_connpoll(1826): Communication error with rank 0: Connection refused

Pavan Balaji balaji at mcs.anl.gov
Fri Nov 4 09:57:12 CDT 2011


On 11/04/2011 03:57 AM, Miguel Angel Fernández wrote:
> I thought I had answered your email... anyway, I'm doing too many things
> at the same time ;-)

If it was sent to me directly instead of the mpich-discuss mailing list, 
it was probably ignored. Please don't do that.

> Yes, all machines can communicate with each other.

Do you mean they can ssh to each other, or that they can communicate 
over any port?

> Attached, you have the output of the commands "configure", "make" and
> "make install" for users "mpi" and "root".

It doesn't matter which user you are doing this as, i.e., "mpi" or 
"root". Let's just pick one to avoid confusion. The build seems to have 
gone through fine.

So far, as I understand it, the following works correctly:

% mpiexec -f machinefile hostname

But the following does not:

% mpiexec -f machinefile ./mpi_application

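Assuming the above is true, my guess is that there is a firewall issue 
between the nodes. "hostname" is not an MPI program, so its processes 
never open connections to each other, whereas an MPI application does. 
Note that many firewalls allow traffic on port 22, which ssh uses, so 
you won't notice the problem with ssh.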
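
One way to check whether a port other than 22 is reachable between two 
nodes is a small socket test. The following is only a sketch, not part 
of MPICH: the helper names open_listener and can_connect are made up 
for illustration, and the port number you try is up to you. Run the 
listener on one node and the connect check from another.

```python
import socket

def open_listener(port=0, host=""):
    """Bind and listen on a TCP port (hypothetical helper).

    port=0 lets the OS pick a free port; host="" listens on all interfaces.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts, and unreachable hosts.
        return False
```

For example, call open_listener(50000) on one node and keep that Python 
session open, then call can_connect("that-node", 50000) from another 
node (both the port and the host name here are placeholders). A False 
result while ssh works would be consistent with a firewall blocking 
non-ssh ports. A True result for one port does not prove that every 
port is open, since the ports used by an MPI application are normally 
chosen dynamically.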

> I sent to your personal email "balaji at mcs.anl.gov" one document with the
> configuration I am using. Maybe you can find the thing I am doing wrong.

Please send all emails to the mailing list.

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

