[mpich-discuss] Connection refused with 3 processes, no issue with 2 processes.

Dave Goodell goodell at mcs.anl.gov
Mon Jun 11 11:33:15 CDT 2012


This is almost certainly a firewall or network configuration issue.  I would start by disabling any firewall while debugging and then reenabling it later in order to figure out which rules should be altered.

What does your hostfile look like?  How are you invoking mpiexec?  Depending on the answers to these questions, it's entirely reasonable for you to have a problem with 3 processes but not 2.
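
One way to check raw TCP reachability between each pair of machines, independently of MPI, is a small connect probe. The following is only a sketch under stated assumptions: the helper name probe_tcp is made up for illustration, it handles IPv4 dotted-quad addresses only, and it assumes something is already listening on the port being probed (for example a test listener you start yourself on a high port). A return value of ECONNREFUSED corresponds to the "Connection refused" text in the MPI error stack.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try a blocking TCP connect to ip:port.
 * Returns 0 on success, otherwise the errno value describing the failure
 * (e.g. ECONNREFUSED, EHOSTUNREACH, ETIMEDOUT). */
int probe_tcp(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    int fd;
    int err = 0;

    assert(ip != NULL);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return EINVAL;              /* not a dotted-quad IPv4 address */

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    /* Save errno before close(), which may overwrite it. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) != 0)
        err = errno;

    close(fd);
    return err;
}
```

Calling probe_tcp from each machine against each of the others should show whether one particular pair of hosts behaves differently from the rest, which would be consistent with a failure that only appears once a third machine is involved.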

-Dave

On Jun 11, 2012, at 9:37 AM CDT, BOUVIER Benjamin wrote:

> Hi everybody,
> 
> I'm a new user of MPICH2 and I'm running into an issue with a very simple program. When I launch the program locally, there's no problem. When I launch it on two network-connected machines, there's no problem either. But when I launch it on three network-connected machines, it fails with "Connection refused".
> The three machines are correctly connected: I can SSH from any one of them to another.
> 
> I first tried OpenMPI instead of MPICH2, but with OMPI the program hangs as soon as it runs on 2 different machines. The fact that it fails with 2 different MPI implementations makes me think that it's not a library bug but rather a local network or system issue.
> 
> Here is the sample program:
> 
> #include <mpi.h>
> #include <stdio.h>
> #include <string.h>
> 
> int main(int argc, char **argv)
> {
>    int rank, size;
>    const char someString[] = "Can haz cheezburgerz?";
> 
>    MPI_Init(&argc, &argv);
> 
>    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
>    MPI_Comm_size( MPI_COMM_WORLD, &size );
> 
>    if ( rank == 0 )
>    {
>        /* Rank 0 sends an integer, then a string, to every other rank in turn. */
>        int n = 42;
>        int i;
>        for( i = 1; i < size; ++i)
>        {
>            MPI_Send( &n, 1, MPI_INT, i, 0, MPI_COMM_WORLD );
>            /* The +1 includes the terminating '\0' so the receiver gets a complete C string. */
>            MPI_Send( someString, (int)strlen( someString ) + 1, MPI_CHAR, i, 0, MPI_COMM_WORLD );
>        }
>    } else {
>        /* Every other rank receives both messages from rank 0 and prints them. */
>        char buffer[ 128 ];
>        int received;
>        MPI_Status stat;
>        MPI_Recv( &received, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &stat );
>        printf( "[Worker] Number : %d\n", received );
>        MPI_Recv( buffer, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat );
>        printf( "[Worker] String : %s\n", buffer );
>    }
> 
>    MPI_Finalize();
>    return 0;
> }
> 
> When I launch the program on 3 networked machines, this is the output I get:
> 
> [Worker] Number : 42
> [Worker] String : Can haz cheezburgerz?
> Fatal error in MPI_Send: Other MPI error, error stack:
> MPI_Send(173)..............: MPI_Send(buf=0x7fff6c4bac2c, count=1, MPI_INT, dest=2, tag=0, MPI_COMM_WORLD) failed
> MPID_nem_tcp_connpoll(1826): Communication error with rank 2: Connection refused
> 
> Launching the program on 2 machines doesn't show any particular issue.
> 
> I'm using MPICH2 version 1.4.1p1, locally compiled.
> $ uname -a
> Linux trtp7097 2.6.32-220.13.1.el6.x86_64 #1 SMP Thu Mar 29 11:46:40 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
> $ cat /etc/redhat-release
> Red Hat Enterprise Linux Workstation release 6.2 (Santiago) 
> 
> I saw in another thread that it could be a firewall issue. That seems surprising: if there were a rule blocking connections, it would also apply when there are 2 processes.
> Does anybody have any idea?
> Thanks in advance for your help,
> --
> Benjamin Bouvier


