[mpich-discuss] Connection refused with 3 processes, no issue with 2 processes.

BOUVIER Benjamin benjamin.bouvier at thalesgroup.com
Mon Jun 11 09:37:30 CDT 2012


Hi everybody,

I'm a new user of MPICH2 and I'm running into an issue with a very simple program. When I launch the program locally, there's no problem. When I launch it on two network-connected machines, there's no problem either. But when I launch it on three network-connected machines, it fails with "Connection refused".
The three machines are correctly connected: I can connect over SSH from one to another successfully.

I first tried OpenMPI instead of MPICH2, but with OMPI the program blocks as soon as it runs on 2 different machines. The fact that it fails with 2 different MPI implementations makes me think that it's not a library bug but rather a network or local system issue.

Here is the sample program:

#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, size;
    const char someString[] = "Can haz cheezburgerz?";

    MPI_Init(&argc, &argv);

    MPI_Comm_rank( MPI_COMM_WORLD, & rank );
    MPI_Comm_size( MPI_COMM_WORLD, & size );

    if ( rank == 0 )
    {
        int n = 42;
        int i;
        for( i = 1; i < size; ++i)
        {
            MPI_Send( &n, 1, MPI_INT, i, 0, MPI_COMM_WORLD );
            MPI_Send( someString, (int) strlen( someString ) + 1, MPI_CHAR, i, 0, MPI_COMM_WORLD );
        }
    } else {
        char buffer[ 128 ];
        int received;
        MPI_Status stat;
        MPI_Recv( &received, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &stat );
        printf( "[Worker] Number : %d\n", received );
        MPI_Recv( buffer, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat );
        printf( "[Worker] String : %s\n", buffer );
    }

    MPI_Finalize();
    return 0;
}

When I launch the program on 3 machines connected by network, here is the message I get:

[Worker] Number : 42
[Worker] String : Can haz cheezburgerz?
Fatal error in MPI_Send: Other MPI error, error stack:
MPI_Send(173)..............: MPI_Send(buf=0x7fff6c4bac2c, count=1, MPI_INT, dest=2, tag=0, MPI_COMM_WORLD) failed
MPID_nem_tcp_connpoll(1826): Communication error with rank 2: Connection refused

Launching the program on 2 machines doesn't show any particular issue.

I'm using MPICH2 version 1.4.1p1, locally compiled.
$ uname -a
Linux trtp7097 2.6.32-220.13.1.el6.x86_64 #1 SMP Thu Mar 29 11:46:40 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/redhat-release
Red Hat Enterprise Linux Workstation release 6.2 (Santiago) 

I saw in another thread that it could be a firewall issue. That would surprise me: if there were a rule blocking connections, it would also apply when there are 2 processes.
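To check the firewall hypothesis outside of MPI, a plain TCP probe between the machines might help. What follows is only a minimal sketch: probe_tcp is a hypothetical helper of my own, not part of MPICH2, and the host name and port passed to it are placeholders. Something has to be listening on the chosen port on the target machine for the result to mean anything.

```c
#define _POSIX_C_SOURCE 200112L

#include <errno.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of MPICH2): try one TCP connection to
 * host:port. Returns 0 on success, the errno value from the last failed
 * socket()/connect() call (e.g. ECONNREFUSED), or -1 if the host name
 * could not be resolved. */
int probe_tcp(const char *host, const char *port)
{
    struct addrinfo hints, *res, *p;
    int last_err = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    /* Try every address the name resolves to until one connects. */
    for (p = res; p != NULL; p = p->ai_next) {
        int fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0) {
            last_err = errno;
            continue;
        }
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0) {
            close(fd);
            freeaddrinfo(res);
            return 0;
        }
        last_err = errno;  /* save before close() can overwrite it */
        close(fd);
    }

    freeaddrinfo(res);
    return last_err;
}
```

Called from a small main on the first machine with the third machine's name and a port that has a listener on it, a return value equal to ECONNREFUSED would point at the network or firewall rather than at MPICH2, while 0 would mean that this particular port is reachable.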
Does anybody have any idea?
Thanks in advance for your help,
--
Benjamin Bouvier

