[mpich-discuss] Connection refused with 3 processes, no issue with 2 processes.
BOUVIER Benjamin
benjamin.bouvier at thalesgroup.com
Mon Jun 11 09:37:30 CDT 2012
Hi everybody,
I'm a new user of MPICH2 and I'm running into an issue with a very simple program. When I launch the program locally, there is no problem. When I launch it on two network-connected machines, there is no problem either. But when I launch it on three network-connected machines, it fails with "Connection refused".
The three machines are correctly connected: I can successfully connect over SSH from one to another.
I first tried OpenMPI instead of MPICH2, but running the program with OpenMPI blocks as soon as 2 different machines are involved. The fact that it fails with 2 different MPI implementations makes me think that it's not a library bug but perhaps a network or local system issue.
Here is the sample program:
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, size;
    const char someString[] = "Can haz cheezburgerz?";

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
    {
        /* Rank 0 sends a number, then a string, to every other rank in turn. */
        int n = 42;
        int i;
        for (i = 1; i < size; ++i)
        {
            MPI_Send(&n, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
            /* Pass the array itself (not its address); the count includes the '\0'.
               The cast drops const because MPI-2 prototypes take a non-const buffer. */
            MPI_Send((void *)someString, (int)strlen(someString) + 1, MPI_CHAR, i, 0, MPI_COMM_WORLD);
        }
    } else {
        /* Every other rank receives the number and the string from rank 0. */
        char buffer[128];
        int received;
        MPI_Status stat;

        MPI_Recv(&received, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &stat);
        printf("[Worker] Number : %d\n", received);
        MPI_Recv(buffer, (int)sizeof buffer, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat);
        printf("[Worker] String : %s\n", buffer);
    }

    MPI_Finalize();
    return 0;
}
When I launch the program on 3 machines connected by network, here is the output I get:
[Worker] Number : 42
[Worker] String : Can haz cheezburgerz?
Fatal error in MPI_Send: Other MPI error, error stack:
MPI_Send(173)..............: MPI_Send(buf=0x7fff6c4bac2c, count=1, MPI_INT, dest=2, tag=0, MPI_COMM_WORLD) failed
MPID_nem_tcp_connpoll(1826): Communication error with rank 2: Connection refused
Launching the program on 2 machines doesn't show any particular issue.
I'm using MPICH2 version 1.4.1p1, locally compiled.
$ uname -a
Linux trtp7097 2.6.32-220.13.1.el6.x86_64 #1 SMP Thu Mar 29 11:46:40 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/redhat-release
Red Hat Enterprise Linux Workstation release 6.2 (Santiago)
I saw in another thread that it could be a firewall issue. That would be surprising: if there were a rule blocking connections, it should also apply when there are 2 processes.
Does anybody have any idea?
Thanks in advance for your help,
--
Benjamin Bouvier