[mpich-discuss] MPI_Send Error

Deniz DAL dendal25 at yahoo.com
Wed Feb 2 05:08:44 CST 2011


Hello everyone,
I set up a small Linux cluster with 4 compute nodes yesterday. I installed Fedora 14 on all machines and MPICH2 v1.3.2 on the server node. But I have a serious problem: I cannot get even a simple send-and-receive program to work. Below are the steps I followed; the error appears at the end. Neither the point-to-point nor the collective communication routines of MPI run. I suspect the problem might have something to do with Hydra. Any help is appreciated.
Deniz.
 
 
[ddal@admin mpi_uygulamalar]$ cat hosts
admin
cn01
cn02
cn03
[ddal@admin mpi_uygulamalar]$ cat 01_Send_Receive_One_Message.cpp
#include "mpi.h"
#include <cstdio>   // printf
#include <iostream>
using namespace std;
#define TAG 25
int main(int argc, char* argv[])
{
 int myRank, size;
 char processorName[MPI_MAX_PROCESSOR_NAME]; // MPI requires at least this many chars
 int nameLength;
 int a; // value sent by rank 0
 int b; // receive buffer on rank 1
 MPI_Status status;
 /* Initialize MPI */
 MPI_Init(&argc, &argv);
 /* Determine the size of the group */
 MPI_Comm_size(MPI_COMM_WORLD,&size);
 /* Determine the rank of the calling process */
 MPI_Comm_rank(MPI_COMM_WORLD,&myRank);
 MPI_Get_processor_name(processorName, &nameLength);
 if(size != 2 )
 {
  cout<<"Number of CPUs must be 2 !\n";
  MPI_Abort(MPI_COMM_WORLD, 99);
 }
 if(myRank == 0)/* Master Sends a Message */
 {
  a=25;
  MPI_Send(&a, 1, MPI_INT, 1, TAG, MPI_COMM_WORLD);
  printf("%s Sent Variable a Successfully\n",processorName);
 }
 else /* Process 1 Receives the Message */
 {
  MPI_Recv(&b, 1, MPI_INT, 0, TAG, MPI_COMM_WORLD, &status );
  printf("%s Received Variable a Successfully over b\n",processorName);
  printf("b=%d\n",b);
 }
 /* Terminate the MPI */
 MPI_Finalize();
 return 0;
}
[ddal@admin mpi_uygulamalar]$ mpicxx 01_Send_Receive_One_Message.cpp -o test.x
[ddal@admin mpi_uygulamalar]$ mpiexec -f hosts -n 2 ./test.x
Fatal error in MPI_Send: Other MPI error, error stack:
MPI_Send(173)..............: MPI_Send(buf=0xbf928900, count=1, MPI_INT, dest=1, tag=25, MPI_COMM_WORLD) failed
MPID_nem_tcp_connpoll(1811): Communication error with rank 1: 
[mpiexec@admin] ONE OF THE PROCESSES TERMINATED BADLY: CLEANING UP
[proxy:0:1@cn01] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed
[proxy:0:1@cn01] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:1@cn01] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event
APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
[ddal@admin mpi_uygulamalar]$
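
The log above shows the TCP connection between rank 0 and rank 1 failing, but it does not say why. One thing that might be worth ruling out is a host name that resolves to a loopback address on some node. This is only a guess; nothing in the output confirms it. Below is a minimal sketch of such a check using the POSIX resolver. The helper names isLoopback and resolveIPv4 are hypothetical and are not part of MPICH2 or Hydra.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: true if a dotted-quad IPv4 string is in 127.0.0.0/8.
bool isLoopback(const std::string& ipv4)
{
    in_addr addr;
    if (inet_pton(AF_INET, ipv4.c_str(), &addr) != 1)
        return false; // not a valid IPv4 literal
    return (ntohl(addr.s_addr) >> 24) == 127;
}

// Hypothetical helper: resolve a host name to its IPv4 addresses.
// Returns an empty vector if the name cannot be resolved.
std::vector<std::string> resolveIPv4(const std::string& host)
{
    std::vector<std::string> out;
    addrinfo hints;
    std::memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;       // IPv4 only, to keep the sketch short
    hints.ai_socktype = SOCK_STREAM; // avoid one entry per socket type

    addrinfo* res = NULL;
    if (getaddrinfo(host.c_str(), NULL, &hints, &res) != 0)
        return out;

    for (addrinfo* p = res; p != NULL; p = p->ai_next) {
        char buf[INET_ADDRSTRLEN];
        const sockaddr_in* sin =
            reinterpret_cast<const sockaddr_in*>(p->ai_addr);
        if (inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof buf))
            out.push_back(buf);
    }
    freeaddrinfo(res);
    return out;
}
```

Calling resolveIPv4 on each name from the hosts file (admin, cn01, cn02, cn03) on every node, and passing each result to isLoopback, would show whether any node sees itself or a peer at a 127.x.x.x address. If none does, the cause lies elsewhere and this check only eliminates one candidate.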


      