[mpich-discuss] Unable to run simple mpi problem

dave waite waitedm at gmail.com
Tue Dec 15 14:40:02 CST 2009


We run MPICH2 applications on many Windows platforms. On a few
installations, the job dies while initializing MPI.
To examine this further, we ran a simple Hellompi program:

 

// mpi2.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"

int master ;
int n_workers ;
MPI_Comm world, workers ;
MPI_Group world_group, worker_group ;
#define BSIZE MPI_MAX_PROCESSOR_NAME

char chrNames[MPI_MAX_PROCESSOR_NAME*64];

int _tmain(int argc, char* argv[])
{
    int nprocs=1;

    world = MPI_COMM_WORLD;
    int iVal=0;
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];
    MPI_Status reqstat;
    char* p;
    int iNodeCnt=1;

    SYSTEM_INFO info;
    GetSystemInfo( &info );

    int i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Get_processor_name(name, &len);

    if (rank==0)
    {
        // server commands
        chrNames[0]=0;
        strcat(chrNames,"||");
        strcat(chrNames,name);
        strcat(chrNames,"||");

        for (i=1;i<size;i++)
        {
            MPI_Recv(name,BSIZE,MPI_CHAR,i,999,MPI_COMM_WORLD,&reqstat);
            p=strstr(chrNames,name);
            if(p==NULL)
            {
                strcat(chrNames,name);
                strcat(chrNames,"||");
                iNodeCnt++;
            }

            //printf("Hello MPI!\n");
            printf("Hello from Rank %d of %d on %s\n",i,size,name);
        }
        printf("\nNodes:%d\n",iNodeCnt);
        printf("Names:%s\n",chrNames);
    }
    else
    {
        // client commands
        MPI_Send(name,BSIZE,MPI_CHAR,0,999,MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}

 

We saw the same failure with this program. Here is our output:

 

C:\MPI>mpiexec2 -localonly -n 3 hellompi
unable to read the cmd header on the pmi context, Error = -1
.
[01:4792]......ERROR:result command received but the wait_list is empty.
[01:4792]....ERROR:unable to handle the command: "cmd=result src=1 dest=1 tag=7 cmd_tag=2 cmd_orig=dbput ctx_key=1 result=DBS_SUCCESS "
[01:4792]...ERROR:sock_op_close returned while unknown context is in state: SMPD_IDLE
mpiexec aborting job...
SuspendThread failed with error 5 for process 0:3AB7E6A8-6169-4544-8282-D4D35207F564:'hellompi'
unable to suspend process.
received suspend command for a pmi context that doesn't exist: unmatched id = 1
unable to read the cmd header on the pmi context, Error = -1
.
Error posting readv, An existing connection was forcibly closed by the remote host.(10054)
received kill command for a pmi context that doesn't exist: unmatched id = 1
unable to read the cmd header on the pmi context, Error = -1
.
Error posting readv, An existing connection was forcibly closed by the remote host.(10054)

job aborted:
rank: node: exit code[: error message]
0: usbospc126.americas.munters.com: 123: process 0 exited without calling finalize
1: usbospc126.americas.munters.com: 123: process 1 exited without calling finalize
2: usbospc126.americas.munters.com: 123
Fatal error in MPI_Finalize: Invalid communicator, error stack:
MPI_Finalize(307): MPI_Finalize failed
MPI_Finalize(198):
MPID_Finalize(92):
PMPI_Barrier(476): MPI_Barrier(comm=0x44000002) failed
PMPI_Barrier(396): Invalid communicator
[0] unable to post a write of the abort command.

 

This was run on a dual-core machine running Windows XP SP2. What do
these error messages tell us?

What is the best way to proceed in debugging this kind of issue?

 

Thanks,

 

Dave Waite

 
