[mpich-discuss] Unable to run simple mpi problem
    dave waite 
    waitedm at gmail.com
       
    Tue Dec 15 14:40:02 CST 2009
    
    
  
We are running MPICH2 applications on many Windows platforms.  In a few
installations, the job dies while initializing MPI.
To examine this further, we ran a simple hellompi program:
 
// mpi2.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"   // precompiled header (assumed to provide <tchar.h> for _tmain)
#include <windows.h>  // SYSTEM_INFO, GetSystemInfo
#include <stdio.h>    // printf
#include <string.h>   // strcat, strstr
#include <mpi.h>      // MPI API

int master ;
int n_workers ;
MPI_Comm world, workers ;
MPI_Group world_group, worker_group ;
#define BSIZE MPI_MAX_PROCESSOR_NAME

// "||"-delimited list of the distinct host names seen by rank 0
char chrNames[MPI_MAX_PROCESSOR_NAME*64];

int _tmain(int argc, char* argv[])
{
   int nprocs=1;

   world = MPI_COMM_WORLD;
   int iVal=0;
   int rank, size, len;
   char name[MPI_MAX_PROCESSOR_NAME];
   MPI_Status reqstat;
   char* p;
   int iNodeCnt=1;

   SYSTEM_INFO info;
   GetSystemInfo( &info );

   int i;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &size);

   MPI_Get_processor_name(name, &len);

   if (rank==0)
   {
      // server commands: start the list with this host's own name
      chrNames[0]=0;
      strcat(chrNames,"||");
      strcat(chrNames,name);
      strcat(chrNames,"||");

      // receive one host name from every other rank
      for (i=1;i<size;i++)
      {
         MPI_Recv(name,BSIZE,MPI_CHAR,i,999,MPI_COMM_WORLD,&reqstat);
         // count the host only if its name is not already in the list
         p=strstr(chrNames,name);
         if(p==NULL)
         {
            strcat(chrNames,name);
            strcat(chrNames,"||");
            iNodeCnt++;
         }

         //printf("Hello MPI!\n");
         printf("Hello from Rank %d of %d on %s\n",i,size,name);
      }
      printf("\nNodes:%d\n",iNodeCnt);
      printf("Names:%s\n",chrNames);
   }
   else
   {
      // client commands: send this host's name to rank 0
      MPI_Send(name,BSIZE,MPI_CHAR,0,999,MPI_COMM_WORLD);
   }

   MPI_Finalize();
   return 0;
}
 
It showed the same failure.  Here is our output:
 
C:\MPI>mpiexec2 -localonly -n 3 hellompi
unable to read the cmd header on the pmi context, Error = -1
.
[01:4792]......ERROR:result command received but the wait_list is empty.
[01:4792]....ERROR:unable to handle the command: "cmd=result src=1 dest=1 tag=7 cmd_tag=2 cmd_orig=dbput ctx_key=1 result=DBS_SUCCESS "
[01:4792]...ERROR:sock_op_close returned while unknown context is in state: SMPD_IDLE
mpiexec aborting job...
SuspendThread failed with error 5 for process 0:3AB7E6A8-6169-4544-8282-D4D35207F564:'hellompi'
unable to suspend process.
received suspend command for a pmi context that doesn't exist: unmatched id = 1
unable to read the cmd header on the pmi context, Error = -1
.
Error posting readv, An existing connection was forcibly closed by the remote host.(10054)
received kill command for a pmi context that doesn't exist: unmatched id = 1
unable to read the cmd header on the pmi context, Error = -1
.
Error posting readv, An existing connection was forcibly closed by the remote host.(10054)
 
job aborted:
rank: node: exit code[: error message]
0: usbospc126.americas.munters.com: 123: process 0 exited without calling finalize
1: usbospc126.americas.munters.com: 123: process 1 exited without calling finalize
2: usbospc126.americas.munters.com: 123
Fatal error in MPI_Finalize: Invalid communicator, error stack:
MPI_Finalize(307): MPI_Finalize failed
MPI_Finalize(198):
MPID_Finalize(92):
PMPI_Barrier(476): MPI_Barrier(comm=0x44000002) failed
PMPI_Barrier(396): Invalid communicator
[0] unable to post a write of the abort command.
 
This was run on a dual-core machine running Windows XP SP2.  What do
these error messages tell us, and what is the best way to proceed in
debugging this kind of issue?
 
Thanks,
 
Dave Waite
 