[mpich-discuss] Unable to run simple mpi problem
Jayesh Krishna
jayesh at mcs.anl.gov
Tue Dec 15 15:11:39 CST 2009
Hi,
Which version of MPICH2 are you using? (Use the latest stable version, 1.2.1, available at http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads.)
Do you get the error when running the MPI program on the local machine with 3 processes? Do you get the error if you remove the "-localonly" option?
Regards,
Jayesh
----- Original Message -----
From: "dave waite" <waitedm at gmail.com>
To: mpich-discuss at mcs.anl.gov
Sent: Tuesday, December 15, 2009 2:40:02 PM GMT -06:00 US/Canada Central
Subject: [mpich-discuss] Unable to run simple mpi problem
We are running MPICH2 applications on many Windows platforms. In a few installations, the job dies while initializing MPI. To examine this further, we ran a simple Hellompi program:
// mpi2.cpp : Defines the entry point for the console application.
//
// "stdafx.h" (the Visual C++ precompiled header) is replaced by the headers
// this program actually uses, so it also builds outside that project.
#include <windows.h>   // SYSTEM_INFO, GetSystemInfo
#include <stdio.h>     // printf
#include <string.h>    // strcat, strstr
#include <mpi.h>

int master;
int n_workers;
MPI_Comm world, workers;
MPI_Group world_group, worker_group;
#define BSIZE MPI_MAX_PROCESSOR_NAME
// Holds "||name||name||..." for up to 64 distinct processor names.
char chrNames[MPI_MAX_PROCESSOR_NAME * 64];

int main(int argc, char *argv[])
{
    int nprocs = 1;
    world = MPI_COMM_WORLD;
    int iVal = 0;
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];
    MPI_Status reqstat;
    char *p;
    int iNodeCnt = 1;
    SYSTEM_INFO info;
    GetSystemInfo(&info);
    int i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &len);

    if (rank == 0)
    {
        // server: collect each worker's processor name and count distinct hosts
        chrNames[0] = 0;
        strcat(chrNames, "||");
        strcat(chrNames, name);
        strcat(chrNames, "||");
        for (i = 1; i < size; i++)
        {
            MPI_Recv(name, BSIZE, MPI_CHAR, i, 999, MPI_COMM_WORLD, &reqstat);
            p = strstr(chrNames, name);
            if (p == NULL)
            {
                strcat(chrNames, name);
                strcat(chrNames, "||");
                iNodeCnt++;
            }
            //printf("Hello MPI!\n");
            printf("Hello from Rank %d of %d on %s\n", i, size, name);
        }
        printf("\nNodes:%d\n", iNodeCnt);
        printf("Names:%s\n", chrNames);
    }
    else
    {
        // client: send this process's processor name to rank 0
        MPI_Send(name, BSIZE, MPI_CHAR, 0, 999, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
Running Hellompi, we saw the same failure. Here is our output:
C:\MPI>mpiexec2 -localonly -n 3 hellompi
unable to read the cmd header on the pmi context, Error = -1
.
[01:4792]......ERROR:result command received but the wait_list is empty.
[01:4792]....ERROR:unable to handle the command: "cmd=result src=1 dest=1 tag=7 cmd_tag=2 cmd_orig=dbput ctx_key=1 result=DBS_SUCCESS "
[01:4792]...ERROR:sock_op_close returned while unknown context is in state: SMPD_IDLE
mpiexec aborting job...
SuspendThread failed with error 5 for process 0:3AB7E6A8-6169-4544-8282-D4D35207F564:'hellompi'
unable to suspend process.
received suspend command for a pmi context that doesn't exist: unmatched id = 1
unable to read the cmd header on the pmi context, Error = -1
.
Error posting readv, An existing connection was forcibly closed by the remote host.(10054)
received kill command for a pmi context that doesn't exist: unmatched id = 1
unable to read the cmd header on the pmi context, Error = -1
.
Error posting readv, An existing connection was forcibly closed by the remote host.(10054)
job aborted:
rank: node: exit code[: error message]
0: usbospc126.americas.munters.com: 123: process 0 exited without calling finalize
1: usbospc126.americas.munters.com: 123: process 1 exited without calling finalize
2: usbospc126.americas.munters.com: 123
Fatal error in MPI_Finalize: Invalid communicator, error stack:
MPI_Finalize(307): MPI_Finalize failed
MPI_Finalize(198):
MPID_Finalize(92):
PMPI_Barrier(476): MPI_Barrier(comm=0x44000002) failed
PMPI_Barrier(396): Invalid communicator
[0] unable to post a write of the abort command.
This was run on a dual-core machine running Windows XP SP2. What do these error messages tell us?
What is the best way to proceed in debugging this kind of issue?
Thanks,
Dave Waite
_______________________________________________
mpich-discuss mailing list
mpich-discuss at mcs.anl.gov
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss