[mpich-discuss] problem with MPI_Get_count() for very long (but legal length) messages.
Barry Smith
bsmith at mcs.anl.gov
Fri Feb 5 14:28:40 CST 2010
    #include "mpi.h"
    #include "stdlib.h"

    #undef __FUNCT__
    #define __FUNCT__ "main"
    int main(int argc,char **argv)
    {
      int           ierr;
      int           size,rank;
      /* cnt*sizeof(long long) = 3467510448 bytes, which is more than 2^31 - 1 */
      int           cnt = 433438806;
      MPI_Status    status;
      long long int *cols;

      MPI_Init(&argc,&argv);
      ierr = MPI_Comm_size(MPI_COMM_WORLD,&size);
      ierr = MPI_Comm_rank(MPI_COMM_WORLD,&rank);

      cols = (long long int*) malloc(cnt*sizeof(long long));
      if (rank == 0) {
        ierr = MPI_Send(cols,cnt,MPI_LONG_LONG_INT,1,0,MPI_COMM_WORLD);
      } else {
        ierr = MPI_Recv(cols,cnt,MPI_LONG_LONG_INT,
                        0,0,MPI_COMM_WORLD,&status);
        /* expected to report 433438806 elements received */
        ierr = MPI_Get_count(&status,MPI_LONG_LONG_INT,&cnt);
        printf("count %d\n",cnt);
      }
      ierr = MPI_Finalize();
      return 0;
    }
crush is a 64-bit system with 64-bit pointers.
crush:/usr> which mpicc
/soft/apps/packages/mpich2-1.2.1-gcc/bin/mpicc
crush:/usr> which mpiexec
/soft/apps/packages/mpich2-1.2.1-gcc/bin/mpiexec
crush:~> mpicc mpitest.c
mpitest.c: In function ‘main’:
mpitest.c:25: warning: incompatible implicit declaration of built-in function ‘printf’
crush:~> mpiexec -n 2 a.out
count -103432106
Two unrelated PETSc users have reported this problem to me, so it is
a real problem, not just an academic one. My guess is that you don't
use a long long int in the intermediate computations needed to get the
final value for count.
To cheer you up, when I run with Open MPI it runs forever, sucking down
100% CPU trying to send the messages :-)
Barry