[mpich-discuss] problem with MPI_Get_count() for very long (but legal length) messages.

Barry Smith bsmith at mcs.anl.gov
Fri Feb 5 14:28:40 CST 2010


#include "mpi.h"
#include "stdlib.h"

#undef __FUNCT__
#define __FUNCT__ "main"
int main(int argc,char **argv)
{
   int ierr;
   int    size,rank;
   int            cnt  = 433438806;
   MPI_Status     status;
   long long int  *cols;

   MPI_Init(&argc,&argv);
   ierr = MPI_Comm_size(MPI_COMM_WORLD,&size);
   ierr = MPI_Comm_rank(MPI_COMM_WORLD,&rank);

   cols = (long long int*) malloc(cnt*sizeof(long long));
   if (rank == 0) {
     ierr = MPI_Send(cols,cnt,MPI_LONG_LONG_INT,1,0,MPI_COMM_WORLD);

   } else {
     ierr = MPI_Recv(cols,cnt,MPI_LONG_LONG_INT,0,0,MPI_COMM_WORLD,&status);
     ierr = MPI_Get_count(&status,MPI_LONG_LONG_INT,&cnt);
     printf("count %d\n",cnt);
   }
   ierr = MPI_Finalize();
   return 0;
}

crush is a 64-bit system with 64-bit pointers.

crush:/usr> which mpicc
/soft/apps/packages/mpich2-1.2.1-gcc/bin/mpicc
crush:/usr> which mpiexec
/soft/apps/packages/mpich2-1.2.1-gcc/bin/mpiexec

crush:~> mpicc mpitest.c
mpitest.c: In function ‘main’:
mpitest.c:25: warning: incompatible implicit declaration of built-in function ‘printf’
crush:~> mpiexec -n 2 a.out
count -103432106

I've had this problem reported to me by two completely different PETSc
users, so it is a real problem, not just an academic one. My guess is
that you don't use a long long int in the intermediate computations
needed to get the final value for count.
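
The guess above is consistent with the number printed. Here is a minimal
sketch of the arithmetic, assuming (this is not confirmed from the MPICH
source) that the received size is kept as a byte count in a 32-bit signed
int and only then divided by the datatype size. The message is
433438806 * 8 = 3467510448 bytes, which is larger than INT_MAX
(2147483647). The helper name wrapped_count is hypothetical and exists
only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper, not MPICH code: models a byte count that is
   stored in a 32-bit signed int before being divided by the element
   size to recover the element count. */
static int64_t wrapped_count(int64_t nelems, int64_t elem_size)
{
    int64_t  bytes = nelems * elem_size;   /* true message size, 64-bit */
    uint32_t low   = (uint32_t)bytes;      /* keep only the low 32 bits */

    /* Reinterpret the low 32 bits as a two's-complement signed value.
       Done by hand so the result does not depend on the compiler. */
    int64_t as_int32 = (low >= 0x80000000u)
                     ? (int64_t)low - 4294967296LL
                     : (int64_t)low;

    return as_int32 / elem_size;           /* count as it would be reported */
}
```

With the values from the test program, wrapped_count(433438806, 8)
gives -103432106, the same value printed by the run above, while a small
count such as 1000 passes through unchanged.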

To cheer you up, when I run with Open MPI it runs forever, sucking down
100% CPU trying to send the messages :-)

   Barry



