[mpich-discuss] problem with MPI_Get_count() for very long (but legal length) messages.

Barry Smith bsmith at mcs.anl.gov
Fri Feb 5 14:40:10 CST 2010


   Rusty,

     Look at the code. There are no 64 bit integers! The cnt is  
433,438,806, which is completely representable in plain old 32 bit ints.  
All the ints passed to MPI are within legal limits.
In fact, I believe you actually send the message correctly. You only  
give the wrong answer for count. The problem occurs simply because the  
size of the datatype times the number of entries being sent is larger  
than a 32 bit int can hold.

    Barry

The same problem comes up in sending doubles; I just cut this code  
from where we sent long long int.



On Feb 5, 2010, at 2:34 PM, Rusty Lusk wrote:

> 64-bit integers too?  by default?
>
> On Friday,Feb 5, 2010, at 2:28 PM, Barry Smith wrote:
>
>>
>> #include "mpi.h"
>> #include "stdlib.h"
>>
>> #undef __FUNCT__
>> #define __FUNCT__ "main"
>> int main(int argc,char **argv)
>> {
>> int ierr;
>> int    size,rank;
>> int            cnt  = 433438806;
>> MPI_Status     status;
>> long long int  *cols;
>>
>> MPI_Init(&argc,&argv);
>> ierr = MPI_Comm_size(MPI_COMM_WORLD,&size);
>> ierr = MPI_Comm_rank(MPI_COMM_WORLD,&rank);
>>
>> cols = (long long int*) malloc(cnt*sizeof(long long));
>> if (rank == 0) {
>>   ierr = MPI_Send(cols,cnt,MPI_LONG_LONG_INT,1,0,MPI_COMM_WORLD);
>>
>> } else {
>>   ierr = MPI_Recv(cols,cnt,MPI_LONG_LONG_INT,0,0,MPI_COMM_WORLD,&status);
>>   ierr = MPI_Get_count(&status,MPI_LONG_LONG_INT,&cnt);
>>   printf("count %d\n",cnt);
>> }
>> ierr = MPI_Finalize();
>> return 0;
>> }
>>
>> crush is a 64 bit system with 64 bit pointers.
>>
>> crush:/usr> which mpicc
>> /soft/apps/packages/mpich2-1.2.1-gcc/bin/mpicc
>> crush:/usr> which mpiexec
>> /soft/apps/packages/mpich2-1.2.1-gcc/bin/mpiexec
>>
>> crush:~> mpicc mpitest.c
>> mpitest.c: In function ‘main’:
>> mpitest.c:25: warning: incompatible implicit declaration of built-in function ‘printf’
>> crush:~> mpiexec -n 2 a.out
>> count -103432106
>>
>> I've had this problem reported to me by two completely different  
>> PETSc users so it is a real problem, not just academic. My guess is  
>> you don't use a long long int in the intermediate computations  
>> needed to get the final value for count.
>>
>> To cheer you up, when I run with openMPI it runs forever sucking  
>> down 100% CPU trying to send the messages :-)
>>
>> Barry
>>
>>
>


