[mpich-discuss] problem with MPI_Get_count() for very long (but legal length) messages.
Barry Smith
bsmith at mcs.anl.gov
Fri Feb 5 14:40:10 CST 2010
Rusty,
Look at the code. There are no 64 bit integers! The cnt is
433,438,806, which is completely representable in plain old 32 bit ints.
All the ints passed to MPI are within legal limits.
In fact, I believe you actually send the message correctly. You only
give the wrong answer for count. The problem occurs simply because the
size of the datatype times the number of entries being sent
(433,438,806 x 8 bytes, about 3.5 GB) is larger than a signed 32 bit
int can hold.
Barry
The same problem comes up in sending doubles; I just cut this code
from where we sent long long int.
On Feb 5, 2010, at 2:34 PM, Rusty Lusk wrote:
> 64-bit integers too? by default?
>
> On Friday, Feb 5, 2010, at 2:28 PM, Barry Smith wrote:
>
>>
>> #include "mpi.h"
>> #include "stdlib.h"
>>
>> #undef __FUNCT__
>> #define __FUNCT__ "main"
>> int main(int argc,char **argv)
>> {
>>   int ierr;
>>   int size,rank;
>>   int cnt = 433438806;
>>   MPI_Status status;
>>   long long int *cols;
>>
>>   MPI_Init(&argc,&argv);
>>   ierr = MPI_Comm_size(MPI_COMM_WORLD,&size);
>>   ierr = MPI_Comm_rank(MPI_COMM_WORLD,&rank);
>>
>>   cols = (long long int*) malloc(cnt*sizeof(long long));
>>   if (rank == 0) {
>>     ierr = MPI_Send(cols,cnt,MPI_LONG_LONG_INT,1,0,MPI_COMM_WORLD);
>>
>>   } else {
>>     ierr = MPI_Recv(cols,cnt,MPI_LONG_LONG_INT,
>>                     0,0,MPI_COMM_WORLD,&status);
>>     ierr = MPI_Get_count(&status,MPI_LONG_LONG_INT,&cnt);
>>     printf("count %d\n",cnt);
>>   }
>>   ierr = MPI_Finalize();
>>   return 0;
>> }
>>
>> crush is a 64 bit system with 64 bit pointers.
>>
>> crush:/usr> which mpicc
>> /soft/apps/packages/mpich2-1.2.1-gcc/bin/mpicc
>> crush:/usr> which mpiexec
>> /soft/apps/packages/mpich2-1.2.1-gcc/bin/mpiexec
>>
>> crush:~> mpicc mpitest.c
>> mpitest.c: In function ‘main’:
>> mpitest.c:25: warning: incompatible implicit declaration of
>> built-in function ‘printf’
>> crush:~> mpiexec -n 2 a.out
>> count -103432106
>>
>> I've had this problem reported to me by two completely different
>> PETSc users so it is a real problem, not just academic. My guess is
>> you don't use a long long int in the intermediate computations
>> needed to get the final value for count.
>>
>> To cheer you up, when I run with openMPI it runs forever sucking
>> down 100% CPU trying to send the messages :-)
>>
>> Barry
>>
>>
>