[mpich-discuss] MPI_REDUCE and complex numbers

Brian Dushaw dushaw at apl.washington.edu
Fri Aug 19 15:21:01 CDT 2011


Never mind on the errors...I neglected to copy the revised binary over to the
other machine.  So sorry; I get no errors with this code.

This should not be construed to suggest that the original error was caused by
a failure to copy updated binaries; the makefile for that code copies the newly
compiled binary to all nodes automatically.
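
For reference, the byte counts in the quoted error message below can be related
to the array sizes in the test program. The following is only a rough sketch:
the program name `sizes` and the variable `elem_bytes` are made up for
illustration and are not part of the test case, and it assumes a compiler that
provides the Fortran 2008 intrinsic `storage_size`. It needs no MPI.

```fortran
program sizes
  implicit none
  integer, parameter :: wp = kind(1.d0)
  integer, parameter :: m = 450, n = 8000   ! same dimensions as the test case
  complex(kind=wp) :: c                      ! one element of x, y or z
  integer :: elem_bytes                      ! hypothetical helper variable

  ! storage_size returns the size in bits; convert to bytes
  elem_bytes = storage_size(c) / 8

  print *, 'bytes per element      :', elem_bytes
  print *, 'bytes for count = n*m  :', n*m*elem_bytes
  print *, 'bytes for count = n    :', n*elem_bytes
  print *, 'ratio of the two counts:', (n*m)/n
end program sizes
```

With 16-byte double-complex elements this gives 57600000 bytes for a count of
n*m and 128000 bytes for a count of n. The 28800000 and 64000 in the error
message are exactly half of those figures, and their ratio is the same 450.
That would be consistent with one process calling MPI_REDUCE with count n*m
while an out-of-date binary on the other machine used count n, but this reading
is an inference from the numbers, not something the error message states.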

On Fri, 19 Aug 2011, Brian Dushaw wrote:

> Below is a slightly revised version of Jeff's test case; it more
> closely mimics the situation of my code.  I find it behaves strangely
> on a 2-node, 8-processor system (my cluster is shut down at the moment, so
> I can't fuss with this any more...)
>
> This throws an error message (why?) and produces some strange results.
> Note that 28800000/64000 = 450, the number of rows of the arrays.
>
> I am using the Sun Studio compiler, btw.
>
>> time mpiexec testreduce
> Fatal error in PMPI_Reduce: Message truncated, error stack:
> MPIDI_CH3U_Receive_data_found(129): Message from rank 0 and tag 11 truncated; 
> 28800000 bytes received but buffer size is 64000
> i, x(i), y(i), z(i)
> 1 (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) 
> (0.0E+0,0.0E+0)
> (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) 
> (0.0E+0,0.0E+0)
> (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) 
> (0.0E+0,0.0E+0)
> 1 (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) 
> (4.0,4.0)
> (4.0,4.0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0)
> (0.0E+0,0.0E+0) (0.0E+0,0.0E+0)
> 2 (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) 
> (0.0E+0,0.0E+0)
> (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) 
> (0.0E+0,0.0E+0)
> (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) 
> (0.0E+0,0.0E+0)
> 2 (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) 
> (4.0,4.0)
> (4.0,4.0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0)
> (0.0E+0,0.0E+0) (0.0E+0,0.0E+0)
> 3 (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) 
> (0.0E+0,0.0E+0)
> (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) 
> (0.0E+0,0.0E+0)
> (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) 
> (0.0E+0,0.0E+0)
> 3 (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) 
> (4.0,4.0)
> (4.0,4.0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0)
> (0.0E+0,0.0E+0) (0.0E+0,0.0E+0)
> 4 (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) 
> (0.0E+0,0.0E+0)
> (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) 
> (0.0E+0,0.0E+0)
> (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) 
> (0.0E+0,0.0E+0)
> 4 (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) 
> (4.0,4.0)
> (4.0,4.0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0)
> (0.0E+0,0.0E+0) (0.0E+0,0.0E+0)
> 5 (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) 
> (0.0E+0,0.0E+0)
> (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) 
> (0.0E+0,0.0E+0)
> (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) 
> (0.0E+0,0.0E+0)
> 5 (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) 
> (4.0,4.0)
> (4.0,4.0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0) (0.0E+0,0.0E+0)
> (0.0E+0,0.0E+0) (0.0E+0,0.0E+0)
> 6 (1.0,1.0) (1.0,1.0) (1.0,1.0) (1.0,1.0) (1.0,1.0) (1.0,1.0) (1.0,1.0) 
> (1.0,1.0)
> (1.0,1.0) (1.0,1.0) (1.0,1.0) (1.0,1.0) (1.0,1.0) (1.0,1.0) (1.0,1.0)
> 6 (8.0,8.0) (8.0,8.0) (8.0,8.0) (8.0,8.0) (8.0,8.0) (8.0,8.0) (8.0,8.0) 
> (8.0,8.0)
> (8.0,8.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0)
> 7 (1.0,1.0) (1.0,1.0) (1.0,1.0) (1.0,1.0) (1.0,1.0) (1.0,1.0) (1.0,1.0) 
> (1.0,1.0)
> (1.0,1.0) (1.0,1.0) (1.0,1.0) (1.0,1.0) (1.0,1.0) (1.0,1.0) (1.0,1.0)
> 7 (8.0,8.0) (8.0,8.0) (8.0,8.0) (8.0,8.0) (8.0,8.0) (8.0,8.0) (8.0,8.0) 
> (8.0,8.0)
> (8.0,8.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0)
> 8 (1.0,1.0) (1.0,1.0) (1.0,1.0) (1.0,1.0) (1.0,1.0) (1.0,1.0) (1.0,1.0) 
> (1.0,1.0)
> (1.0,1.0) (1.0,1.0) (1.0,1.0) (1.0,1.0) (1.0,1.0) (1.0,1.0) (1.0,1.0)
> 8 (8.0,8.0) (8.0,8.0) (8.0,8.0) (8.0,8.0) (8.0,8.0) (8.0,8.0) (8.0,8.0) 
> (8.0,8.0)
> (8.0,8.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0) (4.0,4.0)
>
> [and so on, until:]
>
> Fatal error in PMPI_Barrier: Other MPI error
>
> The reduced values should be zeros for the first 5 rows.
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> program main
> use mpi
> implicit none
> integer,parameter :: wp = kind(1.d0)
> integer rc
> integer rank, worldsize
> integer i,j,n,m,itemp
> complex(kind=wp),dimension(:,:),allocatable :: x,y,z
>
> call mpi_init(rc)
> call mpi_comm_rank(MPI_COMM_WORLD,rank,rc)
> call mpi_comm_size(MPI_COMM_WORLD,worldsize,rc)
>
> m =   450
> n =  8000
> allocate(x(m,n),y(m,n),z(m,n))
> x=0.0_wp*x
> y=0.0_wp*y
> z=0.0_wp*z
>
> do i = 6, 25
>   do j = 1, n
>      x(i,j) = (1.0_wp,1.0_wp)
>   end do
> end do
>
> itemp=n*m
>
> call MPI_REDUCE(x,y,itemp,MPI_DOUBLE_COMPLEX,MPI_SUM,0,MPI_COMM_WORLD,rc)
> !call mpi_allreduce(x,z,n,MPI_DOUBLE_COMPLEX,MPI_SUM,MPI_COMM_WORLD,rc)
>
> if (rank .eq. 0) then
>    print*,'i, x(i), y(i), z(i)'
>    do i = 1, 30
>       print *, i, (x(i,j), j=1,15)
>       print *, i, (y(i,j), j=1,15)
> !       print *, i, (z(i,j), j=1,15)
>    end do
>    print *,'the right answer is ',worldsize*(1.0,1.0)
> end if
>
> call MPI_BARRIER(MPI_COMM_WORLD,rc)
> call MPI_FINALIZE(rc)
> stop
> end program main
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

