[mpich-discuss] Strange invalid pointer error

Thomas Ruedas ruedas at dtm.ciw.edu
Tue Oct 27 22:09:25 CDT 2009


Rajeev Thakur wrote:
> The buftot that is passed to MPI_Gather on the root (rank 0) needs to be
> allocated of size RLEN*n*nprocs where nprocs is the size of COMM_WORLD.
> Is it that size?
No, but the way I understand the documentation on 
http://www.mpi-forum.org/docs/mpi-11-html/node69.html#Node69
it shouldn't be, because it says there:
[ IN recvcount] number of elements for any single receive (integer, 
significant only at root)
I interpret this as meaning that recvcount should be the size of the 
slice that each single process passes to buftot, not the total size. 
Is that wrong?
For clarity: RLEN is the length in bytes of a single-precision real, 
n?tot are the dimensions of the total grid, and n?pn are the dimensions 
of the part held on each node. The purpose of the routine f_bindump is 
to collect the results of all subgrids from all nodes into a single big 
array on the root and write them to a file.
If I try out what you suggest, the subroutine ggather looks as follows:

subroutine ggather(buf,buftot,n)
use mpi
use precision,only: RLEN
implicit none
integer, intent(in) :: n
integer :: nprocs,ierr
real, intent(in) :: buf(n)
real, intent(out) :: buftot(*)
call MPI_COMM_SIZE(MPI_COMM_WORLD,nprocs,ierr)
call MPI_GATHER(buf,RLEN*n,MPI_BYTE,buftot,RLEN*n*nprocs,MPI_BYTE,0,MPI_COMM_WORLD,ierr)
end subroutine ggather

This results in the following error:
Backtrace of the callstack at rank 0:
Backtrace of the callstack at rank 1:
Backtrace of the callstack at rank 2:
Backtrace of the callstack at rank 3:
	At [0]: stagyympi(CollChk_err_han+0xd4)[0x848a1dc]
	At [1]: stagyympi(CollChk_dtype_scatter+0x11c)[0x848b7c7]
	At [2]: stagyympi(MPI_Gather+0xb0)[0x848a36c]
	At [3]: stagyympi(mpi_gather_+0x61)[0x8489354]
	At [4]: stagyympi(ggather_+0x6a)[0x845d79c]
	At [5]: stagyympi(f_bindump_+0x2db)[0x81b7c3b]
	At [6]: stagyympi(dump_frame_+0x3407)[0x81bbc21]
	At [7]: stagyympi(MAIN__+0x13791)[0x807fee1]
	At [8]: stagyympi(main+0x42)[0x806c73a]
	At [9]: /lib/tls/libc.so.6(__libc_start_main+0xd3)[0x503de3]
	At [10]: stagyympi[0x806c671]
	At [0]: stagyympi(CollChk_err_han+0xd4)[0x848a1dc]
Fatal error in MPI_Comm_call_errhandler:

Collective Checking: GATHER (Rank 2) --> Inconsistent datatype 
signatures detected between rank 2 and rank 0.



	At [0]: stagyympi(CollChk_err_han+0xd4)[0x848a1dc]
	At [1]: stagyympi(CollChk_dtype_scatter+0x11c)[0x848b7c7]
	At [2]: stagyympi(MPI_Gather+0xb0)[0x848a36c]
	At [1]: stagyympi(CollChk_dtype_scatter+0x11c)[0x848b7c7]
	At [0]: stagyympi(CollChk_err_han+0xd4)[0x848a1dc]
	At [3]: stagyympi(mpi_gather_+0x61)[0x8489354]
	At [4]: stagyympi(ggather_+0x6a)[0x845d79c]
	At [5]: stagyympi(f_bindump_+0x2db)[0x81b7c3b]
	At [2]: stagyympi(MPI_Gather+0xb0)[0x848a36c]
	At [3]: stagyympi(mpi_gather_+0x61)[0x8489354]
	At [1]: stagyympi(CollChk_dtype_scatter+0x11c)[0x848b7c7]
	At [2]: stagyympi(MPI_Gather+0xb0)[0x848a36c]
	At [6]: stagyympi(dump_frame_+0x3407)[0x81bbc21]
	At [7]: stagyympi(MAIN__+0x13791)[0x807fee1]
	At [8]: stagyympi(main+0x42)[0x806c73a]
	At [4]: stagyympi(ggather_+0x6a)[0x845d79c]
	At [5]: stagyympi(f_bindump_+0x2db)[0x81b7c3b]
	At [3]: stagyympi(mpi_gather_+0x61)[0x8489354]
	At [9]: /lib/tls/libc.so.6(__libc_start_main+0xd3)[0x38ede3]
	At [10]: stagyympi[0x806c671]
Fatal error in MPI_Comm_call_errhandler:

Collective Checking: GATHER (Rank 3) --> Inconsistent datatype 
signatures detected between rank 3 and rank 0.



	At [6]: stagyympi(dump_frame_+0x3407)[0x81bbc21]
	At [4]: stagyympi(ggather_+0x6a)[0x845d79c]
	At [5]: stagyympi(f_bindump_+0x2db)[0x81b7c3b]
	At [7]: stagyympi(MAIN__+0x13791)[0x807fee1]
	At [8]: stagyympi(main+0x42)[0x806c73a]
	At [9]: /lib/tls/libc.so.6(__libc_start_main+0xd3)[0x7fbde3]
	At [6]: stagyympi(dump_frame_+0x3407)[0x81bbc21]
	At [7]: stagyympi(MAIN__+0x13791)[0x807fee1]
	At [10]: stagyympi[0x806c671]
Fatal error in MPI_Comm_call_errhandler:

Collective Checking: GATHER (Rank 1) --> Inconsistent datatype 
signatures detected between rank 1 and rank 0.



	At [8]: stagyympi(main+0x42)[0x806c73a]
	At [9]: /lib/tls/libc.so.6(__libc_start_main+0xd3)[0xb89de3]
	At [10]: stagyympi[0x806c671]
Fatal error in MPI_Comm_call_errhandler:

Collective Checking: GATHER (Rank 0) --> Inconsistent datatype 
signatures detected between rank 0 and rank 0.



rank 2 in job 6  xenia_46167   caused collective abort of all ranks
   exit status of rank 2: killed by signal 9
rank 3 in job 6  xenia_46167   caused collective abort of all ranks
   exit status of rank 3: killed by signal 9


A similar error occurs if I calculate the size of buftot in a different 
way, without using MPI_COMM_SIZE. Evidently, I misunderstand something, 
but here I don't see why the datatype is inconsistent.
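
For reference, a minimal sketch of how the MPI-1.1 text quoted above is 
usually read: recvcount is the number of elements received from each 
single rank, so it has to match the sendcount/sendtype signature on 
every rank, while the receive buffer on the root needs room for 
recvcount*nprocs elements. Under that reading the allocation of buftot 
grows with nprocs but the recvcount argument does not, which would 
explain why the collective checker reports inconsistent signatures when 
RLEN*n*nprocs is passed. The program name gather_sketch, the slice 
length n = 4, and the use of MPI_REAL instead of RLEN*n elements of 
MPI_BYTE are assumptions made for illustration only; they are not part 
of the original code.

```fortran
program gather_sketch
  ! Hypothetical stand-alone example, not the original ggather routine.
  use mpi
  implicit none
  integer, parameter :: n = 4        ! assumed local slice length
  integer :: rank, nprocs, ierr
  real :: buf(n)
  real, allocatable :: buftot(:)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  ! Each rank fills its slice with its own rank number.
  buf = real(rank)

  ! Only the root needs the full buffer: n elements per rank.
  if (rank == 0) then
    allocate(buftot(n*nprocs))
  else
    allocate(buftot(1))              ! receive arguments are ignored off root
  end if

  ! sendcount and recvcount are both n: recvcount is per rank, not the total.
  call MPI_GATHER(buf, n, MPI_REAL, buftot, n, MPI_REAL, 0, &
                  MPI_COMM_WORLD, ierr)

  ! On the root, buftot now holds the slices in rank order.
  if (rank == 0) print *, buftot

  deallocate(buftot)
  call MPI_FINALIZE(ierr)
end program gather_sketch
```

Started with, for example, mpiexec -n 4, the root ends up holding 
nprocs consecutive slices of length n in rank order. Applied to ggather, 
the same idea would mean keeping RLEN*n as the receive count and making 
sure the caller allocates buftot with n*nprocs reals on rank 0.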
Thomas
-- 
-----------------------------------
Thomas Ruedas
Department of Terrestrial Magnetism
Carnegie Institution of Washington
http://www.dtm.ciw.edu/users/ruedas/

