[mpich-discuss] Strange invalid pointer error
Thomas Ruedas
ruedas at dtm.ciw.edu
Tue Oct 27 22:09:25 CDT 2009
Rajeev Thakur wrote:
> The buftot that is passed to MPI_Gather on the root (rank 0) needs to be
> allocated of size RLEN*n*nprocs where nprocs is the size of COMM_WORLD.
> Is it that size?
No, but as I understand the documentation at
http://www.mpi-forum.org/docs/mpi-11-html/node69.html#Node69
it shouldn't be, because it says:
[ IN recvcount] number of elements for any single receive (integer,
significant only at root)
I interpret this as meaning that I should give the size of the single
slice that each subprocess passes to buftot, not the total. Is that wrong?
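In the MPI definition quoted above, recvcount is the number of elements the root receives from each single process. The root's receive buffer therefore has to hold nprocs*recvcount elements, and the block sent by rank r is stored starting at offset r*recvcount. The serial sketch below only mimics that layout on the root and makes no MPI calls; the module and subroutine names are hypothetical.

```fortran
module gather_layout
  implicit none
contains
  ! Serial stand-in (hypothetical, no MPI) for the layout MPI_Gather
  ! produces on the root: the block sent by rank r (r = 0 .. nprocs-1)
  ! is stored starting at element r*recvcount+1 of the receive buffer.
  ! recvcount is the count received from ONE rank, not the total.
  subroutine simulate_gather(blocks, recvcount, nprocs, recvbuf)
    integer, intent(in) :: recvcount, nprocs
    real, intent(in)  :: blocks(recvcount, nprocs)   ! column r+1 holds the data of rank r
    real, intent(out) :: recvbuf(recvcount*nprocs)   ! root buffer: nprocs blocks back to back
    integer :: r
    do r = 0, nprocs - 1
      recvbuf(r*recvcount+1 : (r+1)*recvcount) = blocks(:, r+1)
    end do
  end subroutine simulate_gather
end module gather_layout
```

With 4 ranks sending 3 reals each, recvcount is 3 and the root buffer needs 12 elements; rank 1's data lands in elements 4 to 6.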
For clarity: RLEN is the length in bytes of a single-precision real,
n?tot are the dimensions of the total grid, and n?pn are the dimensions
of the part on each single node. The purpose of the routine f_bindump is
to collect the results of all subgrids from all nodes into a single big
array on the root and write them into a file.
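Because MPI_Gather stores the slices in rank order, one block after another, the gathered buffer is generally not yet the total grid in Fortran array order unless the grid is split only along its last dimension. A reassembly step of the following kind may then be needed. This is a hypothetical sketch: it assumes an even block decomposition into npx x npy x npz blocks of nxpn x nypn x nzpn points and a rank numbering r = ix + npx*(iy + npy*iz), neither of which is stated in the post.

```fortran
module assemble_grid
  implicit none
contains
  ! Copy rank-ordered blocks, as MPI_Gather leaves them on the root, into
  ! the global grid. ASSUMPTION (hypothetical): the grid is split into
  ! npx x npy x npz equal blocks of nxpn x nypn x nzpn points, and rank r
  ! owns block (ix,iy,iz) with r = ix + npx*(iy + npy*iz), indices from 0.
  subroutine assemble(buftot, nxpn, nypn, nzpn, npx, npy, npz, grid)
    integer, intent(in) :: nxpn, nypn, nzpn, npx, npy, npz
    real, intent(in)  :: buftot(nxpn*nypn*nzpn*npx*npy*npz)
    real, intent(out) :: grid(nxpn*npx, nypn*npy, nzpn*npz)
    integer :: r, ix, iy, iz, n, off
    n = nxpn*nypn*nzpn                 ! points per rank (one gathered slice)
    do r = 0, npx*npy*npz - 1
      ix = mod(r, npx)
      iy = mod(r/npx, npy)
      iz = r/(npx*npy)
      off = r*n                        ! slice of rank r starts at off+1
      grid(ix*nxpn+1:(ix+1)*nxpn, iy*nypn+1:(iy+1)*nypn, iz*nzpn+1:(iz+1)*nzpn) = &
           reshape(buftot(off+1:off+n), [nxpn, nypn, nzpn])
    end do
  end subroutine assemble
end module assemble_grid
```

Under this numbering, a decomposition only along z (npx = npy = 1) reduces the copy to a plain reshape of buftot.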
If I try out what you suggest, the subroutine ggather looks as follows:
subroutine ggather(buf,buftot,n)
  use mpi
  use precision,only: RLEN
  implicit none
  integer, intent(in) :: n
  integer :: nprocs,ierr
  real, intent(in) :: buf(n)
  real, intent(out) :: buftot(*)
  call MPI_COMM_SIZE(MPI_COMM_WORLD,nprocs,ierr)
  call MPI_GATHER(buf,RLEN*n,MPI_BYTE,buftot,RLEN*n*nprocs,MPI_BYTE,0,MPI_COMM_WORLD,ierr)
end subroutine ggather
This results in the following error:
Backtrace of the callstack at rank 0:
Backtrace of the callstack at rank 1:
Backtrace of the callstack at rank 2:
Backtrace of the callstack at rank 3:
At [0]: stagyympi(CollChk_err_han+0xd4)[0x848a1dc]
At [1]: stagyympi(CollChk_dtype_scatter+0x11c)[0x848b7c7]
At [2]: stagyympi(MPI_Gather+0xb0)[0x848a36c]
At [3]: stagyympi(mpi_gather_+0x61)[0x8489354]
At [4]: stagyympi(ggather_+0x6a)[0x845d79c]
At [5]: stagyympi(f_bindump_+0x2db)[0x81b7c3b]
At [6]: stagyympi(dump_frame_+0x3407)[0x81bbc21]
At [7]: stagyympi(MAIN__+0x13791)[0x807fee1]
At [8]: stagyympi(main+0x42)[0x806c73a]
At [9]: /lib/tls/libc.so.6(__libc_start_main+0xd3)[0x503de3]
At [10]: stagyympi[0x806c671]
At [0]: stagyympi(CollChk_err_han+0xd4)[0x848a1dc]
Fatal error in MPI_Comm_call_errhandler:
Collective Checking: GATHER (Rank 2) --> Inconsistent datatype
signatures detected between rank 2 and rank 0.
At [0]: stagyympi(CollChk_err_han+0xd4)[0x848a1dc]
At [1]: stagyympi(CollChk_dtype_scatter+0x11c)[0x848b7c7]
At [2]: stagyympi(MPI_Gather+0xb0)[0x848a36c]
At [1]: stagyympi(CollChk_dtype_scatter+0x11c)[0x848b7c7]
At [0]: stagyympi(CollChk_err_han+0xd4)[0x848a1dc]
At [3]: stagyympi(mpi_gather_+0x61)[0x8489354]
At [4]: stagyympi(ggather_+0x6a)[0x845d79c]
At [5]: stagyympi(f_bindump_+0x2db)[0x81b7c3b]
At [2]: stagyympi(MPI_Gather+0xb0)[0x848a36c]
At [3]: stagyympi(mpi_gather_+0x61)[0x8489354]
At [1]: stagyympi(CollChk_dtype_scatter+0x11c)[0x848b7c7]
At [2]: stagyympi(MPI_Gather+0xb0)[0x848a36c]
At [6]: stagyympi(dump_frame_+0x3407)[0x81bbc21]
At [7]: stagyympi(MAIN__+0x13791)[0x807fee1]
At [8]: stagyympi(main+0x42)[0x806c73a]
At [4]: stagyympi(ggather_+0x6a)[0x845d79c]
At [5]: stagyympi(f_bindump_+0x2db)[0x81b7c3b]
At [3]: stagyympi(mpi_gather_+0x61)[0x8489354]
At [9]: /lib/tls/libc.so.6(__libc_start_main+0xd3)[0x38ede3]
At [10]: stagyympi[0x806c671]
Fatal error in MPI_Comm_call_errhandler:
Collective Checking: GATHER (Rank 3) --> Inconsistent datatype
signatures detected between rank 3 and rank 0.
At [6]: stagyympi(dump_frame_+0x3407)[0x81bbc21]
At [4]: stagyympi(ggather_+0x6a)[0x845d79c]
At [5]: stagyympi(f_bindump_+0x2db)[0x81b7c3b]
At [7]: stagyympi(MAIN__+0x13791)[0x807fee1]
At [8]: stagyympi(main+0x42)[0x806c73a]
At [9]: /lib/tls/libc.so.6(__libc_start_main+0xd3)[0x7fbde3]
At [6]: stagyympi(dump_frame_+0x3407)[0x81bbc21]
At [7]: stagyympi(MAIN__+0x13791)[0x807fee1]
At [10]: stagyympi[0x806c671]
Fatal error in MPI_Comm_call_errhandler:
Collective Checking: GATHER (Rank 1) --> Inconsistent datatype
signatures detected between rank 1 and rank 0.
At [8]: stagyympi(main+0x42)[0x806c73a]
At [9]: /lib/tls/libc.so.6(__libc_start_main+0xd3)[0xb89de3]
At [10]: stagyympi[0x806c671]
Fatal error in MPI_Comm_call_errhandler:
Collective Checking: GATHER (Rank 0) --> Inconsistent datatype
signatures detected between rank 0 and rank 0.
rank 2 in job 6 xenia_46167 caused collective abort of all ranks
exit status of rank 2: killed by signal 9
rank 3 in job 6 xenia_46167 caused collective abort of all ranks
exit status of rank 3: killed by signal 9
A similar error occurs if I calculate the size of buftot in a different
way, without using MPI_COMM_SIZE. Evidently I misunderstand something,
but I don't see why the datatype signatures are inconsistent.
Thomas
--
-----------------------------------
Thomas Ruedas
Department of Terrestrial Magnetism
Carnegie Institution of Washington
http://www.dtm.ciw.edu/users/ruedas/