[mpich-discuss] DataType Problem

Michele Rosso michele.rosso84 at gmail.com
Mon Jan 31 14:03:30 CST 2011


Hi Jim,

first of all thanks a lot for your help.
During the weekend I completely rewrote the subroutine. Now it works,
but I still have some small problems.

What I am trying to accomplish is a matrix transposition.
I need to perform a 3D FFT on an N^3 matrix, where N is always a power
of 2, and I am using a 2D domain decomposition. The problem is that the
first FFT (real to complex) results in N/2+1 points along coordinate 1.
When I transpose from direction 1 to direction 2 (along columns in my
setup), not all the processors have the same amount of data.
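
For example, with N = 8 the real-to-complex FFT gives N/2+1 = 5 complex
values along direction 1, and 5 cannot be split evenly over 2 or 4
processes in that direction, so some ranks end up holding one plane more
than the others.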

I attached the current subroutine I am using (ignore sgt23_pmu, since it
will handle the 2->3 transposition and is not complete yet).

If I perform the forward transposition (1-->2) with the new subroutine,
it works perfectly. The only problem is that it crashes if I try to free
the datatypes at the end.
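
To make this concrete, here is a simplified sketch of what I mean (the
names are illustrative, not the exact ones from the attached file): the
type is committed before the collective and freed only after
mpi_alltoallw has returned.

   integer :: coltype, errorMPI
   integer, dimension(nproc) :: scnts, rcnts, sdispls, rdispls
   integer, dimension(nproc) :: stypes, rtypes

   ! scnts, rcnts, sdispls, rdispls are filled beforehand
   ! (displacements in bytes, since mpi_alltoallw takes byte offsets)
   call mpi_type_vector(n2, 1, n1, mpi_double_complex, coltype, errorMPI)
   call mpi_type_commit(coltype, errorMPI)
   stypes(:) = coltype
   rtypes(:) = coltype

   call mpi_alltoallw(work1, scnts, sdispls, stypes, &
                      work2, rcnts, rdispls, rtypes, comm2d, errorMPI)

   ! mpi_alltoallw is blocking, so freeing here should be legal,
   ! yet this is where the crash happens
   call mpi_type_free(coltype, errorMPI)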

If I perform the inverse transposition (2-->1), it works as expected
only if I use a number of processors which is a perfect square (say 16).
If I try with 8 processors, for example (so the domain is decomposed
into beams with a rectangular base), the inverse transposition no longer
works and I receive the following message:

Fatal error in MPI_Alltoallw: Pending request (no error), error stack:
MPI_Alltoallw(485): MPI_Alltoallw(sbuf=0x1eb9390, scnts=0x1e9c780,
sdispls=0x1e9ee80, stypes=0x1e9eea0, rbuf=0x1e9dc70, rcnts=0x1e9eec0,
rdispls=0x1e9f100, rtypes=0x1e9ca90, comm=0x84000004) failed
(unknown)(): Pending request (no error)
Fatal error in MPI_Alltoallw: Pending request (no error), error stack:
MPI_Alltoallw(485): MPI_Alltoallw(sbuf=0x1cf5370, scnts=0x1cf1780,
sdispls=0x1cf3e80, stypes=0x1cf3ea0, rbuf=0x1cf2c70, rcnts=0x1cf3ec0,
rdispls=0x1cf4100, rtypes=0x1cf1a70, comm=0x84000004) failed
(unknown)(): Pending request (no error)
rank 5 in job 26  enterprise_57863   caused collective abort of all
ranks
  exit status of rank 5: killed by signal 9 
Fatal error in MPI_Alltoallw: Pending request (no error), error stack:
MPI_Alltoallw(485): MPI_Alltoallw(sbuf=0x218e5a0, scnts=0x2170780,
sdispls=0x2170a40, stypes=0x2170a60, rbuf=0x218c180, rcnts=0x2170a80,
rdispls=0x2170d10, rtypes=0x2170d30, comm=0x84000004) failed
(unknown)(): Pending request (no error)

So my problems are essentially two:

1) I am unable to free the datatypes.

2) I am unable to perform the backward transposition when N_proc is not
a perfect square.

Again, thanks a lot for your help,

Michele

On Mon, 2011-01-31 at 13:44 -0600, James Dinan wrote:
> Hi Michele,
> 
> I've attached a small test case derived from what you sent.  This runs 
> fine for me with the integer change suggested below.
> 
> I'm still a little confused about the need for 
> mpi_type_create_resized().  You're setting the lower bound to 1 and the 
> extent to the size of a double complex.  These adjustments are in bytes, 
> so if I'm interpreting this correctly you are effectively shifting the 
> beginning of the data type 1 byte into the first value in the array and 
> then accessing a full double complex from that location.  This seems 
> like it's probably not what you would want to do.
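> 
> For comparison, a resize that keeps the lower bound at 0 and only sets 
> the extent to one double complex would look roughly like this (a sketch 
> only, assuming 'temp' is the type you want to interleave; it may not be 
> what your code actually needs):
> 
>    integer :: dcsize, errorMPI
>    integer (kind=MPI_ADDRESS_KIND) :: lb, ext
> 
>    call mpi_type_size(mpi_double_complex, dcsize, errorMPI)
>    lb  = 0         ! keep the origin of the type where it is
>    ext = dcsize    ! extent of one double complex, in bytes
>    call mpi_type_create_resized(temp, lb, ext, temp2, errorMPI)
>    call mpi_type_commit(temp2, errorMPI)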
> 
> Could you explain the subset of the data you're trying to cover with the 
> datatype?
> 
> Thanks,
>   ~Jim.
> 
> On 01/31/2011 11:13 AM, James Dinan wrote:
> > Hi Michele,
> >
> > Another quick comment:
> >
> > Don't forget to free your MPI datatypes when you're finished with them.
> > This shouldn't cause the error you're seeing, but it can be a resource
> > leak that builds up over time if you call this routine frequently.
> >
> > call mpi_type_free(temp, errorMPI)
> > call mpi_type_free(temp2, errorMPI)
> > call mpi_type_free(temp3, errorMPI)
> >
> > Best,
> > ~Jim.
> >
> > On 01/31/2011 11:07 AM, James Dinan wrote:
> >> Hi Michele,
> >>
> >> I'm looking this over and trying to put together a test case from the
> >> code you sent. One thing that looks questionable is the type for 'ext'.
> >> The call to mpi_type_size wants a default integer, whereas the
> >> mpi_type_create_resized calls want an integer of kind=MPI_ADDRESS_KIND.
> >> Could you try adding something like this:
> >>
> >> integer :: dcsize
> >> integer (kind=MPI_ADDRESS_KIND) :: ext
> >>
> >> call mpi_type_size( mpi_double_complex , dcsize , errorMPI)
> >> ext = dcsize
> >>
> >> Thanks,
> >> ~Jim.
> >>
> >> On 01/30/2011 02:15 AM, Michele Rosso wrote:
> >>> Hi,
> >>>
> >>>
> >>> I am developing a subroutine to handle the communication inside a group
> >>> of processors.
> >>> The source code is attached.
> >>>
> >>> The subroutine is contained in a module and accesses most of the data
> >>> it needs, as well as the header "mpi.h", through another module (pmu_var).
> >>>
> >>> As input I have a 3D array (work1) which is allocated in the main
> >>> program. As output I have another 3D array (work2) which is also
> >>> allocated in the main program. Both of them are of type complex and
> >>> have intent INOUT (I want to use the subroutine in a reversible way).
> >>>
> >>> Since the data I want to send are not contiguous, I defined several
> >>> data types. Then I tested all of them with a simple send-receive
> >>> communication in the group "mpi_comm_world".
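> >>>
> >>> In essence the check was a pair of matched calls along these lines
> >>> (a simplified sketch; 'myrank' is the rank in mpi_comm_world):
> >>>
> >>> if (myrank == 0) then
> >>>    call mpi_send(work1, 1, temp3, 1, 0, mpi_comm_world, errorMPI)
> >>> else if (myrank == 1) then
> >>>    call mpi_recv(work2, 1, temp3, 0, 0, mpi_comm_world, &
> >>>                  mpi_status_ignore, errorMPI)
> >>> end if
> >>>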
> >>> The problem arises when I test the data type "temp3": the execution
> >>> of the program stops and I receive the error:
> >>>
> >>> rank 0 in job 8 enterprise_45569 caused collective abort of all ranks
> >>> exit status of rank 0: killed by signal 9
> >>>
> >>> Notice that work1 and work2 have different sizes but the same shape,
> >>> and the data types should be consistent with them.
> >>>
> >>> Does anyone have an idea of what the problem could be?
> >>>
> >>>
> >>> Thanks in advance,
> >>>
> >>> Michele
> >>>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gt_sub_pmu.f95
Type: text/x-fortran
Size: 10197 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20110131/230bdce9/attachment.bin>

