[mpich2-dev] Problem with MPI_Type_commit() and assert in segment_ops.c
Joe Ratterman
jratt0 at gmail.com
Tue Jun 9 16:11:19 CDT 2009
I know it's not that important, and clearly not relevant to the failure, but
BG/P will generate a compiler warning if I use an MPI_Aint cast there. We want
to avoid any ambiguity that such a cast would involve (e.g. is it
sign-extended?), so I use a cast that works correctly for this sort of
micro-test. It is also correct on PPC32. This small example shows the
warnings:
$ cat -n size.c
     1  #include <mpi.h>
     2
     3  extern void bar(MPI_Aint a, MPI_Aint b, MPI_Aint c);
     4
     5  void foo(void* p)
     6  {
     7    MPI_Aint aint = (MPI_Aint)p;
     8
     9    MPI_Aint one = (long long int)p;
    10    MPI_Aint two = (int)p;
    11
    12    bar(aint, one, two);
    13  }
$ /bgsys/drivers/ppcfloor/comm/bin/mpicc -Wall -g -c size.c
size.c: In function 'foo':
size.c:7: warning: cast from pointer to integer of different size
size.c:9: warning: cast from pointer to integer of different size
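
For what it's worth, one way to get a conversion that is both warning-free and
unambiguous about extension is to go through uintptr_t first. This is only a
sketch: demo_aint is a stand-in I made up for a 64-bit MPI_Aint, and
ptr_to_aint is a hypothetical helper, not anything in MPICH2. Inside a real MPI
program, MPI_Get_address is the portable way to get an address as an MPI_Aint.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit MPI_Aint; the real type comes from <mpi.h>. */
typedef long long demo_aint;

/* Hypothetical helper: convert a pointer without a size-changing pointer cast.
 * uintptr_t is as wide as a pointer, so the first cast keeps the size; the
 * second is an ordinary integer conversion from an unsigned type, so the
 * value is zero-extended on a 32-bit target. */
demo_aint ptr_to_aint(const void *p)
{
    return (demo_aint)(uintptr_t)p;
}
```

With the fake addresses from the test case, ptr_to_aint((void*)0x1) gives 1 and
ptr_to_aint((void*)0x2) gives 2, whatever the pointer width.
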
Thanks,
Joe Ratterman
jratt at us.ibm.com
On Tue, Jun 9, 2009 at 3:50 PM, Rob Ross <rross at mcs.anl.gov> wrote:
> Hi,
>
> Those type casts to (size_t) should be to (MPI_Aint).
>
> That assertion is checking that a parameter being passed to
> Segment_mpi_flatten is > 0. The parameter is the length of the list of
> regions being passed in by reference to be filled in (the destination of the
> list of regions). So for some reason we're getting a zero (or possibly
> negative) value passed in as the length of the arrays.
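
To make that calling convention concrete, here is a rough sketch of an in/out
length parameter guarded by the same kind of assertion. flatten_regions and
all of its arguments are made-up names for illustration, not the real
Segment_mpi_flatten signature; the only thing taken from the thread is the
check that *lengthp > 0 on entry.

```c
#include <assert.h>

/* Hypothetical illustration only; not the MPICH2 implementation.
 * On entry *lengthp holds the capacity of offsets[] and sizes[];
 * on return it holds the number of regions actually written. */
void flatten_regions(const long *src_offsets, const long *src_sizes,
                     int nregions, long *offsets, long *sizes, int *lengthp)
{
    int i, n;

    /* Same shape as the failing check: a caller that passes a zero or
     * negative capacity trips this before anything is written. */
    assert(*lengthp > 0);

    /* Copy as many regions as fit in the caller's arrays. */
    n = nregions < *lengthp ? nregions : *lengthp;
    for (i = 0; i < n; i++) {
        offsets[i] = src_offsets[i];
        sizes[i] = src_sizes[i];
    }
    *lengthp = n;
}
```

In this sketch the assertion can only fire if the caller computed the array
length wrongly, which is why the caller is the place to look.
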
>
> There's only one place in the struct creation where Segment_mpi_flatten()
> is called; it's line 666 (evil!) of dataloop_create_struct.c. This is in
> DLOOP_Dataloop_create_flattened_struct(), which is a function used to make a
> struct into an indexed type.
>
> The "pairtypes", such as MPI_SHORT_INT, are special cases in MPI in that
> some of them have more than one "element type" (e.g. MPI_INT, MPI_SHORT_INT)
> in them. My guess is that there's an assumption in the
> DLOOP_Dataloop_create_flattened_struct() code path that is having trouble
> with the pairtype.
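
As a rough picture of what a pairtype describes, MPI_SHORT_INT is a short
followed by an int, which in C looks like the struct below. The struct and
helper names are mine, for illustration only. The padding depends on the ABI,
so nothing here assumes a particular size, and I am not claiming the gap is
what triggers the assertion.

```c
#include <stddef.h>

/* Illustrative C layout of what MPI_SHORT_INT describes:
 * a short value followed by an int. Names are hypothetical. */
struct short_int_pair {
    short value;
    int   index;
};

/* Bytes of alignment padding between the two members. ABI-dependent;
 * commonly 2 where short is 2 bytes and int is 4-byte aligned. */
size_t short_int_gap(void)
{
    return offsetof(struct short_int_pair, index) - sizeof(short);
}
```

When the gap is nonzero, one element of the pair is two separate regions
rather than one contiguous block, unlike a plain MPI_INT.
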
>
> I'm surprised that we might have introduced something between 1.0.7 and
> 1.1; I can't recall anything in particular that has changed in this code
> path. Someone should check the repo logs and see if something snuck in?
>
> Rob
>
>
> On Jun 9, 2009, at 3:13 PM, Joe Ratterman wrote:
>
>> The specifics of this test come from an MPI exerciser that gathered
>> (using MPIR_Gather) a variety of types, including MPI_SHORT_INT. The way
>> that gather is implemented, it created and then sent a struct datatype of
>> the tmp-data from the software tree and the local-data. I pulled out the
>> important bits, and got this test-case. It asserts on PPC32 Linux 1.1 and
>> BGP 1.1rc0, but runs fine on 1.0.7. The addresses/displacements are fake,
>> but were originally based on the actual values used inside MPIR_Gather. It
>> does the type-create on the first two types just to show that it doesn't
>> always fail.
>>
>>
>> Error message:
>>
>> Creating addr=[0x1,0x2] types=[8c000003,4c00010d] struct_displs=[1,2]
>> blocks=[256,256] MPI_BOTTOM=(nil)
>> foo:25
>> Assertion failed in file segment_ops.c at line 994: *lengthp > 0
>> internal ABORT - process 0
>>
>>
>> Code
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <unistd.h>
>> #include <mpi.h>
>>
>> void foo(void *sendbuf,
>>          MPI_Datatype sendtype,
>>          void *recvbuf,
>>          MPI_Datatype recvtype)
>> {
>>   int blocks[2];
>>   MPI_Aint struct_displs[2];
>>   MPI_Datatype types[2], tmp_type;
>>
>>   blocks[0] = 256;
>>   struct_displs[0] = (size_t)sendbuf;
>>   types[0] = sendtype;
>>   blocks[1] = 256;
>>   struct_displs[1] = (size_t)recvbuf;
>>   types[1] = MPI_BYTE;
>>
>>   printf("Creating addr=[%p,%p] types=[%x,%x] struct_displs=[%x,%x] blocks=[%d,%d] MPI_BOTTOM=%p\n",
>>          sendbuf, recvbuf, types[0], types[1], struct_displs[0], struct_displs[1], blocks[0], blocks[1], MPI_BOTTOM);
>>   MPI_Type_create_struct(2, blocks, struct_displs, types, &tmp_type);
>>   printf("%s:%d\n", __func__, __LINE__);
>>   MPI_Type_commit(&tmp_type);
>>   printf("%s:%d\n", __func__, __LINE__);
>>   MPI_Type_free (&tmp_type);
>>   puts("Done");
>> }
>>
>>
>> int main()
>> {
>>   MPI_Init(NULL, NULL);
>>
>>   foo((void*)0x1,
>>       MPI_FLOAT_INT,
>>       (void*)0x2,
>>       MPI_BYTE);
>>   sleep(1);
>>   foo((void*)0x1,
>>       MPI_DOUBLE_INT,
>>       (void*)0x2,
>>       MPI_BYTE);
>>   sleep(1);
>>   foo((void*)0x1,
>>       MPI_SHORT_INT,
>>       (void*)0x2,
>>       MPI_BYTE);
>>
>>   MPI_Finalize();
>>   return 0;
>> }
>>
>>
>>
>> I don't know anything about how this might be fixed, but we are looking
>> into it as well.
>>
>> Thanks,
>> Joe Ratterman
>> jratt at us.ibm.com
>>
>
>