[mpich2-dev] Problem with MPI_Type_commit() and assert in segment_ops.c

Joe Ratterman jratt0 at gmail.com
Wed Jun 17 10:27:11 CDT 2009


Rob,

Have you (or your team) had any luck tracking this one down?  We haven't
been able to trace the cause ourselves.

Thanks,
Joe Ratterman
jratt at us.ibm.com


On Tue, Jun 9, 2009 at 3:50 PM, Rob Ross <rross at mcs.anl.gov> wrote:

> Hi,
>
> Those type casts to (size_t) should be to (MPI_Aint).
>
> That assertion is checking that a parameter being passed to
> Segment_mpi_flatten is > 0. The parameter is the length of the list of
> regions being passed in by reference to be filled in (the destination of the
> list of regions). So for some reason we're getting a zero (or possibly
> negative) value passed in as the length of the arrays.
>
> There's only one place in the struct creation where Segment_mpi_flatten()
> is called; it's line 666 (evil!) of dataloop_create_struct.c. This is in
> DLOOP_Dataloop_create_flattened_struct(), which is a function used to make a
> struct into an indexed type.
>
> The "pairtypes", such as MPI_SHORT_INT, are special cases in MPI in that
> some of them have more than one "element type" (e.g. MPI_INT, MPI_SHORT_INT)
> in them. My guess is that there's an assumption in the
> DLOOP_Dataloop_create_flattened_struct() code path that is having trouble
> with the pairtype.
>
> I'm surprised that we might have introduced something between 1.0.7 and
> 1.1; I can't recall anything in particular that has changed in this code
> path. Someone should check the repo logs and see if something snuck in?
>
> Rob
>
>
> On Jun 9, 2009, at 3:13 PM, Joe Ratterman wrote:
>
>  The specifics of this test come from an MPI excerciser that gathered
>> (using MPIR_Gather) a variety of types, including MPI_SHORT_INT.  The way
>> that gather is implemented, it created and then sent a struct datatype of
>> the tmp-data from the software tree and the local-data.  I pulled out the
>> important bits, and got this test-case.  It asserts on PPC32 Linux 1.1 and
>> BGP 1.1rc0, but runs fine on 1.0.7.  The addresses/displacements are fake,
>> but were originally based on the actual values used inside MPIR_Gather.  It
>> does the type-create on the first two types just to show that it doesn't
>> always fail.
>>
>>
>> Error message:
>>
>> Creating  addr=[0x1,0x2]  types=[8c000003,4c00010d]  struct_displs=[1,2]
>>  blocks=[256,256]  MPI_BOTTOM=(nil)
>> foo:25
>> Assertion failed in file segment_ops.c at line 994: *lengthp > 0
>> internal ABORT - process 0
>>
>>
>> Code
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <unistd.h>
>> #include <mpi.h>
>>
>> void foo(void *sendbuf,
>>         MPI_Datatype sendtype,
>>         void *recvbuf,
>>         MPI_Datatype recvtype)
>> {
>>  int blocks[2];
>>  MPI_Aint struct_displs[2];
>>  MPI_Datatype types[2], tmp_type;
>>
>>  blocks[0] = 256;
>>  struct_displs[0] = (size_t)sendbuf;
>>  types[0] = sendtype;
>>  blocks[1] = 256;
>>  struct_displs[1] = (size_t)recvbuf;
>>  types[1] = MPI_BYTE;
>>
>>  printf("Creating  addr=[%p,%p]  types=[%x,%x]  struct_displs=[%x,%x]
>>  blocks=[%d,%d]  MPI_BOTTOM=%p\n",
>>         sendbuf, recvbuf, types[0], types[1], struct_displs[0],
>> struct_displs[1], blocks[0], blocks[1], MPI_BOTTOM);
>>  MPI_Type_create_struct(2, blocks, struct_displs, types, &tmp_type);
>>  printf("%s:%d\n", __func__, __LINE__);
>>  MPI_Type_commit(&tmp_type);
>>  printf("%s:%d\n", __func__, __LINE__);
>>  MPI_Type_free  (&tmp_type);
>>  puts("Done");
>> }
>>
>>
>> int main()
>> {
>>  MPI_Init(NULL, NULL);
>>
>>  foo((void*)0x1,
>>      MPI_FLOAT_INT,
>>      (void*)0x2,
>>      MPI_BYTE);
>>  sleep(1);
>>  foo((void*)0x1,
>>      MPI_DOUBLE_INT,
>>      (void*)0x2,
>>      MPI_BYTE);
>>  sleep(1);
>>  foo((void*)0x1,
>>      MPI_SHORT_INT,
>>      (void*)0x2,
>>      MPI_BYTE);
>>
>>  MPI_Finalize();
>>  return 0;
>> }
>>
>>
>>
>> I don't know anything about how this might be fixed, but we are looking
>> into it as well.
>>
>> Thanks,
>> Joe Ratterman
>> jratt at us.ibm.com
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/mpich2-dev/attachments/20090617/6846cd92/attachment.htm>


More information about the mpich2-dev mailing list