[mpich2-dev] Problem with MPI_Type_commit() and assert in segment_ops.c

Joe Ratterman jratt0 at gmail.com
Tue Jun 9 16:11:19 CDT 2009


I know it's not that important, and clearly not relevant, but BG/P will
generate a compiler warning if I use an MPI_Aint cast there.  We want to
avoid any ambiguity that such a cast would involve (i.e. is it sign
extended?), so I use a cast that works correctly for this sort of
micro-test.  It is also correct on PPC32.  This small example shows the
warnings:
$ cat -n size.c
     1  #include <mpi.h>
     2
     3  extern void bar(MPI_Aint a, MPI_Aint b, MPI_Aint c);
     4
     5  void foo(void* p)
     6  {
     7    MPI_Aint aint = (MPI_Aint)p;
     8
     9    MPI_Aint one = (long long int)p;
    10    MPI_Aint two = (int)p;
    11
    12    bar(aint, one, two);
    13  }

$ /bgsys/drivers/ppcfloor/comm/bin/mpicc -Wall -g -c size.c
size.c: In function 'foo':
size.c:7: warning: cast from pointer to integer of different size
size.c:9: warning: cast from pointer to integer of different size
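
To make the sign-extension question concrete, here is a minimal sketch in
plain C (no MPI needed). It assumes a 32-bit address being widened to a
64-bit integer. The names aint64_t, widen_signed, and widen_unsigned are
hypothetical and only for illustration; aint64_t stands in for a 64-bit
MPI_Aint and is not taken from mpi.h.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit MPI_Aint; not taken from mpi.h. */
typedef int64_t aint64_t;

/* Widen a 32-bit address by way of a signed 32-bit integer: the top bit
 * is copied into the upper 32 bits (sign extension). The uint32_t ->
 * int32_t conversion is implementation-defined for values >= 2^31;
 * two's-complement wraparound is assumed here. */
aint64_t widen_signed(uint32_t addr32)
{
    return (aint64_t)(int32_t)addr32;
}

/* Widen the same address by way of an unsigned 32-bit integer: the
 * upper 32 bits are filled with zeros (zero extension). */
aint64_t widen_unsigned(uint32_t addr32)
{
    return (aint64_t)addr32;
}
```

The two functions agree for addresses below 0x80000000 and differ from
there up: for 0x80000000, widen_signed gives -2147483648 while
widen_unsigned gives 2147483648.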


Thanks,
Joe Ratterman
jratt at us.ibm.com

On Tue, Jun 9, 2009 at 3:50 PM, Rob Ross <rross at mcs.anl.gov> wrote:

> Hi,
>
> Those type casts to (size_t) should be to (MPI_Aint).
>
> That assertion is checking that a parameter being passed to
> Segment_mpi_flatten is > 0. The parameter is the length of the list of
> regions being passed in by reference to be filled in (the destination of the
> list of regions). So for some reason we're getting a zero (or possibly
> negative) value passed in as the length of the arrays.
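
The in/out length convention described above can be sketched as follows.
This is only an illustration of the pattern, not the MPICH code: struct
region, flatten_sketch, and the parameter types are hypothetical.

```c
#include <assert.h>

/* Hypothetical region description; not the MPICH dataloop types. */
struct region {
    long disp;   /* displacement of the region */
    int  len;    /* length of the region in bytes */
};

/* Copy up to *lengthp regions into the caller's arrays.
 * On entry, *lengthp is the capacity of blklens[] and disps[].
 * On return, *lengthp is the number of entries actually written. */
void flatten_sketch(const struct region *src, int nsrc,
                    int *blklens, long *disps, int *lengthp)
{
    int i, n;

    /* Same shape as the failing check: a zero or negative capacity
     * means the caller sized its output arrays incorrectly. */
    assert(*lengthp > 0);

    n = (nsrc < *lengthp) ? nsrc : *lengthp;
    for (i = 0; i < n; i++) {
        blklens[i] = src[i].len;
        disps[i]   = src[i].disp;
    }
    *lengthp = n;
}
```

With this convention, a caller that miscounts the regions it expects and
passes a capacity of 0 hits the assertion before anything is written.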
>
> There's only one place in the struct creation where Segment_mpi_flatten()
> is called; it's line 666 (evil!) of dataloop_create_struct.c. This is in
> DLOOP_Dataloop_create_flattened_struct(), which is a function used to make a
> struct into an indexed type.
>
> The "pairtypes", such as MPI_SHORT_INT, are special cases in MPI in that
> some of them have more than one "element type" (e.g. MPI_SHORT and MPI_INT
> for MPI_SHORT_INT) in them. My guess is that there's an assumption in the
> DLOOP_Dataloop_create_flattened_struct() code path that is having trouble
> with the pairtype.
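
One way to see how the three pair types in the test differ is to look at
the C structs they correspond to. The sketch below is plain C with
hypothetical struct and function names; it measures the padding between
the two members of each struct and says nothing about MPICH internals.

```c
#include <stddef.h>

/* C structs matching the usual definition of three MPI pair types.
 * The struct names are hypothetical. */
struct float_int  { float  a; int b; };
struct double_int { double a; int b; };
struct short_int  { short  a; int b; };

/* Padding bytes between the first and second member of each struct. */
size_t float_int_gap(void)
{
    return offsetof(struct float_int, b) - sizeof(float);
}

size_t double_int_gap(void)
{
    return offsetof(struct double_int, b) - sizeof(double);
}

size_t short_int_gap(void)
{
    return offsetof(struct short_int, b) - sizeof(short);
}
```

On common ABIs with a 2-byte short and a 4-byte-aligned int, only
short_int_gap() is nonzero (2 bytes), so MPI_SHORT_INT is the one pair
type in the test whose two elements are not contiguous. Whether that gap
is what trips the assertion is not established here.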
>
> I'm surprised that we might have introduced something between 1.0.7 and
> 1.1; I can't recall anything in particular that has changed in this code
> path. Someone should check the repo logs and see if something snuck in?
>
> Rob
>
>
> On Jun 9, 2009, at 3:13 PM, Joe Ratterman wrote:
>
>> The specifics of this test come from an MPI exerciser that gathered
>> (using MPIR_Gather) a variety of types, including MPI_SHORT_INT.  The way
>> that gather is implemented, it created and then sent a struct datatype of
>> the tmp-data from the software tree and the local-data.  I pulled out the
>> important bits, and got this test-case.  It asserts on PPC32 Linux 1.1 and
>> BGP 1.1rc0, but runs fine on 1.0.7.  The addresses/displacements are fake,
>> but were originally based on the actual values used inside MPIR_Gather.  It
>> does the type-create on the first two types just to show that it doesn't
>> always fail.
>>
>>
>> Error message:
>>
>> Creating  addr=[0x1,0x2]  types=[8c000003,4c00010d]  struct_displs=[1,2]  blocks=[256,256]  MPI_BOTTOM=(nil)
>> foo:25
>> Assertion failed in file segment_ops.c at line 994: *lengthp > 0
>> internal ABORT - process 0
>>
>>
>> Code
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <unistd.h>
>> #include <mpi.h>
>>
>> void foo(void *sendbuf,
>>         MPI_Datatype sendtype,
>>         void *recvbuf,
>>         MPI_Datatype recvtype)
>> {
>>  int blocks[2];
>>  MPI_Aint struct_displs[2];
>>  MPI_Datatype types[2], tmp_type;
>>
>>  blocks[0] = 256;
>>  struct_displs[0] = (size_t)sendbuf;
>>  types[0] = sendtype;
>>  blocks[1] = 256;
>>  struct_displs[1] = (size_t)recvbuf;
>>  types[1] = MPI_BYTE;
>>
>>  printf("Creating  addr=[%p,%p]  types=[%x,%x]  struct_displs=[%x,%x]  "
>>         "blocks=[%d,%d]  MPI_BOTTOM=%p\n",
>>         sendbuf, recvbuf, types[0], types[1], struct_displs[0],
>>         struct_displs[1], blocks[0], blocks[1], MPI_BOTTOM);
>>  MPI_Type_create_struct(2, blocks, struct_displs, types, &tmp_type);
>>  printf("%s:%d\n", __func__, __LINE__);
>>  MPI_Type_commit(&tmp_type);
>>  printf("%s:%d\n", __func__, __LINE__);
>>  MPI_Type_free  (&tmp_type);
>>  puts("Done");
>> }
>>
>>
>> int main()
>> {
>>  MPI_Init(NULL, NULL);
>>
>>  foo((void*)0x1,
>>      MPI_FLOAT_INT,
>>      (void*)0x2,
>>      MPI_BYTE);
>>  sleep(1);
>>  foo((void*)0x1,
>>      MPI_DOUBLE_INT,
>>      (void*)0x2,
>>      MPI_BYTE);
>>  sleep(1);
>>  foo((void*)0x1,
>>      MPI_SHORT_INT,
>>      (void*)0x2,
>>      MPI_BYTE);
>>
>>  MPI_Finalize();
>>  return 0;
>> }
>>
>>
>>
>> I don't know anything about how this might be fixed, but we are looking
>> into it as well.
>>
>> Thanks,
>> Joe Ratterman
>> jratt at us.ibm.com
>>
>
>