[mpich2-dev] Hvector with Zero Blocks Asserts
Jeff Parker
jjparker at us.ibm.com
Tue Mar 3 10:18:13 CST 2009
IBM Blue Gene/P has received a customer-reported problem that appears to be
in the stock MPICH2 code. The application commits a datatype that contains
an hvector with 0 blocks, which trips an assertion that requires the block
count to be positive. The spec says the following, specifically that count
is a nonnegative integer, so a value of zero should be allowed:
Synopsis

    #include "mpi.h"
    int MPI_Type_hvector(
        int count,
        int blocklen,
        MPI_Aint stride,
        MPI_Datatype old_type,
        MPI_Datatype *newtype )

Input Parameters

    count        number of blocks (nonnegative integer)
    blocklength  number of elements in each block (nonnegative integer)
    stride       number of bytes between start of each block (integer)
    old_type     old datatype (handle)

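To make the zero-count case concrete, here is a minimal sketch of how much data an hvector describes, assuming the usual rule that a datatype's size is count * blocklength * size of the old type. The helper name sketch_hvector_size is hypothetical and is not part of MPI or MPICH2; it only models the arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an MPI or MPICH2 function): number of data
 * bytes described by an hvector of `count` blocks, each holding
 * `blocklen` elements of `old_size` bytes. The stride affects where the
 * blocks sit in memory, not how many bytes they contain. */
static size_t sketch_hvector_size(int count, int blocklen, size_t old_size)
{
    /* Both values are documented as nonnegative, so zero is accepted. */
    assert(count >= 0 && blocklen >= 0);
    return (size_t)count * (size_t)blocklen * old_size;
}
```

With count equal to 0 the product is simply 0 bytes, which is a well-defined (empty) datatype rather than an error.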
A reproducer is included below. It fails on Blue Gene/P (MPICH2 1.0.7) and
on Linux (MPICH2 1.0.7rc1), but works on Blue Gene/L (MPICH2 1.0.4p1).
This assertion did not exist in MPICH2 1.0.5p4, but appears in MPICH2 1.0.6
and later versions.
The assertion is in src/mpid/common/datatype/dataloop/segment_ops.c in
function DLOOP_Segment_contig_count_block. If the assertion is changed
from
DLOOP_Assert(*blocks_p > 0);
to
DLOOP_Assert(*blocks_p >= 0);
it works.
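As an illustration of why the relaxed check is harmless, the sketch below uses a simplified stand-in for the counting routine. It is not the real DLOOP_Segment_contig_count_block; the names SKETCH_ASSERT and sketch_count_contig_bytes are hypothetical, and the body only totals bytes so that the effect of a zero block count is visible.

```c
#include <assert.h>

/* Hypothetical stand-in for DLOOP_Assert; the real macro is defined in
 * the MPICH2 dataloop code. */
#define SKETCH_ASSERT(cond) assert(cond)

/* Hypothetical, simplified counter: adds the bytes covered by *blocks_p
 * contiguous elements of el_size bytes to a running total. */
static long sketch_count_contig_bytes(const long *blocks_p, long el_size,
                                      long running_total)
{
    /* Relaxed from "> 0" to ">= 0": zero blocks is a legal input. */
    SKETCH_ASSERT(*blocks_p >= 0);
    return running_total + (*blocks_p) * el_size;
}
```

In this model a zero block count leaves the running total unchanged, so accepting it does not alter the result for any non-empty type.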
There are other places with this assertion, and other similar assertions
that may need fixing too:
grep -r "*blocks_p >" *
src/mpi/romio/common/dataloop/segment_ops.c: DLOOP_Assert(*blocks_p > 0);
src/mpi/romio/common/dataloop/segment_ops.c: DLOOP_Assert(count > 0 && blksz > 0 && *blocks_p > 0);
src/mpi/romio/common/dataloop/segment_ops.c: DLOOP_Assert(count > 0 && blksz > 0 && *blocks_p > 0);
src/mpi/romio/common/dataloop/segment_ops.c: DLOOP_Assert(count > 0 && *blocks_p > 0);
src/mpid/common/datatype/dataloop/segment_ops.c: DLOOP_Assert(*blocks_p >= 0);
src/mpid/common/datatype/dataloop/segment_ops.c: DLOOP_Assert(count > 0 && blksz > 0 && *blocks_p > 0);
src/mpid/common/datatype/dataloop/segment_ops.c: DLOOP_Assert(count > 0 && blksz > 0 && *blocks_p > 0);
src/mpid/common/datatype/dataloop/segment_ops.c: DLOOP_Assert(count > 0 && *blocks_p > 0);
Reproducer:
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Datatype mystruct, vecs[3];
    MPI_Aint stride = 5, displs[3];
    int i = 0, blockcount[3];

    MPI_Init(&argc, &argv);
    for (i = 0; i < 3; i++)
    {
        /* important point appears to be the i==0 vectors here */
        MPI_Type_hvector(i, 1, stride, MPI_INT, &vecs[i]);
        MPI_Type_commit(&vecs[i]);
        blockcount[i] = 1;
    }
    displs[0] = 0; displs[1] = -100; displs[2] = -200; /* irrelevant */
    MPI_Type_struct(3, blockcount, displs, vecs, &mystruct);
    fprintf(stderr, "Before commiting structure\n");
    MPI_Type_commit(&mystruct);
    fprintf(stderr, "After commiting structure\n");
    MPI_Finalize();
    return 0;
}
Output (in and after MPICH2 1.0.6):
Before commiting structure
Before commiting structure
Assertion failed in
file /bglhome/usr6/bgbuild/V1R3M0_460_2008-081112P/ppc/bgp/comm/lib/dev/mpich2/src/mpid/common/datatype/dataloop/segment_ops.c
at line 375: *blocks_p > 0
Assertion failed in
file /bglhome/usr6/bgbuild/V1R3M0_460_2008-081112P/ppc/bgp/comm/lib/dev/mpich2/src/mpid/common/datatype/dataloop/segment_ops.c
at line 375: *blocks_p > 0
Abort(1) on node 1: Internal error
Abort(1) on node 0: Internal error
Jeff Parker
Blue Gene Messaging
61L/030-2 A407 507-253-4208 TieLine: 553-4208
Notes email: Jeff Parker/Rochester/IBM
INTERNET: jjparker at us.ibm.com AFS: jeff at rchland