[petsc-dev] how did this buggy mess get into master?

Lawrence Mitchell lawrence.mitchell at imperial.ac.uk
Wed Apr 8 06:29:32 CDT 2015


On 07/04/15 23:07, Barry Smith wrote:
> 
> Matt,
> 
> How could you have moved this into master? The ex2.c example
> didn't compile for complex and was totally incorrectly formatted.
> The makefile had missing tabs so the rules runex2_basic and
> runex2_window didn't even run plus the $ was missing around MPIEXEC
> so even if they had run they couldn't run plus

Apologies, I shall go away and re-read the developers guide more
carefully.

> it also appears that runex2_window doesn't even work.

I think this is because in the example I pass the same buffer as both
the input and output argument to PetscSFBcastBegin/End.  On-node, MPICH
turns the MPI_Get into a memcpy, which in this case ends up copying onto
overlapping memory.  My reading of the standard is that this is
/probably/ disallowed, but I'm a little lost.
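
For what it's worth, the calling pattern at the SF level is roughly the
sketch below.  This is not the actual ex2.c, just a minimal stand-alone
illustration (the graph it builds is made up); the crucial point is only
that the same buffer is passed as both rootdata and leafdata, which with
the window type should turn into the overlapping on-node MPI_Get shown
further down.

#include <petscsf.h>

/* Run on exactly two MPI processes. */
int main(int argc, char **argv)
{
    PetscErrorCode ierr;
    PetscSF        sf;
    PetscMPIInt    rank, size;
    PetscInt       nroots = 2, nleaves = 2, i;
    PetscSFNode    remote[2];
    PetscScalar    *data;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
    ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank); CHKERRQ(ierr);
    ierr = MPI_Comm_size(PETSC_COMM_WORLD, &size); CHKERRQ(ierr);
    if (size != 2) SETERRQ(PETSC_COMM_WORLD, PETSC_ERR_ARG_WRONG, "Run with exactly two processes");

    /* Leaf i on every process points at root 0 on process i. */
    for (i = 0; i < nleaves; i++) {
        remote[i].rank  = i;
        remote[i].index = 0;
    }
    ierr = PetscSFCreate(PETSC_COMM_WORLD, &sf); CHKERRQ(ierr);
    ierr = PetscSFSetType(sf, PETSCSFWINDOW); CHKERRQ(ierr);
    ierr = PetscSFSetGraph(sf, nroots, nleaves, NULL, PETSC_OWN_POINTER,
                           remote, PETSC_COPY_VALUES); CHKERRQ(ierr);

    ierr = PetscMalloc1(nroots, &data); CHKERRQ(ierr);
    for (i = 0; i < nroots; i++) data[i] = rank + 1;

    /* The problematic pattern: the same buffer serves as both the root
       and the leaf storage. */
    ierr = PetscSFBcastBegin(sf, MPIU_SCALAR, data, data); CHKERRQ(ierr);
    ierr = PetscSFBcastEnd(sf, MPIU_SCALAR, data, data); CHKERRQ(ierr);

    ierr = PetscFree(data); CHKERRQ(ierr);
    ierr = PetscSFDestroy(&sf); CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
}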

The attached code demonstrates the issue in a pure MPI setting.  MPICH
complains that its internal local copy, implemented with memcpy, has
overlapping source and destination.

I wonder, given this, whether the window version of PetscSFBcast should
error (at least in debug mode) if the source and destination buffers are
the same.
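
Concretely, I'm imagining something along these lines inside the window
implementation of the bcast (just a fragment, not tested, and the choice
of error code and message is only a suggestion):

#if defined(PETSC_USE_DEBUG)
  if (rootdata == leafdata) SETERRQ(PetscObjectComm((PetscObject)sf), PETSC_ERR_ARG_IDN,
                                    "rootdata and leafdata must be distinct buffers for the window SF type");
#endif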

I can fix the example so that it works with the windowed SF type by
using distinct output buffers.  It may well still not work with Open MPI
though (depending on the version); I'm not sure how to specify that a
test should only be run under certain configurations.

Cheers,

Lawrence

$ mpiexec -n 2 ./a.out
Fatal error in MPI_Get: Internal MPI error!, error stack:
MPI_Get(160)....................: MPI_Get(origin_addr=0x2371220,
origin_count=1, dtype=USER<indexed_block>, target_rank=0, target_disp=0,
target_count=1, dtype=USER<indexed_block>, win=0xa0000000) failed
MPIDI_Get(281)..................:
MPIDI_CH3I_Shm_get_op(927)......:
MPIDI_CH3I_RMA_Get_ops_list(541):
MPIR_Localcopy(123).............: memcpy arguments alias each other,
dst=0x2371220 src=0x2371220 len=16
[cli_0]: aborting job:
Fatal error in MPI_Get: Internal MPI error!, error stack:
MPI_Get(160)....................: MPI_Get(origin_addr=0x2371220,
origin_count=1, dtype=USER<indexed_block>, target_rank=0, target_disp=0,
target_count=1, dtype=USER<indexed_block>, win=0xa0000000) failed
MPIDI_Get(281)..................:
MPIDI_CH3I_Shm_get_op(927)......:
MPIDI_CH3I_RMA_Get_ops_list(541):
MPIR_Localcopy(123).............: memcpy arguments alias each other,
dst=0x2371220 src=0x2371220 len=16
Fatal error in MPI_Get: Internal MPI error!, error stack:
MPI_Get(160)....................: MPI_Get(origin_addr=0xfd5200,
origin_count=1, dtype=USER<indexed_block>, target_rank=1, target_disp=0,
target_count=1, dtype=USER<indexed_block>, win=0xa0000000) failed
MPIDI_Get(281)..................:
MPIDI_CH3I_Shm_get_op(927)......:
MPIDI_CH3I_RMA_Get_ops_list(541):
MPIR_Localcopy(123).............: memcpy arguments alias each other,
dst=0xfd5200 src=0xfd5200 len=16
[cli_1]: aborting job:
Fatal error in MPI_Get: Internal MPI error!, error stack:
MPI_Get(160)....................: MPI_Get(origin_addr=0xfd5200,
origin_count=1, dtype=USER<indexed_block>, target_rank=1, target_disp=0,
target_count=1, dtype=USER<indexed_block>, win=0xa0000000) failed
MPIDI_Get(281)..................:
MPIDI_CH3I_Shm_get_op(927)......:
MPIDI_CH3I_RMA_Get_ops_list(541):
MPIR_Localcopy(123).............: memcpy arguments alias each other,
dst=0xfd5200 src=0xfd5200 len=16



#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#define CHKERRQ(n)  /* no-op stand-in for PETSc's error-checking macro */

int main(int argc, char **argv)
{
    int ierr;
    MPI_Datatype unit, mine[2], remote[2];
    int rcount[2];
    int roffset[2];
    int rmine[2];
    int rremote[2];
    int ranks[2];
    int size;
    int rank;
    int i;
    MPI_Aint bytes;
    MPI_Aint lb;
    int nroots = -1;
    int nleaves = 2;
    int bs = 2;
    int nranks = 2;
    MPI_Comm comm;
    MPI_Win win;

    double *rootdata;
    double *leafdata;
    MPI_Init(&argc, &argv);
    comm = MPI_COMM_WORLD;

    ierr = MPI_Comm_rank(comm, &rank); CHKERRQ(ierr);
    ierr = MPI_Comm_size(comm, &size); CHKERRQ(ierr);

    ierr = MPI_Type_contiguous(bs, MPI_DOUBLE, &unit); CHKERRQ(ierr);
    ierr = MPI_Type_commit(&unit); CHKERRQ(ierr);


    ranks[0] = 0;
    ranks[1] = 1;
    if ( rank == 0 ) {
        nroots = 2;
        roffset[0] = 0;
        roffset[1] = 1;
        rcount[0] = 1;
        rcount[1] = 1;
        rmine[0] = 0;
        rmine[1] = 1;
        rremote[0] = 0;
        rremote[1] = 0;
    } else if ( rank == 1 ) {
        nroots = 2;
        roffset[0] = 0;
        roffset[1] = 1;
        rcount[0] = 1;
        rcount[1] = 1;
        rmine[0] = 1;
        rmine[1] = 0;
        rremote[0] = 0;
        rremote[1] = 0;
    } else {
        MPI_Abort(comm, 1);
    }
    rootdata = calloc(bs*nroots, sizeof(double));
    /* Deliberately alias the leaf buffer with the root buffer; this is
       what makes the on-node MPI_Get below copy onto overlapping memory. */
    leafdata = rootdata;

    for ( i = 0; i < nroots*bs; i++ ) {
        rootdata[i] = rank + 1;
    }
    for ( i = 0; i < 2; i++ ) {
        ierr = MPI_Type_create_indexed_block(rcount[i], 1, rmine + roffset[i],
                                             unit, &mine[i]); CHKERRQ(ierr);
        ierr = MPI_Type_create_indexed_block(rcount[i], 1, rremote + roffset[i],
                                             unit, &remote[i]); CHKERRQ(ierr);
        ierr = MPI_Type_commit(&mine[i]); CHKERRQ(ierr);
        ierr = MPI_Type_commit(&remote[i]); CHKERRQ(ierr);
    }

    ierr = MPI_Type_get_extent(unit, &lb, &bytes); CHKERRQ(ierr);

    ierr = MPI_Win_create((void *)rootdata, (MPI_Aint)bytes*nroots,
                          bytes, MPI_INFO_NULL, comm, &win); CHKERRQ(ierr);
    ierr = MPI_Win_fence(MPI_MODE_NOPUT|MPI_MODE_NOPRECEDE, win); CHKERRQ(ierr);
    for ( i = 0; i < nranks; i++ ) {
        ierr = MPI_Get((void *)leafdata, 1, mine[i], ranks[i], 0, 1,
                       remote[i], win); CHKERRQ(ierr);
    }
    ierr = MPI_Win_fence(MPI_MODE_NOSTORE|MPI_MODE_NOSUCCEED, win); CHKERRQ(ierr);
    ierr = MPI_Win_free(&win); CHKERRQ(ierr);
    for ( i = 0; i < 2; i++ ) {
        ierr = MPI_Type_free(&mine[i]); CHKERRQ(ierr);
        ierr = MPI_Type_free(&remote[i]); CHKERRQ(ierr);
    }
    ierr = MPI_Type_free(&unit); CHKERRQ(ierr);

    if ( rank == 1 ) {
        ierr = MPI_Barrier(comm); CHKERRQ(ierr);
    }
    printf("[%d] ", rank);
    for ( i = 0; i < nroots*bs; i++ ) {
        printf("%g ", rootdata[i]);
    }
    printf("\n");
    fflush(stdout);
    if ( rank == 0 ) {
        ierr = MPI_Barrier(comm); CHKERRQ(ierr);
    }
    free(rootdata);
    MPI_Finalize();
    return 0;
}
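
For completeness, the change to make the reproducer above pass with
distinct buffers is just the following (the final printf loop would then
want to print leafdata rather than rootdata to see the gathered values):

    /* Instead of "leafdata = rootdata;": give the leaves their own
       storage so the on-node MPI_Get no longer overlaps its source. */
    leafdata = calloc(bs*nleaves, sizeof(double));

    /* And alongside free(rootdata) at the end: */
    free(leafdata);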
