[petsc-dev] Strange failure in PetscSF when using Open MPI 1.10.2 in OS X Travis-CI
Lisandro Dalcin
dalcinl at gmail.com
Thu Jun 16 17:51:27 CDT 2016
Could any of you guys make a PR with these fixes and ping back to me?
On 16 June 2016 at 13:16, Lawrence Mitchell
<lawrence.mitchell at imperial.ac.uk> wrote:
>
>
> On 15/06/16 19:59, Lisandro Dalcin wrote:
>> This is the failing build:
>> https://travis-ci.org/petsc/petsc/jobs/137818148
>>
>> A similar build with MPICH does not generate this error:
>> https://travis-ci.org/petsc/petsc/jobs/137818145
>>
>> Maybe Open MPI bug?
>
> I don't think so. I can reproduce this on Ubuntu 16.04 with Open MPI
> 1.10.2. The problem is, I think, as follows:
>
> When sfbasic sets up a pack in PetscSFBasicPackTypeSetup it does a
> bunch of comparisons for the datatype. In this case, the unit
> datatype is an MPI_Type_contiguous(4, MPIU_COMPLEX).
>
> So the check
>
> ierr = MPIPetsc_Type_compare_contig(unit,MPIU_COMPLEX,&nPetscComplexContig);CHKERRQ(ierr);
>
> should return a nonzero count (here 4) in nPetscComplexContig.
>
> But it doesn't. Why?
>
> MPIPetsc_Type_compare_contig unwraps the passed in types. Neither was
> dupped so we pull apart unit:
>
> MPI_Type_get_envelope(unit, ...)
>
> The combiner is contiguous, great. So now we do:
>
> MPI_Type_get_contents(unit, ...)
>
> This returns one datatype that "is equivalent to the datatype used
> when creating unit". It is only *equal* to the datatype used if
> MPIU_COMPLEX is a predefined data type. But if PETSC_CLANGUAGE_CXX is
> defined, then MPIU_COMPLEX is *not* a predefined datatype, afaict.
>
> So now the check:
>
> if (atypes[0] == btype) *n = aints[0];
>
> fails, so we don't determine that the type is contiguous, and we
> fall through to the "generic" code around line 640 in sfbasic.c.
> sizeof(int) is 4 and the type occupies 16*4 = 64 bytes, a size that
> the generic code does not handle. Hence the error.
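The size arithmetic above can be checked with a small sketch. It assumes the complex scalar is laid out like C99 `double complex`; the helper names `contig_complex_bytes` and `bytes_in_int_units` are hypothetical and are not part of PETSc or MPI.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Bytes in a contiguous block of `count` double-precision complex values.
   Assumption: the scalar is laid out like C99 `double complex`. */
static size_t contig_complex_bytes(size_t count)
{
  return count * sizeof(double complex);
}

/* Express a byte count in int-sized words.
   Asserts that the size is a whole number of ints. */
static size_t bytes_in_int_units(size_t nbytes)
{
  assert(nbytes % sizeof(int) == 0);
  return nbytes / sizeof(int);
}
```

On a platform where sizeof(int) is 4 and sizeof(double complex) is 16, `contig_complex_bytes(4)` is 64 and `bytes_in_int_units(64)` is 16, which matches the "16*4" bytes quoted above.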
>
> I think the correct fix for this is to use MPIPetsc_Type_compare to
> compare atypes[0] and btype, rather than expecting object identity.
>
> We then run into a further error in PetscSFBasicGetPackInUse, because
> we call:
>
> MPIPetsc_Type_compare(unit, link->unit)
>
> where link->unit is created from MPI_Type_dup(unit, &link->unit)
>
> So we unwrap link->unit and get back whatever type
> MPI_Type_get_contents returns. But now we hit the same problem: that
> type only "looks the same" as unit; it is not the same handle.
>
> There's a comment in MPIPetsc_Type_compare that the internal
> comparison should be recursive. With that addition as well, the ex3
> tests pass again.
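The difference between handle identity and a recursive, structural comparison can be illustrated with a toy model. This is only a sketch and deliberately does not call MPI: `ToyType`, `toy_same_handle`, `toy_equivalent`, and the example objects are all hypothetical stand-ins, and the attached patch may well be structured differently. The copy `toy_cplx_copy` plays the role of the "equivalent but not equal" type that MPI_Type_get_contents hands back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy description of a datatype; this is NOT the MPI API. */
typedef enum { TOY_NAMED, TOY_DUP, TOY_CONTIGUOUS } ToyCombiner;

typedef struct ToyType {
  ToyCombiner           combiner;
  int                   name;   /* identifies a predefined type (TOY_NAMED only) */
  int                   count;  /* block count (TOY_CONTIGUOUS only) */
  const struct ToyType *base;   /* inner type (TOY_DUP and TOY_CONTIGUOUS) */
} ToyType;

/* Strip dup wrappers, since a dup does not change the layout. */
static const ToyType *toy_unwrap(const ToyType *t)
{
  while (t->combiner == TOY_DUP) t = t->base;
  return t;
}

/* Identity comparison: true only if both are the very same object. */
static bool toy_same_handle(const ToyType *a, const ToyType *b)
{
  return a == b;
}

/* Structural comparison: recurse into the inner types instead of
   requiring that they are the same object. */
static bool toy_equivalent(const ToyType *a, const ToyType *b)
{
  assert(a != NULL && b != NULL);
  a = toy_unwrap(a);
  b = toy_unwrap(b);
  if (a == b) return true;
  if (a->combiner != b->combiner) return false;
  if (a->combiner == TOY_NAMED) return a->name == b->name;
  /* Both are contiguous: compare counts, then descend. */
  return a->count == b->count && toy_equivalent(a->base, b->base);
}

/* Example objects mirroring the situation described in the message. */
static const ToyType toy_double    = {TOY_NAMED, 1, 0, NULL};
static const ToyType toy_cplx      = {TOY_CONTIGUOUS, 0, 2, &toy_double};
static const ToyType toy_cplx_copy = {TOY_CONTIGUOUS, 0, 2, &toy_double};
static const ToyType toy_unit      = {TOY_CONTIGUOUS, 0, 4, &toy_cplx_copy};
static const ToyType toy_unit_copy = {TOY_CONTIGUOUS, 0, 4, &toy_cplx};
static const ToyType toy_link_unit = {TOY_DUP, 0, 0, &toy_unit_copy};
```

In this model `toy_same_handle(toy_unit.base, &toy_cplx)` is false even though the two describe the same layout, while `toy_equivalent(&toy_unit, &toy_link_unit)` is true because the comparison strips the dup and then descends into the inner types.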
>
> Tentative patch to fix this problem attached.
>
> Cheers,
>
> Lawrence
--
Lisandro Dalcin
============
Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/
4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 0109
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa
Office Phone: +966 12 808-0459