[petsc-dev] Strange failure in PetscSF when using Open MPI 1.10.2 in OS X Travis-CI

Lisandro Dalcin dalcinl at gmail.com
Thu Jun 16 17:51:27 CDT 2016


Could any of you guys make a PR with these fixes and ping back to me?

On 16 June 2016 at 13:16, Lawrence Mitchell
<lawrence.mitchell at imperial.ac.uk> wrote:
>
>
> On 15/06/16 19:59, Lisandro Dalcin wrote:
>> This is the failing build:
>> https://travis-ci.org/petsc/petsc/jobs/137818148
>>
>> A similar build with MPICH does not generate this error:
>> https://travis-ci.org/petsc/petsc/jobs/137818145
>>
>> Maybe an Open MPI bug?
>
> I don't think so.  I can reproduce this on ubuntu/16.04 with openmpi
> 1.10.2.  The problem is, I think, as follows:
>
> When sfbasic sets up a pack in PetscSFBasicPackTypeSetup it does a
> bunch of comparisons for the datatype.  In this case, the unit
> datatype is an MPI_Type_contiguous(4, MPIU_COMPLEX).
>
> So the check
>
>   ierr = MPIPetsc_Type_compare_contig(unit,MPIU_COMPLEX,&nPetscComplexContig);CHKERRQ(ierr);
>
> should return true in nPetscComplexContig.
>
> But it doesn't.  Why?
>
> MPIPetsc_Type_compare_contig unwraps the passed-in types.  Neither was
> dupped, so we pull apart unit:
>
> MPI_Type_get_envelope(unit, ...)
>
> The combiner is contiguous, great.  So now we do:
>
> MPI_Type_get_contents(unit, ...)
>
> This returns one datatype that "is equivalent to the datatype used
> when creating unit".  It is only *equal* to the datatype used if
> MPIU_COMPLEX is a predefined data type.  But if PETSC_CLANGUAGE_CXX is
> defined, then MPIU_COMPLEX is *not* a predefined datatype, afaict.
>
> So now the check:
>
>     if (atypes[0] == btype) *n = aints[0];
>
> fails, and we don't determine that the type is contiguous, and so we
> fall through to the "generic" code around line 640 in sfbasic.c.  The
> sizeof(int) is 4 and the number of bytes in the type is 16*4, so this
> is not handled.  Hence the error.
>
> I think the correct fix for this is to use MPIPetsc_Type_compare to
> compare atypes[0] and btype, rather than expecting object identity.
>
> We then run into a further error in PetscSFBasicGetPackInUse, because
> we call:
>
> MPIPetsc_Type_compare(unit, link->unit)
>
> where link->unit is created from MPI_Type_dup(unit, &link->unit)
>
> So we'll unwrap link->unit and return the type that
> MPI_Type_get_contents returns.  But now we run into the same problem:
> the returned type is only equivalent to unit ("looks the same"), it is
> not the identical handle.
>
> There's a comment in MPIPetsc_Type_compare that the internal
> comparison should be recursive.  With that addition as well, the ex3
> tests pass again.
>
> Tentative patch to fix this problem attached.
>
> Cheers,
>
> Lawrence
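
For anyone picking this up, the identity-versus-equivalence point above can be sketched without MPI at all. What follows is only a toy model, not the attached patch and not the MPI or PETSc API: ToyType, toy_unwrap, toy_compare, toy_compare_contig and toy_demo are all hypothetical names, and modelling the derived complex type as "contiguous(2, double)" is an assumption made purely for illustration. It shows a contiguous check that fails under pointer identity and succeeds once the inner comparison is recursive.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a datatype handle. Hypothetical, NOT the MPI/PETSc API. */
typedef enum { TOY_NAMED, TOY_DUP, TOY_CONTIGUOUS } ToyCombiner;

typedef struct ToyType {
  ToyCombiner combiner;
  int id;                      /* identifies a named (predefined) type */
  int count;                   /* element count, TOY_CONTIGUOUS only */
  const struct ToyType *base;  /* inner type for TOY_DUP and TOY_CONTIGUOUS */
} ToyType;

/* Strip any number of dup wrappers. */
static const ToyType *toy_unwrap(const ToyType *t)
{
  assert(t != NULL);
  while (t->combiner == TOY_DUP) t = t->base;
  return t;
}

/* Recursive structural comparison: 1 if equivalent, 0 otherwise. */
static int toy_compare(const ToyType *a, const ToyType *b)
{
  a = toy_unwrap(a);
  b = toy_unwrap(b);
  if (a == b) return 1;                         /* identical handle */
  if (a->combiner != b->combiner) return 0;
  if (a->combiner == TOY_NAMED) return a->id == b->id;
  /* both contiguous: same count and equivalent inner types */
  return a->count == b->count && toy_compare(a->base, b->base);
}

/* Returns n if a is contiguous(n, b), else 0.
 * use_identity != 0 mimics the original pointer-equality check. */
static int toy_compare_contig(const ToyType *a, const ToyType *b, int use_identity)
{
  a = toy_unwrap(a);
  if (a->combiner != TOY_CONTIGUOUS) return 0;
  if (use_identity) return (a->base == b) ? a->count : 0;
  return toy_compare(a->base, b) ? a->count : 0;
}

/* Builds unit = contiguous(4, copy of complex) and compares against complex. */
static int toy_demo(int use_identity)
{
  static const ToyType dbl       = { TOY_NAMED, 1, 0, NULL };
  static const ToyType cplx      = { TOY_CONTIGUOUS, 0, 2, &dbl }; /* derived "complex" */
  static const ToyType cplx_copy = { TOY_CONTIGUOUS, 0, 2, &dbl }; /* equivalent, distinct handle */
  static const ToyType unit      = { TOY_CONTIGUOUS, 0, 4, &cplx_copy };
  return toy_compare_contig(&unit, &cplx, use_identity);
}
```

In this model toy_demo(1) returns 0 (the identity check misses the match) and toy_demo(0) returns 4. The same toy_compare also treats a dup of a type as equivalent to the original, which is the second case described above.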



-- 
Lisandro Dalcin
============
Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 0109
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459


