[petsc-dev] Is mpich/master:a8a2b30fd21 tested with Petsc?

Junchao Zhang jczhang at mcs.anl.gov
Tue Apr 17 18:17:53 CDT 2018


Min,
  I suggest MPICH add tests that exercise the maximum MPI tag (obtained
through the MPI_TAG_UB attribute).
  PETSc allocates tags starting from the maximum and working downward. My
guess is that the MPICH tests use only small tags, which would explain why
the bug showed up only with PETSc.
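
  To illustrate the usage pattern, here is a minimal sketch of a tag
allocator that counts down from the upper bound. The names tag_pool,
tag_pool_init and tag_pool_next are hypothetical and this is not PETSc's
actual code. In a real program tag_ub would be read with
MPI_Comm_get_attr(comm, MPI_TAG_UB, ...); here it is a plain parameter so
the sketch compiles without MPI.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, NOT PETSc's implementation: hands out tags from
 * tag_ub downward, so the very first tag used is the largest legal one. */
typedef struct {
    int next;   /* next tag to hand out */
    int floor;  /* lowest tag this pool is allowed to return */
} tag_pool;

/* tag_ub is assumed to be the value of the MPI_TAG_UB attribute. */
void tag_pool_init(tag_pool *p, int tag_ub, int floor)
{
    p->next = tag_ub;
    p->floor = floor;
}

/* Returns the next tag, or -1 once the range is exhausted. */
int tag_pool_next(tag_pool *p)
{
    if (p->next < p->floor)
        return -1;
    return p->next--;
}
```

  With a pool initialized from tag_ub, the first call returns tag_ub itself,
so an off-by-some-bits error in the advertised upper bound is hit on the
first message. A test that only uses tags near zero would not notice it.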

--Junchao Zhang

On Tue, Apr 17, 2018 at 3:58 PM, Min Si <msi at anl.gov> wrote:

> Hi all,
>
> Thanks for narrowing down the problem. I checked the MPICH code and
> believe this is a bug in MPICH. I just created a PR to fix it:
> https://github.com/pmodels/mpich/pull/3097
>
> It should be merged into MPICH master branch soon.
>
> Thanks,
> Min
>
>
> On 2018/04/17 14:10, Eric Chamberland wrote:
>
>> Hi,
>>
>> are we talking about the "tag" passed to MPI_Isend for example?
>>
>> but does that mean there is something to change for any MPI call which
>> involves tags usage or is it only a PETSc "bad" tag usage?
>>
>> thanks Satish for your finding!
>>
>> Eric
>>
>> On 16/04/18 11:31 PM, Satish Balay wrote:
>>
>>> On Tue, 13 Mar 2018, Eric Chamberland wrote:
>>>
>>>> Hi,
>>>>
>>>> each night we are testing mpich/master with our petsc-based code.  I don't
>>>> know if the PETSc team is doing the same thing with mpich/master?  (Maybe it
>>>> is a good idea?)
>>>>
>>>> Everything was fine (except the issue
>>>> https://github.com/pmodels/mpich/issues/2892) up to commit 7b8d64debd, but
>>>> since commit mpich:a8a2b30fd21, I have a segfault on any parallel nightly
>>>> test.
>>>>
>>>
>>> I attempted a bisect of the above range of commits - and narrowed down
>>> to:
>>>
>>>
>>> >>>>>>>
>>> db11d4c4a70e39a28be88ed32f00542301699e08 is the first bad commit
>>> <<<<<<<
>>>
>>> >>>>>>>
>>> balay at asterix /home/balay/soft/build/mpich ((db11d4c4a...)|BISECTING)
>>> $ git show db11d4c4a70e39a28be88ed32f00542301699e08
>>> commit db11d4c4a70e39a28be88ed32f00542301699e08 (HEAD, refs/bisect/bad)
>>> Author: Ken Raffenetti <raffenet at mcs.anl.gov>
>>> Date:   Thu Feb 15 11:37:59 2018 -0600
>>>
>>>      init: Fix tag upper limit initialization
>>>
>>>      The starting point for this value is equivalent to the usable tag bits
>>>      macro. This value should be set before device initialization,
>>>      otherwise devices will assume they have more bits than are actually
>>>      available.
>>>
>>>      Signed-off-by: Wesley Bland <wesley.bland at intel.com>
>>>
>>> diff --git a/src/mpi/init/initthread.c b/src/mpi/init/initthread.c
>>> index cbc41f4d5..b31ae2f07 100644
>>> --- a/src/mpi/init/initthread.c
>>> +++ b/src/mpi/init/initthread.c
>>> @@ -403,7 +403,7 @@ int MPIR_Init_thread(int *argc, char ***argv, int required, int *provided)
>>>       MPIR_Process.attrs.host = MPI_PROC_NULL;
>>>       MPIR_Process.attrs.io = MPI_PROC_NULL;
>>>       MPIR_Process.attrs.lastusedcode = MPI_ERR_LASTCODE;
>>> -    MPIR_Process.attrs.tag_ub = 0;
>>> +    MPIR_Process.attrs.tag_ub = MPIR_TAG_USABLE_BITS;
>>>       MPIR_Process.attrs.universe = MPIR_UNIVERSE_SIZE_NOT_SET;
>>>       MPIR_Process.attrs.wtime_is_global = 0;
>>>
>>> @@ -531,13 +531,6 @@ int MPIR_Init_thread(int *argc, char ***argv, int required, int *provided)
>>>       MPIR_Assert(((unsigned) MPIR_Process.
>>>                    attrs.tag_ub & ((unsigned) MPIR_Process.attrs.tag_ub + 1)) == 0);
>>>
>>> -    /* Set aside tag space for tagged collectives and failure notification */
>>> -#ifdef HAVE_TAG_ERROR_BITS
>>> -    MPIR_Process.attrs.tag_ub >>= 3;
>>> -#else
>>> -    MPIR_Process.attrs.tag_ub >>= 1;
>>> -#endif
>>> -
>>>       /* Assert: tag_ub is at least the minimum asked for in the MPI spec */
>>>       MPIR_Assert(MPIR_Process.attrs.tag_ub >= 32767);
>>> <<<<<<<<<<<<<<<<<
>>>
>>> Reverting this patch gets mpich-3.3b2 working with petsc
>>>
>>> Satish
>>>
>>>
>