[petsc-dev] Is mpich/master:a8a2b30fd21 tested with Petsc?

Satish Balay balay at mcs.anl.gov
Tue Apr 17 16:40:30 CDT 2018


Thanks! I tried the patch, and this test case doesn't hang anymore.

Satish

On Tue, 17 Apr 2018, Min Si wrote:

> Hi all,
> 
> Thanks for narrowing down the problem. I checked the MPICH code and believe
> this is a bug in MPICH. I just created a PR to fix it:
> https://github.com/pmodels/mpich/pull/3097
> 
> It should be merged into MPICH master branch soon.
> 
> Thanks,
> Min
> 
> On 2018/04/17 14:10, Eric Chamberland wrote:
> > Hi,
> >
> > Are we talking about the "tag" passed to MPI_Isend, for example?
> >
> > If so, does that mean something has to change for every MPI call that
> > uses tags, or is this only a "bad" tag usage in PETSc?
> >
> > Thanks, Satish, for finding this!
> >
> > Eric
> >
> > On 16/04/18 11:31 PM, Satish Balay wrote:
> >> On Tue, 13 Mar 2018, Eric Chamberland wrote:
> >>
> >>> Hi,
> >>>
> > >>> Each night we test mpich/master with our PETSc-based code. I don't
> > >>> know if the PETSc team does the same with mpich/master. (Maybe it
> > >>> would be a good idea?)
> > >>>
> > >>> Everything was fine (except for the issue
> > >>> https://github.com/pmodels/mpich/issues/2892) up to commit 7b8d64debd, but
> > >>> since commit mpich:a8a2b30fd21, I get a segfault on any parallel
> > >>> nightly test.
> >>
> > >> I bisected the above range of commits and narrowed it down to:
> >>
> >>>>>>>>>
> >> db11d4c4a70e39a28be88ed32f00542301699e08 is the first bad commit
> >> <<<<<<<
> >>>>>>>>>>
> >> balay at asterix /home/balay/soft/build/mpich ((db11d4c4a...)|BISECTING)
> >> $ git show db11d4c4a70e39a28be88ed32f00542301699e08
> >> commit db11d4c4a70e39a28be88ed32f00542301699e08 (HEAD, refs/bisect/bad)
> >> Author: Ken Raffenetti <raffenet at mcs.anl.gov>
> >> Date:   Thu Feb 15 11:37:59 2018 -0600
> >>
> > >>      init: Fix tag upper limit initialization
> > >>
> > >>      The starting point for this value is equivalent to the usable tag
> > >>      bits macro. This value should be set before device initialization,
> > >>      otherwise devices will assume they have more bits than are actually
> > >>      available.
> > >>
> > >>      Signed-off-by: Wesley Bland <wesley.bland at intel.com>
> >>
> >> diff --git a/src/mpi/init/initthread.c b/src/mpi/init/initthread.c
> >> index cbc41f4d5..b31ae2f07 100644
> >> --- a/src/mpi/init/initthread.c
> >> +++ b/src/mpi/init/initthread.c
> >> @@ -403,7 +403,7 @@ int MPIR_Init_thread(int *argc, char ***argv, int
> >> required, int *provided)
> >>       MPIR_Process.attrs.host = MPI_PROC_NULL;
> >>       MPIR_Process.attrs.io = MPI_PROC_NULL;
> >>       MPIR_Process.attrs.lastusedcode = MPI_ERR_LASTCODE;
> >> -    MPIR_Process.attrs.tag_ub = 0;
> >> +    MPIR_Process.attrs.tag_ub = MPIR_TAG_USABLE_BITS;
> >>       MPIR_Process.attrs.universe = MPIR_UNIVERSE_SIZE_NOT_SET;
> >>       MPIR_Process.attrs.wtime_is_global = 0;
> >>   @@ -531,13 +531,6 @@ int MPIR_Init_thread(int *argc, char ***argv, int
> >> required, int *provided)
> >>       MPIR_Assert(((unsigned) MPIR_Process.
> >>                    attrs.tag_ub & ((unsigned) MPIR_Process.attrs.tag_ub +
> >> 1)) == 0);
> > >>
> > >> -    /* Set aside tag space for tagged collectives and failure notification */
> >> -#ifdef HAVE_TAG_ERROR_BITS
> >> -    MPIR_Process.attrs.tag_ub >>= 3;
> >> -#else
> >> -    MPIR_Process.attrs.tag_ub >>= 1;
> >> -#endif
> >> -
> > >>       /* Assert: tag_ub is at least the minimum asked for in the MPI spec */
> >>       MPIR_Assert(MPIR_Process.attrs.tag_ub >= 32767);
> >> <<<<<<<<<<<<<<<<<
> >>
> > >> Reverting this patch gets mpich-3.3b2 working with PETSc.
> >>
> >> Satish
> >>
> 
> 
> 

