[petsc-dev] Is mpich/master:a8a2b30fd21 tested with Petsc?

Min Si msi at anl.gov
Thu Apr 19 09:51:34 CDT 2018


Hi Junchao,

This is a great idea. We will add large-tag tests to our test suite.

Min

On 2018/04/17 18:17, Junchao Zhang wrote:
> Min,
>   I suggest MPICH add tests that exercise the maximum MPI tag (obtained
> through the MPI_TAG_UB attribute).
>   PETSc allocates tags from the maximum downward. I guess the MPICH tests
> use small tags, which is why the bug only showed up with PETSc.
>
> --Junchao Zhang
>
> On Tue, Apr 17, 2018 at 3:58 PM, Min Si <msi at anl.gov> wrote:
>
>     Hi all,
>
>     Thanks for narrowing down the problem. I checked the MPICH code
>     and believe this is a bug in MPICH. I just created a PR to fix it:
>     https://github.com/pmodels/mpich/pull/3097
>
>     It should be merged into the MPICH master branch soon.
>
>     Thanks,
>     Min
>
>
>     On 2018/04/17 14:10, Eric Chamberland wrote:
>
>         Hi,
>
>         Are we talking about the "tag" passed to MPI_Isend, for example?
>
>         Does that mean something has to change for every MPI call that
>         uses tags, or is this only a "bad" use of tags in PETSc?
>
>         Thanks, Satish, for your finding!
>
>         Eric
>
>         On 16/04/18 11:31 PM, Satish Balay wrote:
>
>             On Tue, 13 Mar 2018, Eric Chamberland wrote:
>
>                 Hi,
>
>                 Each night we test mpich/master with our PETSc-based
>                 code. I don't know whether the PETSc team is doing the
>                 same thing with mpich/master. (Maybe it would be a good
>                 idea?)
>
>                 Everything was fine (except for the issue
>                 https://github.com/pmodels/mpich/issues/2892) up to
>                 commit 7b8d64debd, but since commit mpich:a8a2b30fd21
>                 I have a segfault on any parallel nightly test.
>
>
>             I attempted a bisect of the above range of commits and
>             narrowed it down to:
>
>
>             db11d4c4a70e39a28be88ed32f00542301699e08 is the first bad commit
>             <<<<<<<
>
>
>             balay at asterix /home/balay/soft/build/mpich ((db11d4c4a...)|BISECTING)
>             $ git show db11d4c4a70e39a28be88ed32f00542301699e08
>             commit db11d4c4a70e39a28be88ed32f00542301699e08 (HEAD, refs/bisect/bad)
>             Author: Ken Raffenetti <raffenet at mcs.anl.gov>
>             Date:   Thu Feb 15 11:37:59 2018 -0600
>
>                 init: Fix tag upper limit initialization
>
>                 The starting point for this value is equivalent to the usable tag bits
>                 macro. This value should be set before device initialization,
>                 otherwise devices will assume they have more bits than are actually
>                 available.
>
>                 Signed-off-by: Wesley Bland <wesley.bland at intel.com>
>
>             diff --git a/src/mpi/init/initthread.c b/src/mpi/init/initthread.c
>             index cbc41f4d5..b31ae2f07 100644
>             --- a/src/mpi/init/initthread.c
>             +++ b/src/mpi/init/initthread.c
>             @@ -403,7 +403,7 @@ int MPIR_Init_thread(int *argc, char ***argv, int required, int *provided)
>                  MPIR_Process.attrs.host = MPI_PROC_NULL;
>                  MPIR_Process.attrs.io = MPI_PROC_NULL;
>                  MPIR_Process.attrs.lastusedcode = MPI_ERR_LASTCODE;
>             -    MPIR_Process.attrs.tag_ub = 0;
>             +    MPIR_Process.attrs.tag_ub = MPIR_TAG_USABLE_BITS;
>                  MPIR_Process.attrs.universe = MPIR_UNIVERSE_SIZE_NOT_SET;
>                  MPIR_Process.attrs.wtime_is_global = 0;
>
>             @@ -531,13 +531,6 @@ int MPIR_Init_thread(int *argc, char ***argv, int required, int *provided)
>                  MPIR_Assert(((unsigned) MPIR_Process.
>                               attrs.tag_ub & ((unsigned) MPIR_Process.attrs.tag_ub + 1)) == 0);
>
>             -    /* Set aside tag space for tagged collectives and failure notification */
>             -#ifdef HAVE_TAG_ERROR_BITS
>             -    MPIR_Process.attrs.tag_ub >>= 3;
>             -#else
>             -    MPIR_Process.attrs.tag_ub >>= 1;
>             -#endif
>             -
>                  /* Assert: tag_ub is at least the minimum asked for in the MPI spec */
>                  MPIR_Assert(MPIR_Process.attrs.tag_ub >= 32767);
>             <<<<<<<<<<<<<<<<<
>
>             Reverting this patch gets mpich-3.3b2 working with PETSc.
>
>             Satish
>
>
>
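
The two assertions and the removed shift in the quoted diff can be illustrated without MPI. What follows is a minimal C sketch, not MPICH or PETSc code: the names is_low_bit_mask, reserve_tag_bits, check_tag_ub, tag_pool and MIN_TAG_UB are hypothetical and exist only for this illustration. It shows the "2^k - 1" test that the first MPIR_Assert performs, the right shift that sets aside high tag bits, and a counter that hands out tags from the upper bound downward, which is the order described for PETSc earlier in the thread.

```c
#include <assert.h>

/* Smallest upper bound the second assertion in the quoted diff accepts. */
#define MIN_TAG_UB 32767

/* Nonzero when v has the form 2^k - 1 (only contiguous low bits set).
 * This is the test (tag_ub & (tag_ub + 1)) == 0 from the quoted diff. */
int is_low_bit_mask(unsigned v)
{
    return (v & (v + 1u)) == 0;
}

/* Hypothetical helper: give up `reserved_bits` high bits of the tag space,
 * as the removed ">>= 3" and ">>= 1" lines did. */
int reserve_tag_bits(int tag_ub, int reserved_bits)
{
    return tag_ub >> reserved_bits;
}

/* Hypothetical helper: apply both checks from the quoted diff. */
void check_tag_ub(int tag_ub)
{
    assert(is_low_bit_mask((unsigned) tag_ub));
    assert(tag_ub >= MIN_TAG_UB);
}

/* Hypothetical tag allocator that starts at the upper bound and counts down. */
typedef struct {
    int next; /* next tag to hand out; negative once exhausted */
} tag_pool;

tag_pool tag_pool_init(int tag_ub)
{
    tag_pool p = { tag_ub };
    return p;
}

/* Returns the next tag, or -1 when no tags are left. */
int tag_pool_take(tag_pool *p)
{
    return p->next >= 0 ? p->next-- : -1;
}
```

For example, reserve_tag_bits(0x7fffffff, 3) yields 0x0fffffff, which still passes check_tag_ub, and a pool initialized with that bound returns 0x0fffffff first and 0x0ffffffe second. An allocator of this kind touches the highest tags immediately, so a test suite that only uses small tags would not exercise the same range.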
