<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    Hi Junchao,<br>
    <br>
    This is a great idea. We will add large-tag tests to our test
    suite!<br>
    <br>
    Min<br>
    <br>
    <div class="moz-cite-prefix">On 2018/04/17 18:17, Junchao Zhang
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CA+MQGp8CMmZZtbT9ABU3GyyafXzVVOOptWFQPkRwhWsYJAPG=Q@mail.gmail.com">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <div dir="ltr">Min,
        <div>  I suggest MPICH add tests that exercise the maximum MPI
          tag (obtained through the MPI_TAG_UB attribute). </div>
        <div>  PETSc allocates tags starting from the maximum and working
          downward. I guess the MPICH tests use only small tags, which
          is why the bug showed up only with PETSc.<br>
          <div class="gmail_extra"><br clear="all">
            <div>
              <div class="m_-2405801840177781509gmail_signature"
                data-smartmail="gmail_signature">
                <div dir="ltr">--Junchao Zhang</div>
              </div>
            </div>
            <br>
            <div class="gmail_quote">On Tue, Apr 17, 2018 at 3:58 PM,
              Min Si <span dir="ltr"><<a href="mailto:msi@anl.gov"
                  target="_blank" moz-do-not-send="true">msi@anl.gov</a>></span>
              wrote:<br>
              <blockquote class="gmail_quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi
                all,<br>
                <br>
                Thanks for narrowing down the problem. I checked the
                MPICH code and believe this is a bug in MPICH. I just
                created a PR to fix it:<br>
                <a href="https://github.com/pmodels/mpich/pull/3097"
                  rel="noreferrer" target="_blank"
                  moz-do-not-send="true">https://github.com/pmodels/mpi<wbr>ch/pull/3097</a><br>
                <br>
                It should be merged into MPICH master branch soon.<br>
                <br>
                Thanks,<br>
                Min
                <div class="m_-2405801840177781509HOEnZb">
                  <div class="m_-2405801840177781509h5"><br>
                    <br>
                    On 2018/04/17 14:10, Eric Chamberland wrote:<br>
                    <blockquote class="gmail_quote" style="margin:0 0 0
                      .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      Hi,<br>
                      <br>
                      Are we talking about the "tag" passed to MPI_Isend,
                      for example?<br>
                      <br>
                      Does that mean something has to change for every
                      MPI call that uses tags, or is it only "bad" tag
                      usage in PETSc?<br>
                      <br>
                      Thanks, Satish, for your finding!<br>
                      <br>
                      Eric<br>
                      <br>
                      On 16/04/18 11:31 PM, Satish Balay wrote:<br>
                      <blockquote class="gmail_quote" style="margin:0 0
                        0 .8ex;border-left:1px #ccc
                        solid;padding-left:1ex">
                        On Tue, 13 Mar 2018, Eric Chamberland wrote:<br>
                        <br>
                        <blockquote class="gmail_quote" style="margin:0
                          0 0 .8ex;border-left:1px #ccc
                          solid;padding-left:1ex">
                          Hi,<br>
                          <br>
                          Each night we test mpich/master with
                          our PETSc-based code.  I don't<br>
                          know whether the PETSc team does the same
                          with mpich/master.   (Maybe it would be a<br>
                          good idea?)<br>
                          <br>
                          Everything was fine (except the issue<br>
                          <a
                            href="https://github.com/pmodels/mpich/issues/2892"
                            rel="noreferrer" target="_blank"
                            moz-do-not-send="true">https://github.com/pmodels/mpi<wbr>ch/issues/2892</a>)
                          up to commit 7b8d64debd, but<br>
                          since commit mpich:a8a2b30fd21, I get a
                          segfault on every parallel nightly<br>
                          test.<br>
                        </blockquote>
                        <br>
                        I bisected the above range of commits and
                        narrowed it down to:<br>
                        <br>
                        >>>>>>><br>
                        db11d4c4a70e39a28be88ed32f0054<wbr>2301699e08 is
                        the first bad commit<br>
                        <<<<<<<<br>
                        >>>>>>>><br>
                        balay@asterix /home/balay/soft/build/mpich
                        ((db11d4c4a...)|BISECTING)<br>
                        $ git show db11d4c4a70e39a28be88ed32f0054<wbr>2301699e08<br>
                        commit db11d4c4a70e39a28be88ed32f0054<wbr>2301699e08
                        (HEAD, refs/bisect/bad)<br>
                        Author: Ken Raffenetti <<a
                          href="mailto:raffenet@mcs.anl.gov"
                          target="_blank" moz-do-not-send="true">raffenet@mcs.anl.gov</a>><br>
                        Date:   Thu Feb 15 11:37:59 2018 -0600<br>
                        <br>
                             init: Fix tag upper limit initialization<br>
                        <br>
                             The starting point for this value is
                        equivalent to the usable tag bits<br>
                             macro. This value should be set before
                        device initialization,<br>
                             otherwise devices will assume they have
                        more bits than are actually<br>
                             available.<br>
                        <br>
                                  Signed-off-by: Wesley Bland <<a
                          href="mailto:wesley.bland@intel.com"
                          target="_blank" moz-do-not-send="true">wesley.bland@intel.com</a>><br>
                        <br>
                        diff --git a/src/mpi/init/initthread.c
                        b/src/mpi/init/initthread.c<br>
                        index cbc41f4d5..b31ae2f07 100644<br>
                        --- a/src/mpi/init/initthread.c<br>
                        +++ b/src/mpi/init/initthread.c<br>
                        @@ -403,7 +403,7 @@ int MPIR_Init_thread(int
                        *argc, char ***argv, int required, int
                        *provided)<br>
                              MPIR_Process.attrs.host = MPI_PROC_NULL;<br>
                              MPIR_Process.attrs.io = MPI_PROC_NULL;<br>
                              MPIR_Process.attrs.lastusedcod<wbr>e =
                        MPI_ERR_LASTCODE;<br>
                        -    MPIR_Process.attrs.tag_ub = 0;<br>
                        +    MPIR_Process.attrs.tag_ub =
                        MPIR_TAG_USABLE_BITS;<br>
                              MPIR_Process.attrs.universe =
                        MPIR_UNIVERSE_SIZE_NOT_SET;<br>
                              MPIR_Process.attrs.wtime_is_gl<wbr>obal =
                        0;<br>
                          @@ -531,13 +531,6 @@ int MPIR_Init_thread(int
                        *argc, char ***argv, int required, int
                        *provided)<br>
                              MPIR_Assert(((unsigned) MPIR_Process.<br>
                                           attrs.tag_ub &
                        ((unsigned) MPIR_Process.attrs.tag_ub + 1)) ==
                        0);<br>
                          -    /* Set aside tag space for tagged
                        collectives and failure notification */<br>
                        -#ifdef HAVE_TAG_ERROR_BITS<br>
                        -    MPIR_Process.attrs.tag_ub >>= 3;<br>
                        -#else<br>
                        -    MPIR_Process.attrs.tag_ub >>= 1;<br>
                        -#endif<br>
                        -<br>
                              /* Assert: tag_ub is at least the minimum
                        asked for in the MPI spec */<br>
                              MPIR_Assert(MPIR_Process.attrs<wbr>.tag_ub
                        >= 32767);<br>
<<<<<<<<<<<<<<<<<<br>
                        <br>
                        Reverting this patch gets mpich-3.3b2 working
                        with PETSc.<br>
                        <br>
                        Satish<br>
                        <br>
                      </blockquote>
                    </blockquote>
                    <br>
                  </div>
                </div>
              </blockquote>
            </div>
            <br>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>