On Mar 30, 2021, at 10:18 PM, Eric Chamberland <Eric.Chamberland@giref.ulaval.ca> wrote:

> Hi Barry,
>
> Here is what I have:
>
> 1. The hpddm issues have all been solved (there are no more hpddm failures here:
> https://giref.ulaval.ca/~cmpgiref/petsc-main-debug/2021.03.29.02h00m02s_make_test.log)

  Great.

> 2. For Hypre, I think it is indeed not a bug but a feature. As far as I can tell from the hypre discussion list, it is said that "It still depends on the number of threads, that can't be avoided"
> (https://github.com/hypre-space/hypre/issues/303#issuecomment-800442755)

  This is nonsense; they know better. Sure, the convergence "decays", but no longer producing a positive definite preconditioner when the problem is positive definite is not due to "convergence decaying", it is much more fundamental; they are all good numerical analysts, they know this. They are basically saying that if you start with a positive definite problem, which supports the use of CG, but use OpenMP threading, then you need to switch to GMRES. That is a high price to pay; I suspect there is a bug in the code, or that it is just not designed correctly, but they don't want to deal with hunting down the issue.

  The point is that even if the smoother does absolutely nothing to improve the solution (it just copies the current value), it will not make the preconditioner operator no longer positive definite. So my conclusion is that the smoother is broken, since it does worse than nothing.

  Do they propose a solution? Just not using OpenMP threading for positive definite problems, or always using GMRES when using OpenMP?

  I am not sure what to do with your (and PETSc's) test cases in this situation. I guess the PETSc test could switch to GMRES when hypre is using OpenMP with a number of threads greater than 1 (see the sketch below), but that is kind of cumbersome and annoying.

  Junchao and Scott have some ideas on adding OpenMP threading to our CI tests. If we make sure this particular problem is in there, then we will need to add a switch to handle it.
    </p><p class="">and here
<a class="moz-txt-link-freetext" href="https://www.researchgate.net/publication/220411740_Multigrid_Smoothers_for_Ultraparallel_Computing">https://www.researchgate.net/publication/220411740_Multigrid_Smoothers_for_Ultraparallel_Computing</a>,
      into section 7.3, we have some interesting informations, as:<br class="">
    </p><p class="">Figure 7.6
      clearly illustrates that convergence degrades with the addition of
      threads for hybrid
      SGS; <br class="">
    </p><p class="">.... <br class="">
    </p><p class="">The 3D sphere problem
      is the most extreme example because AMG-CG with hybrid SGS no
      longer converges
      with the addition of threading.</p><p class="">but I might have misunderstood since I am not an expert for
      that...<br class="">
    </p><p class="">3. For SuperLU_Dist, I have tried to build SuperLU_dist out of
      PETSc to run the tests from superlu itself: sadly the bug is not
      showing up (see
      <a class="moz-txt-link-freetext" href="https://github.com/xiaoyeli/superlu_dist/issues/69">https://github.com/xiaoyeli/superlu_dist/issues/69</a>).  <br class="">
    </p><p class="">I would like to build a reproducer superlu_dist example from what
      is done in the faulty test:<br class="">
    </p>
    <pre style="font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-thickness: initial; overflow-wrap: break-word; white-space: pre-wrap;" class="">ksp_ksp_tutorials-ex5</pre><p class=""> that is buggy when called from PETSc: what bugs me, is that many
      other PETSc tests are running fine with superlu_dist: maybe
      something is uniquely done in ksp_ksp_tutorials-ex5 ?</p><p class="">So I think it worth digging into #3: the simple thing I have not
      yet done is retreiving the stack when it fails (timeout).<br class=""></p></div></div></blockquote><div><br class=""></div>  I wish I had infinite time to fix these things. One could run it for a while until it "hangs" and then attach a debugger to the hanging process to see where it is. This would help determine the problem.</div><div><br class=""><blockquote type="cite" class=""><div class=""><div class=""><p class="">
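
  For example (a sketch only, and PETSc's -start_in_debugger option is the more usual route), one can temporarily drop something like this near the suspected spot so there is time to run "gdb -p <pid>" on the stuck rank:

    #include <petscsys.h>
    #include <unistd.h>

    /* Sketch: print each rank's PID and pause so a debugger can be attached
       (e.g. "gdb -p <pid>") before the test harness times out. */
    static PetscErrorCode WaitForDebugger(void)
    {
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = PetscPrintf(PETSC_COMM_SELF, "[pid %d] waiting 60s for a debugger\n", (int)getpid());CHKERRQ(ierr);
      ierr = PetscSleep(60);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }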
    </p><p class="">And a question: when you state that you upgraded to OpenMPI 4.1
      you mean for one of your automated (docker?) compilation into the
      gitlab pipelines?</p><div class=""><br class=""></div></div></div></blockquote>  Both for our testing and for our --download-openmpi configure option. I do not know if this is related to the problem at hand or not.</div><div><br class=""></div><div>Barry</div><div><br class=""></div><div><br class=""><blockquote type="cite" class=""><div class=""><div class=""><p class="">Thanks for taking news! :)</p><p class="">Eric</p><p class=""><br class="">

On 2021-03-30 1:47 p.m., Barry Smith wrote:

  Eric,

    How are things going on this OpenMP front? Any bug fixes from hypre or SuperLU_DIST?

    BTW: we have upgraded to OpenMPI 4.1; perhaps this resolves some issues?

   Barry
      <div class=""><br class="">
        <div class=""><br class="">
          <blockquote type="cite" class="">
            <div class="">On Mar 22, 2021, at 2:07 PM, Eric Chamberland
              <<a href="mailto:Eric.Chamberland@giref.ulaval.ca" class="" moz-do-not-send="true">Eric.Chamberland@giref.ulaval.ca</a>>
              wrote:</div>
            <br class="Apple-interchange-newline">
            <div class="">
              <meta http-equiv="Content-Type" content="text/html;
                charset=UTF-8" class="">
              <div class=""><p class="">I added some information here:</p><p class=""><a class="moz-txt-link-freetext" href="https://github.com/xiaoyeli/superlu_dist/issues/69#issuecomment-804318719" moz-do-not-send="true">https://github.com/xiaoyeli/superlu_dist/issues/69#issuecomment-804318719</a></p><p class="">Maybe someone can say more than I on what
                  PETSc tries to do with the 2 mentioned tutorials that
                  are timing out...</p><p class="">Thanks,</p><p class="">Eric</p><p class=""><br class="">
                </p>
                <div class="moz-cite-prefix">On 2021-03-15 11:31 a.m.,
                  Eric Chamberland wrote:<br class="">
                </div>
                <blockquote type="cite" cite="mid:b45da415-70fc-84d1-0b41-2952ef9caba7@giref.ulaval.ca" class="">
                  <meta http-equiv="Content-Type" content="text/html;
                    charset=UTF-8" class=""><p class="">Reported timeout bugs to SuperLU_dist too:</p><p class=""><a class="moz-txt-link-freetext" href="https://github.com/xiaoyeli/superlu_dist/issues/69" moz-do-not-send="true">https://github.com/xiaoyeli/superlu_dist/issues/69</a></p><p class="">Eric</p><p class=""><br class="">
                  </p>
                  <div class="moz-cite-prefix">On 2021-03-14 2:18 p.m.,
                    Eric Chamberland wrote:<br class="">
                  </div>
                  <blockquote type="cite" cite="mid:92479045-ae7c-2f14-869e-c28073d7d4e4@giref.ulaval.ca" class="">
                    <meta http-equiv="Content-Type" content="text/html;
                      charset=UTF-8" class=""><p class="">Done:<br class="">
                    </p><p class=""><a class="moz-txt-link-freetext" href="https://github.com/hypre-space/hypre/issues/303" moz-do-not-send="true">https://github.com/hypre-space/hypre/issues/303</a></p><p class="">Maybe I will need some help about PETSc
                      to answer their questions...</p><p class="">Eric<br class="">
                    </p>
                    <div class="moz-cite-prefix">On 2021-03-14 3:44
                      a.m., Stefano Zampini wrote:<br class="">
                    </div>
                    <blockquote type="cite" cite="mid:5324E970-3CAF-4980-8387-31AA0E208087@gmail.com" class="">
                      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" class="">
                      Eric
                      <div class=""><br class="">
                      </div>
                      <div class="">You should report these HYPRE issues
                        upstream <a href="https://github.com/hypre-space/hypre/issues" class="" moz-do-not-send="true">https://github.com/hypre-space/hypre/issues</a></div>
                      <div class=""><br class="">
                        <div class=""><br class="">
                          <blockquote type="cite" class="">
                            <div class="">On Mar 14, 2021, at 3:44 AM,
                              Eric Chamberland <<a href="mailto:Eric.Chamberland@giref.ulaval.ca" class="" moz-do-not-send="true">Eric.Chamberland@giref.ulaval.ca</a>>
                              wrote:</div>
                            <br class="Apple-interchange-newline">
                            <div class="">
                              <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" class="">
                              <div class=""><p class="">For us it clearly creates
                                  problems in real computations...<br class="">
                                </p><p class="">I understand the need to
                                  have clean test for PETSc, but for me,
                                  it reveals that hypre isn't usable
                                  with more than one thread for now...</p><p class="">Another solution:  force
                                  single-threaded configuration for
                                  hypre until this is fixed?</p><p class="">Eric<br class="">
                                </p>

On 2021-03-13 8:50 a.m., Pierre Jolivet wrote:

-pc_hypre_boomeramg_relax_type_all Jacobi =>
  Linear solve did not converge due to DIVERGED_INDEFINITE_PC iterations 3

-pc_hypre_boomeramg_relax_type_all l1scaled-Jacobi =>
OK, independently of the architecture it seems (Eric's Docker image with 1 or 2 threads, or my macOS), but the contraction factor is higher:
  Linear solve converged due to CONVERGED_RTOL iterations 8
  Linear solve converged due to CONVERGED_RTOL iterations 24
  Linear solve converged due to CONVERGED_RTOL iterations 26
v. currently
  Linear solve converged due to CONVERGED_RTOL iterations 7
  Linear solve converged due to CONVERGED_RTOL iterations 9
  Linear solve converged due to CONVERGED_RTOL iterations 10

Do we change this? Or should we force OMP_NUM_THREADS=1 for make test?

Thanks,
Pierre
                                  <div class="">
                                    <blockquote type="cite" class="">
                                      <div class="">On 13 Mar 2021, at
                                        2:26 PM, Mark Adams <<a href="mailto:mfadams@lbl.gov" class="" moz-do-not-send="true">mfadams@lbl.gov</a>>
                                        wrote:</div>
                                      <br class="Apple-interchange-newline">
                                      <div class="">
                                        <div dir="ltr" class="">Hypre
                                          uses a multiplicative smoother
                                          by default. It has a chebyshev
                                          smoother. That with a Jacobi
                                          PC should be thread invariant.
                                          <div class="">
                                            <div class="">Mark</div>
                                          </div>
                                        </div>
                                        <br class="">
                                        <div class="gmail_quote">
                                          <div dir="ltr" class="gmail_attr">On Sat,
                                            Mar 13, 2021 at 8:18 AM
                                            Pierre Jolivet <<a href="mailto:pierre@joliv.et" class="" moz-do-not-send="true">pierre@joliv.et</a>>
                                            wrote:<br class="">
                                          </div>
                                          <blockquote class="gmail_quote" style="margin:0px 0px 0px
                                            0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
                                            <div style="overflow-wrap:
                                              break-word;" class=""><br class="">
                                              <div class="">
                                                <blockquote type="cite" class="">
                                                  <div class="">On 13
                                                    Mar 2021, at 9:17
                                                    AM, Pierre Jolivet
                                                    <<a href="mailto:pierre@joliv.et" target="_blank" class="" moz-do-not-send="true">pierre@joliv.et</a>>
                                                    wrote:</div>
                                                  <br class="">
                                                  <div class="">
                                                    <div style="overflow-wrap:
                                                      break-word;" class="">Hello
                                                      Eric,
                                                      <div class="">I’ve
                                                        made an
                                                        “interesting”
                                                        discovery, so
                                                        I’ll put back
                                                        the list in c/c.</div>
                                                      <div class="">It
                                                        appears the
                                                        following
                                                        snippet of code
                                                        which uses
                                                        Allreduce() +
                                                        lambda function
                                                        + MPI_IN_PLACE
                                                        is:</div>
                                                      <div class="">-
                                                        Valgrind-clean
                                                        with MPICH;</div>
                                                      <div class="">-
                                                        Valgrind-clean
                                                        with OpenMPI
                                                        4.0.5;</div>
                                                      <div class="">-
                                                        not
                                                        Valgrind-clean
                                                        with OpenMPI
                                                        4.1.0.</div>
                                                      <div class="">I’m
                                                        not sure who is
                                                        to blame here,
                                                        I’ll need to
                                                        look at the MPI
                                                        specification
                                                        for what is
                                                        required by the
                                                        implementors and
                                                        users in that
                                                        case.</div>
                                                      <div class=""><br class="">
                                                      </div>
                                                      <div class="">In
                                                        the meantime,
                                                        I’ll do the
                                                        following:</div>
                                                      <div class="">-
                                                        update config/BuildSystem/config/packages/OpenMPI.py
                                                        to use OpenMPI
                                                        4.1.0, see if
                                                        any other error
                                                        appears;</div>
                                                      <div class="">-
                                                        provide a hotfix
                                                        to bypass the
                                                        segfaults;</div>
                                                    </div>
                                                  </div>
                                                </blockquote>
                                                <div class=""><br class="">
                                                </div>
                                                <div class="">I can
                                                  confirm that splitting
                                                  the single Allreduce
                                                  with my own MPI_Op
                                                  into two Allreduce
                                                  with MAX and BAND
                                                  fixes the segfaults
                                                  with OpenMPI (*).</div>
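
The change is roughly of the following shape (a simplified C sketch only, not the actual code from the MR, which uses a C++ lambda as the user-defined op and different data):

    #include <mpi.h>

    /* Hypothetical combined op, for illustration only: max on even entries,
       bitwise AND on odd entries of the buffer. */
    static void MaxBand(void *in, void *inout, int *len, MPI_Datatype *dtype)
    {
      int *a = (int *)in, *b = (int *)inout;
      (void)dtype;
      for (int i = 0; i + 1 < *len; i += 2) {
        if (a[i] > b[i]) b[i] = a[i];
        b[i + 1] &= a[i + 1];
      }
    }

    /* Before: one in-place Allreduce with the user-defined MPI_Op. */
    static void reduce_with_custom_op(int buf[2], MPI_Comm comm)
    {
      MPI_Op op;
      MPI_Op_create(MaxBand, 1, &op);
      MPI_Allreduce(MPI_IN_PLACE, buf, 2, MPI_INT, op, comm);
      MPI_Op_free(&op);
    }

    /* After (the workaround): two in-place Allreduce calls with built-in ops. */
    static void reduce_split(int buf[2], MPI_Comm comm)
    {
      MPI_Allreduce(MPI_IN_PLACE, &buf[0], 1, MPI_INT, MPI_MAX, comm);
      MPI_Allreduce(MPI_IN_PLACE, &buf[1], 1, MPI_INT, MPI_BAND, comm);
    }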
                                                <br class="">
                                                <blockquote type="cite" class="">
                                                  <div class="">
                                                    <div style="overflow-wrap:
                                                      break-word;" class="">
                                                      <div class="">-
                                                        look at the
                                                        hypre issue and
                                                        whether they
                                                        should be
                                                        deferred to the
                                                        hypre team.</div>
                                                    </div>
                                                  </div>
                                                </blockquote>
                                                <div class=""><br class="">
                                                </div>
                                                <div class="">I don’t
                                                  know if there is
                                                  something wrong in
                                                  hypre threading or if
                                                  it’s just a side
                                                  effect of threading,
                                                  but it seems that the
                                                  number of threads has
                                                  a drastic effect on
                                                  the quality of the PC.</div>
                                                <div class="">By
                                                  default, it looks that
                                                  there are two threads
                                                  per process with your
                                                  Docker image.</div>
                                                <div class="">If I force
                                                  OMP_NUM_THREADS=1,
                                                  then I get the same
                                                  convergence as in the
                                                  output file.</div>
                                                <div class=""><br class="">
                                                </div>
                                                <div class="">Thanks,</div>
                                                <div class="">Pierre</div>
                                                <div class=""><br class="">
                                                </div>
                                                <div class="">(*) <a href="https://gitlab.com/petsc/petsc/-/merge_requests/3712" target="_blank" class="" moz-do-not-send="true">https://gitlab.com/petsc/petsc/-/merge_requests/3712</a></div>
                                                <br class="">
                                                <blockquote type="cite" class="">
                                                  <div class="">
                                                    <div style="overflow-wrap:
                                                      break-word;" class="">
                                                      <div class="">Thank
                                                        you for the
                                                        Docker files,
                                                        they were really
                                                        useful.</div>
                                                      <div class="">If
                                                        you want to
                                                        avoid
                                                        oversubscription
                                                        failures, you
                                                        can edit the
                                                        file
                                                        /opt/openmpi-4.1.0/etc/openmpi-default-hostfile
                                                        and append the
                                                        line:</div>
                                                      <div class="">localhost
                                                        slots=12</div>
                                                      <div class="">If
                                                        you want to
                                                        increase the
                                                        timeout limit of
                                                        PETSc test suite
                                                        for each test,
                                                        you can add the
                                                        extra flag in
                                                        your command
                                                        line TIMEOUT=180
                                                        (default is 60,
                                                        units are
                                                        seconds).</div>
                                                      <div class=""><br class="">
                                                      </div>
                                                      <div class="">Thanks,
                                                        I’ll ping you on
                                                        GitLab when I’ve
                                                        got something
                                                        ready for you to
                                                        try,</div>
                                                      <div class="">Pierre<br class="">
                                                        <div class=""><br class="">
                                                        </div>
                                                      </div>
                                                    </div>
                                                    <span id="gmail-m_3567963440499379521cid:15B6BE6E-0C96-4CBA-9ADC-EFB1DE1BDFC3" class=""><ompi.cxx></span>
                                                    <div style="overflow-wrap:
                                                      break-word;" class="">
                                                      <div class="">
                                                        <div class=""><br class="">
                                                          <blockquote type="cite" class="">
                                                          <div class="">On
                                                          12 Mar 2021,
                                                          at 8:54 PM,
                                                          Eric
                                                          Chamberland
                                                          <<a href="mailto:Eric.Chamberland@giref.ulaval.ca" target="_blank" class="" moz-do-not-send="true">Eric.Chamberland@giref.ulaval.ca</a>>
                                                          wrote:</div>
                                                          <br class="">
                                                          <div class="">
                                                          <div class=""><p class="">Hi
                                                          Pierre,</p><p class="">I
                                                          now have a
                                                          docker
                                                          container
                                                          reproducing
                                                          the problems
                                                          here.</p><p class="">Actually,
                                                          if I look at
                                                          snes_tutorials-ex12_quad_singular_hpddm 
                                                          it fails like
                                                          this:</p><p class="">not

not ok snes_tutorials-ex12_quad_singular_hpddm # Error code: 59
#       Initial guess
#       L_2 Error: 0.00803099
#       Initial Residual
#       L_2 Residual: 1.09057
#       Au - b = Au + F(0)
#       Linear L_2 Residual: 1.09057
#       [d470c54ce086:14127] Read -1, expected 4096, errno = 1
#       [d470c54ce086:14128] Read -1, expected 4096, errno = 1
#       [d470c54ce086:14129] Read -1, expected 4096, errno = 1
#       [3]PETSC ERROR: ------------------------------------------------------------------------
#       [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
#       [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
#       [3]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
#       [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
#       [3]PETSC ERROR: likely location of problem given in stack below
#       [3]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
#       [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
#       [3]PETSC ERROR:       INSTEAD the line number of the start of the function
#       [3]PETSC ERROR:       is given.
#       [3]PETSC ERROR: [3] buildTwo line 987 /opt/petsc-main/include/HPDDM_schwarz.hpp
#       [3]PETSC ERROR: [3] next line 1130 /opt/petsc-main/include/HPDDM_schwarz.hpp
#       [3]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
#       [3]PETSC ERROR: Signal received
#       [3]PETSC ERROR: [0]PETSC ERROR: ------------------------------------------------------------------------

also ex12_quad_hpddm_reuse_baij fails with a lot more "Read -1, expected ..." messages, which I don't know where they come from...?

Hypre (like in diff-snes_tutorials-ex56_hypre) is also having DIVERGED_INDEFINITE_PC failures...

Please see the three attached docker files:

1) fedora_mkl_and_devtools: the Dockerfile which installs Fedora 33 with the GNU compilers, MKL, and everything needed for development.

2) openmpi: the Dockerfile to build OpenMPI.

3) petsc: the last Dockerfile, which builds, installs, and tests PETSc.

I build the three like this:

    docker build -t fedora_mkl_and_devtools -f fedora_mkl_and_devtools .
    docker build -t openmpi -f openmpi .
    docker build -t petsc -f petsc .

Disclaimer: I am not a docker expert, so I may do things that are not docker-state-of-the-art, but I am open to suggestions... ;)

I have just run it on my laptop (slow), which does not have enough cores, so many more tests failed (I should force --oversubscribe, but I don't know how). I will relaunch on my workstation in a few minutes.

I will now test your branch! (Sorry for the delay.)

Thanks,

Eric

On 2021-03-11 9:03 a.m., Eric Chamberland wrote:

Hi Pierre,

Ok, that's interesting!

I will try to build a docker image by tomorrow and give you the exact recipe to reproduce the bugs.

Eric
                                                          <div class="">On
                                                          2021-03-11
                                                          2:46 a.m.,
                                                          Pierre Jolivet
                                                          wrote:<br class="">
                                                          </div>
                                                          <blockquote type="cite" class=""> <br class="">
                                                          <div class=""><br class="">
                                                          <blockquote type="cite" class="">
                                                          <div class="">On
                                                          11 Mar 2021,
                                                          at 6:16 AM,
                                                          Barry Smith
                                                          <<a href="mailto:bsmith@petsc.dev" target="_blank" class="" moz-do-not-send="true">bsmith@petsc.dev</a>>
                                                          wrote:</div>
                                                          <br class="">
                                                          <div class="">
                                                          <div style="overflow-wrap:
                                                          break-word;" class="">
                                                          <div class=""><br class="">
                                                          </div>
                                                            Eric,
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class=""> 
                                                           Sorry about
                                                          not being more
                                                          immediate. We
                                                          still have
                                                          this in our
                                                          active email
                                                          so you don't
                                                          need to submit
                                                          individual
                                                          issues. We'll
                                                          try to get to
                                                          them as soon
                                                          as we can.</div>
                                                          </div>
                                                          </div>
                                                          </blockquote>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">Indeed,
                                                          I’m still
                                                          trying to
                                                          figure this
                                                          out.</div>
                                                          <div class="">I
                                                          realized that
                                                          some of my
                                                          configure
                                                          flags were
                                                          different than
                                                          yours, e.g.,
                                                          no
                                                          --with-memalign.</div>
                                                          <div class="">I’ve
                                                          also added
                                                          SuperLU_DIST
                                                          to my
                                                          installation.</div>
                                                          <div class="">Still,
                                                          I can’t
                                                          reproduce any
                                                          issue.</div>
                                                          <div class="">I
                                                          will continue
                                                          looking into
                                                          this, it
                                                          appears I’m
                                                          seeing some
                                                          valgrind
                                                          errors, but I
                                                          don’t know if
                                                          this is some
                                                          side effect of
                                                          OpenMPI not
                                                          being
                                                          valgrind-clean
                                                          (last time I
                                                          checked, there
                                                          was no error
                                                          with MPICH).</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">Thank
                                                          you for your
                                                          patience,</div>
                                                          <div class="">Pierre</div>
                                                          <div class=""><br class="">
                                                          </div>
/usr/bin/gmake -f gmakefile test test-fail=1
Using MAKEFLAGS: test-fail=1
        TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_baij.counts
 ok snes_tutorials-ex12_quad_hpddm_reuse_baij
 ok diff-snes_tutorials-ex12_quad_hpddm_reuse_baij
        TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex33_superlu_dist_2.counts
 ok ksp_ksp_tests-ex33_superlu_dist_2
 ok diff-ksp_ksp_tests-ex33_superlu_dist_2
        TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex49_superlu_dist.counts
 ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0
 ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0
 ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1
 ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1
 ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0
 ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0
 ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1
 ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1
 ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0
 ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0
 ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1
 ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1
 ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0
 ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0
 ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1
 ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1
        TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex50_tut_2.counts
 ok ksp_ksp_tutorials-ex50_tut_2
 ok diff-ksp_ksp_tutorials-ex50_tut_2
        TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex33_superlu_dist.counts
 ok ksp_ksp_tests-ex33_superlu_dist
 ok diff-ksp_ksp_tests-ex33_superlu_dist
        TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_hypre.counts
 ok snes_tutorials-ex56_hypre
 ok diff-snes_tutorials-ex56_hypre
        TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex56_2.counts
 ok ksp_ksp_tutorials-ex56_2
 ok diff-ksp_ksp_tutorials-ex56_2
        TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_elas.counts
 ok snes_tutorials-ex17_3d_q3_trig_elas
 ok diff-snes_tutorials-ex17_3d_q3_trig_elas
        TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij.counts
 ok snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
 ok diff-snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
        TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist_3.counts
not ok ksp_ksp_tutorials-ex5_superlu_dist_3 # Error code: 1
#	srun: error: Unable to create step for job 1426755: More processors requested than permitted
 ok ksp_ksp_tutorials-ex5_superlu_dist_3 # SKIP Command failed so no diff
        TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist.counts
 ok ksp_ksp_tutorials-ex5f_superlu_dist # SKIP Fortran required for this test
        TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_tri_parmetis_hpddm_baij.counts
 ok snes_tutorials-ex12_tri_parmetis_hpddm_baij
 ok diff-snes_tutorials-ex12_tri_parmetis_hpddm_baij
        TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_tut_3.counts
 ok snes_tutorials-ex19_tut_3
 ok diff-snes_tutorials-ex19_tut_3
        TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_vlap.counts
 ok snes_tutorials-ex17_3d_q3_trig_vlap
 ok diff-snes_tutorials-ex17_3d_q3_trig_vlap
        TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist_3.counts
 ok ksp_ksp_tutorials-ex5f_superlu_dist_3 # SKIP Fortran required for this test
        TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_superlu_dist.counts
 ok snes_tutorials-ex19_superlu_dist
 ok diff-snes_tutorials-ex19_superlu_dist
        TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre.counts
 ok snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
 ok diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
        TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex49_hypre_nullspace.counts
 ok ksp_ksp_tutorials-ex49_hypre_nullspace
 ok diff-ksp_ksp_tutorials-ex49_hypre_nullspace
        TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_superlu_dist_2.counts
 ok snes_tutorials-ex19_superlu_dist_2
 ok diff-snes_tutorials-ex19_superlu_dist_2
        TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist_2.counts
not ok ksp_ksp_tutorials-ex5_superlu_dist_2 # Error code: 1
#	srun: error: Unable to create step for job 1426755: More processors requested than permitted
 ok ksp_ksp_tutorials-ex5_superlu_dist_2 # SKIP Command failed so no diff
        TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre.counts
 ok snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
 ok diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
        TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex64_1.counts
 ok ksp_ksp_tutorials-ex64_1
 ok diff-ksp_ksp_tutorials-ex64_1
        TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist.counts
not ok ksp_ksp_tutorials-ex5_superlu_dist # Error code: 1
#	srun: error: Unable to create step for job 1426755: More processors requested than permitted
 ok ksp_ksp_tutorials-ex5_superlu_dist # SKIP Command failed so no diff
        TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist_2.counts
 ok ksp_ksp_tutorials-ex5f_superlu_dist_2 # SKIP Fortran required for this test
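[Editorial note, not from the original message.] The three "not ok" ex5_superlu_dist cases above fail only because srun could not create a step with the requested number of processors inside the current job allocation, so they could presumably be re-run once a large enough allocation is available. A minimal sketch, assuming the standard PETSc gmakefile test harness and a Slurm environment (the "-n 4" core count is an assumption; check each test's nsize):

    # Run from $PETSC_DIR, inside an allocation big enough for the failing tests;
    # test-fail=1 re-runs only the tests that failed in the previous run (as in the log above).
    salloc -n 4 /usr/bin/gmake -f gmakefile test test-fail=1

    # Or target just those tests (assuming this PETSc version's harness accepts a
    # glob filter via search=; older trees may use searchin= instead):
    /usr/bin/gmake -f gmakefile test search='ksp_ksp_tutorials-ex5_superlu_dist*'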
  Barry

On Mar 10, 2021, at 11:03 PM, Eric Chamberland <Eric.Chamberland@giref.ulaval.ca> wrote:

Barry,

To get some follow-up on the --with-openmp=1 failures, shall I open GitLab issues for:

a) all hypre failures giving DIVERGED_INDEFINITE_PC
b) all superlu_dist failures giving different results with initia and "Exceeded timeout limit of 60 s"
c) all hpddm failures giving "free(): invalid next size (fast)" and "Segmentation Violation"
d) all TAO failures giving "Exceeded timeout limit of 60 s"

I don't see how I could do all this debugging by myself...

Thanks,
Eric
-- 
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42
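[Editorial note, not from the original thread.] Regarding item (a) in the list above, a quick way to confirm that the DIVERGED_INDEFINITE_PC failures depend on the OpenMP thread count is to run one of the affected hypre cases by hand with different OMP_NUM_THREADS values and print the convergence reason. This is only an illustrative sketch: the choice of example (snes/tutorials/ex56) and the process count are assumptions, while -pc_type hypre, -ksp_type cg, and -ksp_converged_reason are standard PETSc options.

    # Assumed reproduction sketch: the same run, only the OpenMP thread count changes.
    cd $PETSC_DIR/src/snes/tutorials && make ex56
    OMP_NUM_THREADS=1 mpiexec -n 2 ./ex56 -pc_type hypre -ksp_type cg -ksp_converged_reason
    OMP_NUM_THREADS=4 mpiexec -n 2 ./ex56 -pc_type hypre -ksp_type cg -ksp_converged_reason

For items (b) and (d), note that 60 s is the harness's default limit; when re-running those cases by hand it can presumably be raised with something like "gmake -f gmakefile test TIMEOUT=300 ..." (assuming this PETSc version's harness honors the TIMEOUT variable).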
Attachments: fedora_mkl_and_devtools.txt, openmpi.txt, petsc.txt