<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>Hi,</p>
    <p>I plotted new scalings (memory and time) using the new
      algorithms. I passed <i>-options_left true</i> to check that
      the options are actually used. They are. <br>
    </p>
    <p>I no longer have access to the platform I previously ran my
      computations on, so I ran them on a different one. In particular, I
      can't reach problem size = 1e8, and the values might differ from
      the previous scalings I sent you. But the comparison of the PETSc
      versions and options is still relevant. <br>
    </p>
    <p>I plotted the reference scalings: the "good" one (PETSc 3.6.4)
      in green, the "bad" one (PETSc 3.10.2) in blue.<br>
    </p>
    <p>I used commit d330a26 (3.11.1) for all the other scalings,
      with different sets of options:</p>
    <p><i>Light blue</i> -> -matptap_via
      allatonce  -mat_freeintermediatedatastructures 1<br>
      <i>Orange</i> -> -matptap_via allatonce_<b>merged</b> -mat_freeintermediatedatastructures
      1<br>
      <i>Purple</i> -> -matptap_via
      allatonce  -mat_freeintermediatedatastructures 1 <b>-inner_diag_matmatmult_via
        scalable -inner_offdiag_matmatmult_via scalable</b><br>
      <i>Yellow</i> -> -matptap_via allatonce_<b>merged</b> -mat_freeintermediatedatastructures
      1 <b>-inner_diag_matmatmult_via scalable
        -inner_offdiag_matmatmult_via scalable</b></p>
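    <p>To make the four option sets easy to compare, here is a minimal
      sketch that composes them from the flags listed above. The
      executable name <i>./my_app</i> and the helper names are
      placeholders for illustration, not the actual script used for these
      runs.</p>
<pre>
```python
# Option sets for the four runs, composed from the flags quoted above.
BASE = ["-mat_freeintermediatedatastructures", "1"]
INNER_SCALABLE = [
    "-inner_diag_matmatmult_via", "scalable",
    "-inner_offdiag_matmatmult_via", "scalable",
]


def ptap_options(algorithm, inner_scalable=False):
    """Return the PETSc command-line options for one run."""
    opts = ["-matptap_via", algorithm] + BASE
    if inner_scalable:
        opts += INNER_SCALABLE
    return opts


# Plot colour -> option list, as in the legend above.
RUNS = {
    "light blue": ptap_options("allatonce"),
    "orange": ptap_options("allatonce_merged"),
    "purple": ptap_options("allatonce", inner_scalable=True),
    "yellow": ptap_options("allatonce_merged", inner_scalable=True),
}


def command_line(color, executable="./my_app"):
    # "./my_app" is a placeholder, not the real executable name.
    # -options_left true is appended so unused options get reported.
    return " ".join([executable] + RUNS[color] + ["-options_left", "true"])


if __name__ == "__main__":
    for color in RUNS:
        print(color, ":", command_line(color))
```
</pre>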
    <p>Conclusion: with regard to memory, the two algorithms give a
      similarly good improvement of the scaling. The use of the
      -inner_(off)diag_matmatmult_via options is also very interesting.
      The scaling is still not as good as 3.6.4 though.<br>
      With regard to time, I noted a real improvement in execution time!
      I used to spend 200-300s on these executions. Now they take
      10-15s. Besides that, the "_merged" versions are more efficient,
      and the -inner_(off)diag_matmatmult_via options are slightly
      more expensive, but that is not critical.</p>
    <p>What do you think? Is it possible to match the scaling of
      PETSc 3.6.4 again? Is it worth investigating further?</p>
    <p>Myriam</p>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 04/30/19 at 17:00, Fande Kong
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAN5Wd-JAacDFEWmxbJ1PFY0gQWU5NJweEF=ctk+J-eCDv6BViA@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=utf-8">
      <div dir="ltr">
        <div dir="ltr">
          <div dir="ltr">
            <div dir="ltr">Hi Myriam,
              <div><br>
              </div>
              <div>We are interested in how the new algorithms perform.
                There are two new algorithms you could try.</div>
              <div><br>
              </div>
              <div>Algorithm 1:</div>
              <div><br>
              </div>
              <div>-matptap_via
                allatonce  -mat_freeintermediatedatastructures 1<br>
              </div>
              <div><br>
              </div>
              <div>Algorithm 2:</div>
              <div><br>
              </div>
              <div>-matptap_via
                allatonce_merged -mat_freeintermediatedatastructures 1<br>
              </div>
              <div><br>
              </div>
              <div><br>
              </div>
              <div>Note that you need to use the current petsc-master,
                and also please put "-snes_view" in your script so that
                we can confirm these options are actually set.</div>
              <div><br>
              </div>
              <div>Thanks,</div>
              <div><br>
              </div>
              <div>Fande,</div>
              <div><br>
              </div>
            </div>
          </div>
        </div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Tue, Apr 30, 2019 at 2:26
          AM Myriam Peyrounette via petsc-users <<a
            href="mailto:petsc-users@mcs.anl.gov" moz-do-not-send="true">petsc-users@mcs.anl.gov</a>>
          wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px
          0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
          <div bgcolor="#FFFFFF">
            <p>Hi,</p>
            <p>That's really good news for us, thanks! I will plot
              the memory scaling again using these new options and let
              you know. Next week, I hope.</p>
            <p>Before that, I just need to clarify the situation.
              Throughout our discussions, we mentioned a number of
              options concerning the scalability:</p>
            <p>-matptap_via scalable<br>
              -inner_diag_matmatmult_via scalable<br>
              -inner_offdiag_matmatmult_via scalable<br>
              -mat_freeintermediatedatastructures <br>
              -matptap_via allatonce<br>
              -matptap_via allatonce_merged</p>
            <p>Which of them are compatible? Should I use all of
              them at the same time? Is there any redundancy?<br>
            </p>
            <p>Thanks,</p>
            <p>Myriam<br>
            </p>
            <br>
            <div class="gmail-m_5004975596082747442moz-cite-prefix">On
              04/25/19 at 21:47, Zhang, Hong wrote:<br>
            </div>
            <blockquote type="cite">
              <div dir="ltr">
                <div dir="ltr">
                  <div dir="ltr">
                    <div dir="ltr">
                      <div dir="ltr">Myriam:<br>
                      </div>
                      <div>Checking MatPtAP() in petsc-3.6.4, I realized
                        that it uses a different algorithm than petsc-3.10
                        and later versions. petsc-3.6 uses an outer product
                        for C=P^T * AP, while petsc-3.10 uses a local
                        transpose of P. petsc-3.10 accelerates data
                        access, but doubles the memory of P. </div>
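                      <div>As a rough illustration of the two
                        formulations, here is a dense toy sketch in plain
                        Python. It is not PETSc's implementation (PETSc
                        works on distributed sparse matrices); the function
                        names and the small matrices are made up for the
                        example. It only shows that accumulating outer
                        products of the rows of P and AP gives the same C as
                        forming P^T explicitly, without storing a second
                        copy of P.</div>
<pre>
```python
# Dense toy illustration; PETSc works on distributed sparse matrices.
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def ptap_via_transpose(P, A):
    # Form P^T explicitly (an extra copy of P), then multiply.
    return matmul(transpose(P), matmul(A, P))


def ptap_via_outer_products(P, A):
    # Accumulate C += outer(P[i, :], (A P)[i, :]) row by row.
    # No transposed copy of P is ever stored.
    AP = matmul(A, P)
    m = len(P[0])
    C = [[0] * m for _ in range(m)]
    for p_row, ap_row in zip(P, AP):
        for a, p in enumerate(p_row):
            if p != 0:
                for b, v in enumerate(ap_row):
                    C[a][b] += p * v
    return C


# 1D Laplacian on 4 points and a piecewise-constant prolongation.
A = [[2, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 2]]
P = [[1, 0], [1, 0], [0, 1], [0, 1]]

if __name__ == "__main__":
    print(ptap_via_transpose(P, A))
    print(ptap_via_outer_products(P, A))
```
</pre>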
                      <div><br>
                      </div>
                      <div>Fande added two new implementations for
                        MatPtAP() to petsc-master which use much less
                        memory, and scale better in memory, at slightly
                        higher computing time (faster than hypre though).
                        You may use these new implementations if you are
                        concerned about memory scalability. The options for
                        these new implementations are: </div>
                      <div>-matptap_via allatonce<br>
                      </div>
                      <div>-matptap_via allatonce_merged<br>
                      </div>
                      <div><br>
                      </div>
                      <div>Hong</div>
                      <br>
                      <div class="gmail_quote">
                        <div dir="ltr" class="gmail_attr">On Mon, Apr
                          15, 2019 at 12:10 PM <a
                            href="mailto:hzhang@mcs.anl.gov"
                            target="_blank" moz-do-not-send="true">
                            hzhang@mcs.anl.gov</a> <<a
                            href="mailto:hzhang@mcs.anl.gov"
                            target="_blank" moz-do-not-send="true">hzhang@mcs.anl.gov</a>>
                          wrote:<br>
                        </div>
                        <blockquote class="gmail_quote"
                          style="margin:0px 0px 0px
                          0.8ex;border-left:1px solid
                          rgb(204,204,204);padding-left:1ex">
                          <div dir="ltr">
                            <div dir="ltr">Myriam:<br>
                            </div>
                            <div>Thank you very much for providing these
                              results!</div>
                            <div>I have put effort into accelerating
                              execution time and avoiding the use of
                              global sizes in PtAP, for which the
                              algorithm that transposes P_local and
                              P_other likely doubles the memory usage.
                              I'll try to investigate why it becomes
                              unscalable.</div>
                            <div>Hong</div>
                            <div class="gmail_quote">
                              <blockquote class="gmail_quote"
                                style="margin:0px 0px 0px
                                0.8ex;border-left:1px solid
                                rgb(204,204,204);padding-left:1ex">
                                <div bgcolor="#FFFFFF">
                                  <p>Hi,</p>
                                  <p>You'll find the new scaling
                                    attached (green line). I used
                                    version 3.11 and the four
                                    scalability options:<br>
                                    -matptap_via scalable<br>
                                    -inner_diag_matmatmult_via scalable<br>
                                    -inner_offdiag_matmatmult_via
                                    scalable<br>
                                    -mat_freeintermediatedatastructures</p>
                                  <p>The scaling is much better! The
                                    code even uses less memory for the
                                    smallest cases. There is still an
                                    increase for the larger ones. <br>
                                  </p>
                                  <p>With regard to the time scaling, I
                                    used KSPView and LogView on the two
                                    previous scalings (blue and yellow
                                    lines) but not on the last one
                                    (green line). So we can't really
                                    compare them, am I right? However,
                                    we can see that the new time scaling
                                    looks quite good. It slightly
                                    increases from ~8s to ~27s. <br>
                                  </p>
                                  <p>Unfortunately, the computations are
                                    expensive, so I would like to avoid
                                    re-running them if possible. How
                                    relevant would a proper time
                                    scaling be for you?  <br>
                                  </p>
                                  <p>Myriam<br>
                                  </p>
                                  <br>
                                  <div
class="gmail-m_5004975596082747442gmail-m_5870970137787136754gmail-m_4593329201565690262m_-4364359315279719822gmail-m_-6245019727744503832moz-cite-prefix">On
                                    04/12/19 at 18:18, Zhang, Hong
                                    wrote:<br>
                                  </div>
                                  <blockquote type="cite">
                                    <div dir="ltr">
                                      <div dir="ltr">Myriam :<br>
                                      </div>
                                      <div>Thanks for your effort. It
                                        will help us improve PETSc.</div>
                                      <div>Hong</div>
                                      <div><br>
                                      </div>
                                      <div class="gmail_quote">
                                        <blockquote class="gmail_quote"
                                          style="margin:0px 0px 0px
                                          0.8ex;border-left:1px solid
                                          rgb(204,204,204);padding-left:1ex">
                                          Hi all,<br>
                                          <br>
                                          I used the wrong script,
                                          that's why it diverged.
                                          Sorry about that. <br>
                                          I tried again with the right
                                          script applied to a tiny
                                          problem (~200 elements). I can
                                          see a small difference in
                                          memory usage (gain ~ 1 MB)
                                          when adding the
                                          -mat_freeintermediatedatastructures
                                          option. I still have to
                                          execute larger cases to plot
                                          the scaling. The supercomputer
                                          I usually run my jobs on is
                                          really busy at the moment, so
                                          it takes a while. I hope to
                                          send you the results on
                                          Monday.<br>
                                          <br>
                                          Thanks everyone,<br>
                                          <br>
                                          Myriam<br>
                                          <br>
                                          <br>
                                          On 04/11/19 at 06:01, Jed Brown
                                          wrote:<br>
                                          > "Zhang, Hong" <<a
                                            href="mailto:hzhang@mcs.anl.gov"
                                            target="_blank"
                                            moz-do-not-send="true">hzhang@mcs.anl.gov</a>>
                                          writes:<br>
                                          ><br>
                                          >> Jed:<br>
                                          >>>> Myriam,<br>
                                          >>>> Thanks for
                                          the plot.
                                          '-mat_freeintermediatedatastructures'
                                          should not affect the solution.
                                          It releases almost half of the
                                          memory in C=PtAP if C is not
                                          reused.<br>
                                          >>> And yet if
                                          turning it on causes
                                          divergence, that would imply a
                                          bug.<br>
                                          >>> Hong, are you
                                          able to reproduce the
                                          experiment to see the memory<br>
                                          >>> scaling?<br>
                                          >> I would like to test her
                                          code using an ALCF machine,
                                          but my hands are full now.
                                          I'll try it as soon as I find
                                          time, hopefully next week.<br>
                                          > I have now compiled and
                                          run her code locally.<br>
                                          ><br>
                                          > Myriam, thanks for your
                                          last mail adding configuration
                                          and removing the<br>
                                          > MemManager.h dependency. 
                                          I ran with and without<br>
                                          >
                                          -mat_freeintermediatedatastructures
                                          and don't see a difference in<br>
                                          > convergence.  What
                                          commands did you run to
                                          observe that difference?<br>
                                          <br>
                                          -- <br>
                                          Myriam Peyrounette<br>
                                          CNRS/IDRIS - HLST<br>
                                          --<br>
                                          <br>
                                          <br>
                                        </blockquote>
                                      </div>
                                    </div>
                                  </blockquote>
                                  <br>
                                  <pre class="gmail-m_5004975596082747442gmail-m_5870970137787136754gmail-m_4593329201565690262m_-4364359315279719822gmail-m_-6245019727744503832moz-signature" cols="72">-- 
Myriam Peyrounette
CNRS/IDRIS - HLST
--
</pre>
                                </div>
                              </blockquote>
                            </div>
                          </div>
                        </blockquote>
                      </div>
                    </div>
                  </div>
                </div>
              </div>
            </blockquote>
            <br>
            <pre class="gmail-m_5004975596082747442moz-signature" cols="72">-- 
Myriam Peyrounette
CNRS/IDRIS - HLST
--
</pre>
          </div>
        </blockquote>
      </div>
    </blockquote>
    <br>
    <pre class="moz-signature" cols="72">-- 
Myriam Peyrounette
CNRS/IDRIS - HLST
--
</pre>
  </body>
</html>