<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>Hi,</p>
    <p>that's really good news for us, thanks! I will plot the memory
      scaling again using these new options and let you know, hopefully
      next week.</p>
    <p>Before that, I just need to clarify the situation. Throughout our
      discussions, we mentioned several options related to
      scalability:</p>
    <p>-matptap_via scalable<br>
      -inner_diag_matmatmult_via scalable<br>
      -inner_offdiag_matmatmult_via scalable<br>
      -mat_freeintermediatedatastructures <br>
      -matptap_via allatonce<br>
      -matptap_via allatonce_merged</p>
    <p>Which of these are compatible with each other? Should I use all
      of them at the same time? Are any of them redundant?<br>
    </p>
    <p>Thanks,</p>
    <p>Myriam<br>
    </p>
    <br>
    <div class="moz-cite-prefix">Le 04/25/19 à 21:47, Zhang, Hong a
      écrit :<br>
    </div>
    <blockquote type="cite"
cite="mid:CAGCphBs6Svo7vohR_x+MzHCwu_ycNW475s-F6P0PKH8xPzj0fg@mail.gmail.com">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <div dir="ltr">
        <div dir="ltr">
          <div dir="ltr">
            <div dir="ltr">
              <div dir="ltr">Myriam:<br>
              </div>
              <div>Checking MatPtAP() in petsc-3.6.4, I realized that it
                uses a different algorithm than petsc-3.10 and later
                versions. petsc-3.6 uses an outer product for
                C = P^T * A * P, while petsc-3.10 uses a local transpose
                of P. The petsc-3.10 approach speeds up data access, but
                doubles the memory used for P. </div>
              <div><br>
              </div>
              <div>Fande added two new implementations of MatPtAP() to
                petsc-master. They use much less memory, and their
                memory usage scales, at the cost of slightly higher
                computing time (still faster than hypre, though). You
                may use these new implementations if you are concerned
                about memory scalability. The options for these new
                implementations are: </div>
              <div>-matptap_via allatonce<br>
              </div>
              <div>-matptap_via allatonce_merged<br>
              </div>
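              <div>A minimal sketch of how one of these values might be
                passed at run time, assuming a hypothetical executable
                ./my_app launched with mpiexec on 4 processes (the
                executable name and process count are placeholders). The
                two values are alternatives for the same option, so only
                one is given per run:</div>
              <pre>mpiexec -n 4 ./my_app -matptap_via allatonce</pre>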
              <div><br>
              </div>
              <div>Hong</div>
              <br>
              <div class="gmail_quote">
                <div dir="ltr" class="gmail_attr">On Mon, Apr 15, 2019
                  at 12:10 PM <a href="mailto:hzhang@mcs.anl.gov"
                    target="_blank" moz-do-not-send="true">
                    hzhang@mcs.anl.gov</a> <<a
                    href="mailto:hzhang@mcs.anl.gov" target="_blank"
                    moz-do-not-send="true">hzhang@mcs.anl.gov</a>>
                  wrote:<br>
                </div>
                <blockquote class="gmail_quote" style="margin:0px 0px
                  0px 0.8ex;border-left:1px solid
                  rgb(204,204,204);padding-left:1ex">
                  <div dir="ltr">
                    <div dir="ltr">Myriam:<br>
                    </div>
                    <div>Thank you very much for providing these
                      results!</div>
                    <div>I have put effort into accelerating the
                      execution time and avoiding the use of global
                      sizes in PtAP. The algorithm used for this, which
                      transposes P_local and P_other, likely doubles the
                      memory usage. I'll try to investigate why it
                      becomes unscalable.</div>
                    <div>Hong</div>
                    <div class="gmail_quote">
                      <blockquote class="gmail_quote" style="margin:0px
                        0px 0px 0.8ex;border-left:1px solid
                        rgb(204,204,204);padding-left:1ex">
                        <div bgcolor="#FFFFFF">
                          <p>Hi,</p>
                          <p>you'll find the new scaling attached (green
                            line). I used version 3.11 and the four
                            scalability options:<br>
                            -matptap_via scalable<br>
                            -inner_diag_matmatmult_via scalable<br>
                            -inner_offdiag_matmatmult_via scalable<br>
                            -mat_freeintermediatedatastructures</p>
                          <p>The scaling is much better! The code even
                            uses less memory for the smallest cases.
                            There is still an increase for the larger
                            one.
                            <br>
                          </p>
                          <p>With regard to the time scaling, I used
                            KSPView and LogView on the two previous
                            scalings (blue and yellow lines) but not on
                            the last one (green line). So we can't
                            really compare them, am I right? However, we
                            can see that the new time scaling looks
                            quite good. It slightly increases from ~8s
                            to ~27s. <br>
                          </p>
                          <p>Unfortunately, the computations are
                            expensive, so I would like to avoid
                            re-running them if possible. How important
                            would a proper time scaling be for you? 
                            <br>
                          </p>
                          <p>Myriam<br>
                          </p>
                          <br>
                          <div
class="gmail-m_5870970137787136754gmail-m_4593329201565690262m_-4364359315279719822gmail-m_-6245019727744503832moz-cite-prefix">Le
                            04/12/19 à 18:18, Zhang, Hong a écrit :<br>
                          </div>
                          <blockquote type="cite">
                            <div dir="ltr">
                              <div dir="ltr">Myriam :<br>
                              </div>
                              <div>Thanks for your effort. It will help
                                us improve PETSc.</div>
                              <div>Hong</div>
                              <div><br>
                              </div>
                              <div class="gmail_quote">
                                <blockquote class="gmail_quote"
                                  style="margin:0px 0px 0px
                                  0.8ex;border-left:1px solid
                                  rgb(204,204,204);padding-left:1ex">
                                  Hi all,<br>
                                  <br>
                                  I used the wrong script, that's why it
                                  diverged... Sorry about that. <br>
                                  I tried again with the right script
                                  applied to a tiny problem (~200<br>
                                  elements). I can see a small
                                  difference in memory usage (gain of
                                  ~1 MB)<br>
                                  when adding the
                                  -mat_freeintermediatedatastructures
                                  option. I still have to<br>
                                  execute larger cases to plot the
                                  scaling. The supercomputer I usually<br>
                                  run my jobs on is really busy at the
                                  moment so it takes a while. I hope<br>
                                  I'll send you the results on Monday.<br>
                                  <br>
                                  Thanks everyone,<br>
                                  <br>
                                  Myriam<br>
                                  <br>
                                  <br>
                                  Le 04/11/19 à 06:01, Jed Brown a
                                  écrit :<br>
                                  > "Zhang, Hong" <<a
                                    href="mailto:hzhang@mcs.anl.gov"
                                    target="_blank"
                                    moz-do-not-send="true">hzhang@mcs.anl.gov</a>>
                                  writes:<br>
                                  ><br>
                                  >> Jed:<br>
                                  >>>> Myriam,<br>
                                  >>>> Thanks for the plot.
                                  '-mat_freeintermediatedatastructures'
                                  should not affect solution. It
                                  releases almost half of memory in
                                  C=PtAP if C is not reused.<br>
                                  >>> And yet if turning it on
                                  causes divergence, that would imply a
                                  bug.<br>
                                  >>> Hong, are you able to
                                  reproduce the experiment to see the
                                  memory<br>
                                  >>> scaling?<br>
                                  >> I would like to test her code on
                                  an ALCF machine, but my hands are full
                                  now. I'll try it as soon as I find
                                  time, hopefully next week.<br>
                                  > I have now compiled and run her
                                  code locally.<br>
                                  ><br>
                                  > Myriam, thanks for your last mail
                                  adding configuration and removing the<br>
                                  > MemManager.h dependency.  I ran
                                  with and without<br>
                                  >
                                  -mat_freeintermediatedatastructures
                                  and don't see a difference in<br>
                                  > convergence.  What commands did
                                  you run to observe that difference?<br>
                                  <br>
                                  -- <br>
                                  Myriam Peyrounette<br>
                                  CNRS/IDRIS - HLST<br>
                                  --<br>
                                  <br>
                                  <br>
                                </blockquote>
                              </div>
                            </div>
                          </blockquote>
                          <br>
                          <pre class="gmail-m_5870970137787136754gmail-m_4593329201565690262m_-4364359315279719822gmail-m_-6245019727744503832moz-signature" cols="72">-- 
Myriam Peyrounette
CNRS/IDRIS - HLST
--
</pre>
                        </div>
                      </blockquote>
                    </div>
                  </div>
                </blockquote>
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    <pre class="moz-signature" cols="72">-- 
Myriam Peyrounette
CNRS/IDRIS - HLST
--
</pre>
  </body>
</html>