<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr">Thanks for your plots. <div><br></div><div>The new algorithms should be scalable in terms of memory usage. I am puzzled by these plots, since the memory usage increases exponentially. Could it come from somewhere else? How do you measure the memory? Is the measured memory for the entire simulation or just for PtAP? Could you measure the memory for PtAP only? Several factors other than PtAP may affect the memory usage. </div><div><br></div><div> I will grab some data from my own simulations.  </div><div><br></div><div>Are you running ex43?</div><div><br></div><div>Fande,</div><div><br></div><div><br></div></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, May 3, 2019 at 8:14 AM Myriam Peyrounette <<a href="mailto:myriam.peyrounette@idris.fr">myriam.peyrounette@idris.fr</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF">
    <p>And the attached files... Sorry<br>
    </p>
    <br>
    <div class="gmail-m_4480500180785847151moz-cite-prefix">On 05/03/19 at 16:11, Myriam Peyrounette
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      <p>Hi,</p>
      <p>I plotted new scalings (memory and time) using the new
        algorithms. I used the option <i>-options_left true</i> to
        make sure that the options are actually used. They are. <br>
      </p>
      <p>I don't have access to the platform I used to run my
        computations on, so I ran them on a different one. In
        particular, I can't reach problem size = 1e8 and the values
        might be different from the previous scalings I sent you. But
        the comparison of the PETSc versions and options is still
        relevant. <br>
      </p>
      <p>I plotted the reference scalings: the "good" one (PETSc
        3.6.4) in green, the "bad" one (PETSc 3.10.2) in blue.<br>
      </p>
      <p>I used the commit d330a26 (3.11.1) for all the other scalings,
        adding different sets of options:</p>
      <p><i>Light blue</i> -> -matptap_via
        allatonce  -mat_freeintermediatedatastructures 1<br>
        <i>Orange</i> -> -matptap_via allatonce_<b>merged</b> -mat_freeintermediatedatastructures
        1<br>
        <i>Purple</i> -> -matptap_via
        allatonce  -mat_freeintermediatedatastructures 1 <b>-inner_diag_matmatmult_via
          scalable -inner_offdiag_matmatmult_via scalable</b><br>
        <i>Yellow</i> -> -matptap_via allatonce_<b>merged</b> -mat_freeintermediatedatastructures
        1 <b>-inner_diag_matmatmult_via scalable
          -inner_offdiag_matmatmult_via scalable</b></p>
      <p>Conclusion: with regard to memory, the two algorithms yield a
        similarly good improvement of the scaling. The use of the
        -inner_(off)diag_matmatmult_via options is also clearly
        beneficial. The scaling is still not as good as 3.6.4 though.<br>
        With regard to time, I noted a real improvement in execution
        time! These runs used to take 200-300s. Now
        they take 10-15s. Besides that, the "_merged" versions are more
        efficient. The -inner_(off)diag_matmatmult_via options are
        slightly more expensive, but not critically so.</p>
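      <p>One way to put a number on "how good" a scaling curve is: fit
        the slope of the curve in log-log space. A slope close to 1 means
        the quantity grows linearly with the problem size, and a larger
        slope means superlinear growth. The sketch below is only an
        illustration: the helper name loglog_slope is hypothetical, and
        the data are synthetic placeholders, not the measurements from
        the attached plots.</p>
<pre>
```python
import math


def loglog_slope(sizes, values):
    """Least-squares slope of log(values) against log(sizes)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic placeholder data, NOT measurements from this thread:
# here the memory grows exactly linearly with the problem size.
sizes = [1e4, 1e5, 1e6, 1e7]
memory = [3.0 * s for s in sizes]
print(round(loglog_slope(sizes, memory), 3))
```
</pre>
      <p>Comparing the fitted slopes of two curves (for instance two
        PETSc versions) gives a single figure per curve, which is easier
        to discuss than the shape of a plot.</p>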
      <p>What do you think? Is it possible to match the scaling of
        PETSc 3.6.4 again? Is it worth investigating further?</p>
      <p>Myriam</p>
      <p><br>
      </p>
      <div class="gmail-m_4480500180785847151moz-cite-prefix">On 04/30/19 at 17:00, Fande Kong
        wrote:<br>
      </div>
      <blockquote type="cite">
        
        <div dir="ltr">
          <div dir="ltr">
            <div dir="ltr">
              <div dir="ltr">Hi Myriam,
                <div><br>
                </div>
                <div>We are interested in how the new algorithms perform.
                  There are two new algorithms you could try.</div>
                <div><br>
                </div>
                <div>Algorithm 1:</div>
                <div><br>
                </div>
                <div>-matptap_via
                  allatonce  -mat_freeintermediatedatastructures 1<br>
                </div>
                <div><br>
                </div>
                <div>Algorithm 2:</div>
                <div><br>
                </div>
                <div>-matptap_via
                  allatonce_merged -mat_freeintermediatedatastructures 1<br>
                </div>
                <div><br>
                </div>
                <div><br>
                </div>
                <div>Note that you need to use the current petsc-master.
                  Please also put "-snes_view" in your script so
                  that we can confirm these options are actually
                  set.</div>
                <div><br>
                </div>
                <div>Thanks,</div>
                <div><br>
                </div>
                <div>Fande,</div>
                <div><br>
                </div>
              </div>
            </div>
          </div>
        </div>
        <br>
        <div class="gmail_quote">
          <div dir="ltr" class="gmail_attr">On Tue, Apr 30, 2019 at 2:26
            AM Myriam Peyrounette via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>>
            wrote:<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            <div bgcolor="#FFFFFF">
              <p>Hi,</p>
              <p>that's really good news for us, thanks! I will plot
                again the memory scaling using these new options and let
                you know. Next week I hope.</p>
              <p>Before that, I just need to clarify the situation.
                Throughout our discussions, we mentioned a number of
                options concerning the scalability:</p>
              <p>-matptap_via scalable<br>
                -inner_diag_matmatmult_via scalable<br>
                -inner_offdiag_matmatmult_via scalable<br>
                -mat_freeintermediatedatastructures <br>
                -matptap_via allatonce<br>
                -matptap_via allatonce_merged</p>
              <p>Which of them are compatible? Should I use all of
                them at the same time? Is there any redundancy?<br>
              </p>
              <p>Thanks,</p>
              <p>Myriam<br>
              </p>
              <br>
              <div class="gmail-m_4480500180785847151gmail-m_5004975596082747442moz-cite-prefix">On
                04/25/19 at 21:47, Zhang, Hong wrote:<br>
              </div>
              <blockquote type="cite">
                <div dir="ltr">
                  <div dir="ltr">
                    <div dir="ltr">
                      <div dir="ltr">
                        <div dir="ltr">Myriam:<br>
                        </div>
                        <div>Checking MatPtAP() in petsc-3.6.4, I
                          realized that it uses a different algorithm than
                          petsc-3.10 and later versions. petsc-3.6 uses
                          an outer product for C=P^T * AP, while petsc-3.10
                          uses a local transpose of P. petsc-3.10
                          accelerates data access, but doubles the
                          memory of P. </div>
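                        <div>To illustrate what the outer-product
                          formulation means, here is a minimal dense,
                          serial sketch. It is not PETSc's parallel
                          sparse implementation, and the function name
                          ptap_outer_product is hypothetical. C is
                          accumulated one row of P at a time, as the sum
                          over i of P[i,:]^T * (AP)[i,:], so no explicit
                          copy of P^T is ever stored.</div>
<pre>
```python
def ptap_outer_product(P, A):
    """Form C = P^T A P by accumulating one outer product per row of P."""
    n = len(A)        # A is n x n
    m = len(P[0])     # P is n x m, so C is m x m
    C = [[0.0] * m for _ in range(m)]
    for i in range(n):
        # Row i of A*P, combined from the rows of P selected by row i of A.
        ap_row = [sum(A[i][k] * P[k][j] for k in range(n)) for j in range(m)]
        # Rank-one update: C += P[i, :]^T * (A*P)[i, :].
        for r in range(m):
            if P[i][r] != 0.0:
                for c in range(m):
                    C[r][c] += P[i][r] * ap_row[c]
    return C


# Small illustrative matrices (a 1D Laplacian and a linear interpolation).
A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
P = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
print(ptap_outer_product(P, A))
```
</pre>
                        <div>The alternative described above builds the
                          transpose of the local part of P first, which
                          makes row access faster but keeps a second
                          copy of P in memory.</div>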
                        <div><br>
                        </div>
                        <div>Fande added two new implementations of
                          MatPtAP() to petsc-master which use much
                          less memory, and scale better in memory, at the
                          cost of slightly higher computing time (still
                          faster than hypre, though). You may use these
                          new implementations if memory scalability is a
                          concern. The options for these new
                          implementations are: </div>
                        <div>-matptap_via allatonce<br>
                        </div>
                        <div>-matptap_via allatonce_merged<br>
                        </div>
                        <div><br>
                        </div>
                        <div>Hong</div>
                        <br>
                        <div class="gmail_quote">
                          <div dir="ltr" class="gmail_attr">On Mon, Apr
                            15, 2019 at 12:10 PM <a href="mailto:hzhang@mcs.anl.gov" target="_blank">
                              hzhang@mcs.anl.gov</a> <<a href="mailto:hzhang@mcs.anl.gov" target="_blank">hzhang@mcs.anl.gov</a>>
                            wrote:<br>
                          </div>
                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                            <div dir="ltr">
                              <div dir="ltr">Myriam:<br>
                              </div>
                              <div>Thank you very much for providing
                                these results!</div>
                              <div>I have put effort into accelerating
                                execution time and avoiding the use of
                                global sizes in PtAP, for which the
                                algorithm that transposes P_local and
                                P_other likely doubles the memory usage.
                                I'll try to investigate why it becomes
                                unscalable.</div>
                              <div>Hong</div>
                              <div class="gmail_quote">
                                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                  <div bgcolor="#FFFFFF">
                                    <p>Hi,</p>
                                    <p>you'll find the new scaling
                                      attached (green line). I used the
                                      version 3.11 and the four
                                      scalability options:<br>
                                      -matptap_via scalable<br>
                                      -inner_diag_matmatmult_via
                                      scalable<br>
                                      -inner_offdiag_matmatmult_via
                                      scalable<br>
-mat_freeintermediatedatastructures</p>
                                    <p>The scaling is much better! The
                                      code even uses less memory for the
                                      smallest cases. There is still an
                                      increase for the larger one. <br>
                                    </p>
                                    <p>With regard to the time scaling,
                                      I used KSPView and LogView on the
                                      two previous scalings (blue and
                                      yellow lines) but not on the last
                                      one (green line). So we can't
                                      really compare them, am I right?
                                      However, we can see that the new
                                      time scaling looks quite good. It
                                      slightly increases from ~8s to
                                      ~27s. <br>
                                    </p>
                                    <p>Unfortunately, the computations
                                      are expensive so I would like to
                                      avoid re-running them if possible.
                                      How relevant would a proper time
                                      scaling be for you?  <br>
                                    </p>
                                    <p>Myriam<br>
                                    </p>
                                    <br>
                                    <div class="gmail-m_4480500180785847151gmail-m_5004975596082747442gmail-m_5870970137787136754gmail-m_4593329201565690262m_-4364359315279719822gmail-m_-6245019727744503832moz-cite-prefix">On
                                      04/12/19 at 18:18, Zhang, Hong
                                      wrote:<br>
                                    </div>
                                    <blockquote type="cite">
                                      <div dir="ltr">
                                        <div dir="ltr">Myriam:<br>
                                        </div>
                                        <div>Thanks for your effort. It
                                          will help us improve PETSc.</div>
                                        <div>Hong</div>
                                        <div><br>
                                        </div>
                                        <div class="gmail_quote">
                                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> Hi all,<br>
                                            <br>
                                            I used the wrong script,
                                            that's why it diverged...
                                            Sorry about that. <br>
                                            I tried again with the right
                                            script applied to a tiny
                                            problem (~200<br>
                                            elements). I can see a small
                                            difference in memory usage
                                            (gain ~ 1 MB)<br>
                                            when adding the
                                            -mat_freeintermediatedatastructures
                                            option. I still have to<br>
                                            execute larger cases to plot
                                            the scaling. The
                                            supercomputer I usually<br>
                                            run my jobs on is really
                                            busy at the moment so it
                                            takes a while. I hope<br>
                                            I'll send you the results on
                                            Monday.<br>
                                            <br>
                                            Thanks everyone,<br>
                                            <br>
                                            Myriam<br>
                                            <br>
                                            <br>
                                            On 04/11/19 at 06:01, Jed
                                            Brown wrote:<br>
                                            > "Zhang, Hong" <<a href="mailto:hzhang@mcs.anl.gov" target="_blank">hzhang@mcs.anl.gov</a>>
                                            writes:<br>
                                            ><br>
                                            >> Jed:<br>
                                            >>>> Myriam,<br>
                                            >>>> Thanks for
                                            the plot.
                                            '-mat_freeintermediatedatastructures'
                                            should not affect the solution.
                                            It releases almost half of the
                                            memory in C=PtAP if C is not
                                            reused.<br>
                                            >>> And yet if
                                            turning it on causes
                                            divergence, that would imply
                                            a bug.<br>
                                            >>> Hong, are you
                                            able to reproduce the
                                            experiment to see the memory<br>
                                            >>> scaling?<br>
                                            >> I would like to test her
                                            code using an ALCF machine,
                                            but my hands are full now.
                                            I'll try it as soon as I
                                            find time, hopefully next
                                            week.<br>
                                            > I have now compiled and
                                            run her code locally.<br>
                                            ><br>
                                            > Myriam, thanks for your
                                            last mail adding
                                            configuration and removing
                                            the<br>
                                            > MemManager.h
                                            dependency.  I ran with and
                                            without<br>
                                            >
                                            -mat_freeintermediatedatastructures
                                            and don't see a difference
                                            in<br>
                                            > convergence.  What
                                            commands did you run to
                                            observe that difference?<br>
                                            <br>
                                            -- <br>
                                            Myriam Peyrounette<br>
                                            CNRS/IDRIS - HLST<br>
                                            --<br>
                                            <br>
                                            <br>
                                          </blockquote>
                                        </div>
                                      </div>
                                    </blockquote>
                                    <br>
                                  </div>
                                </blockquote>
                              </div>
                            </div>
                          </blockquote>
                        </div>
                      </div>
                    </div>
                  </div>
                </div>
              </blockquote>
              <br>
            </div>
          </blockquote>
        </div>
      </blockquote>
      <br>
    </blockquote>
    <br>
  </div>

</blockquote></div>