<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div>I have some data from my own simulations. The results do not look bad. </div><div><br></div><div>The following are results (strong scaling) of "-matptap_via allatonce -mat_freeintermediatedatastructures 1"</div><div><br></div><div>Problem 1 has 2,482,224,480 unknowns, and use 4000, 6000, 10000, and 12000 processor cores.</div><div><br></div><div>4000 processor cores: 587M</div><div>6000 processor cores: 270M</div><div>10000 processor cores: 251M</div><div>12000 processor cores: 136M</div><div dir="ltr"><br></div><div>Problem 2 has 7,446,673,440 unknowns, and use 6000, 10000, and 12000 process cores:</div><div><div>6000 processor cores: 975M</div><div>10000 processor cores: 599M</div><div>12000 processor cores: 415M</div></div><div><br></div><div>The memory is used for PtAP only, and I do not include the memory from the other part of the simulation.</div><div><br></div><div>I am sorry we did not resolve the issue for you so far. I will try to run your example you attached earlier to if we can reproduce it. If we can reproduce the problem, I will use a memory profiling tool to check where the memory comes from.</div><div><br></div><div>Thanks again for your report,</div><div><br></div><div>Fande,</div><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, May 3, 2019 at 9:26 AM Fande Kong <<a href="mailto:fdkong.jd@gmail.com">fdkong.jd@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr">Thanks for your plots. <div><br></div><div>The new algorithms should be scalable in terms of the memory usage. I am puzzled by these plots since the memory usage increases exponentially. It may come from somewhere else? How do you measure the memory? The memory is for the entire simulation or just PtAP? 
Could you measure the memory for PtAP only? Maybe several factors, not only PtAP, affect the memory usage. </div><div><br></div><div> I will grab some data from my own simulations. </div><div><br></div><div>Are you running ex43?</div><div><br></div><div>Fande,</div><div><br></div><div><br></div></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, May 3, 2019 at 8:14 AM Myriam Peyrounette <<a href="mailto:myriam.peyrounette@idris.fr" target="_blank">myriam.peyrounette@idris.fr</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>And the attached files... Sorry<br>
</p>
<br>
<div class="gmail-m_3074498340659128874gmail-m_4480500180785847151moz-cite-prefix">Le 05/03/19 à 16:11, Myriam Peyrounette
a écrit :<br>
</div>
<blockquote type="cite">
<p>Hi,</p>
<p>I plotted new scalings (memory and time) using the new
algorithms. I used the option <i>-options_left true</i> to
make sure that the options are actually used. They are. <br>
</p>
<p>I don't have access to the platform I used to run my
computations on, so I ran them on a different one. In
particular, I can't reach problem size = 1e8 and the values
might be different from the previous scalings I sent you. But
the comparison of the PETSc versions and options is still
relevant. <br>
</p>
<p>I plotted the scalings of reference: the "good" one (PETSc
3.6.4) in green, the "bad" one (PETSc 3.10.2) in blue.<br>
</p>
<p>I used the commit d330a26 (3.11.1) for all the other scalings,
adding different sets of options:</p>
<p><i>Light blue</i> -> -matptap_via
allatonce -mat_freeintermediatedatastructures 1<br>
<i>Orange</i> -> -matptap_via allatonce_<b>merged</b> -mat_freeintermediatedatastructures
1<br>
<i>Purple</i> -> -matptap_via
allatonce -mat_freeintermediatedatastructures 1 <b>-inner_diag_matmatmult_via
scalable -inner_offdiag_matmatmult_via scalable</b><br>
<i>Yellow</i> -> -matptap_via allatonce_<b>merged</b> -mat_freeintermediatedatastructures
1 <b>-inner_diag_matmatmult_via scalable
-inner_offdiag_matmatmult_via scalable</b></p>
<p>Conclusion: with regard to memory, the two algorithms bring a
similarly good improvement of the scaling. The use of the
-inner_(off)diag_matmatmult_via options is also very
interesting. The scaling is still not as good as 3.6.4 though.<br>
With regard to time, I noted a real improvement in execution
time! These runs used to take 200-300s. Now
they take 10-15s. Besides that, the "_merged" versions are more
efficient. And the -inner_(off)diag_matmatmult_via options are
slightly more expensive, but not critically so.</p>
<p>What do you think? Is it possible to match the scaling of
PETSc 3.6.4 again? Is it worth investigating further?</p>
<p>Myriam</p>
<p><br>
</p>
<div class="gmail-m_3074498340659128874gmail-m_4480500180785847151moz-cite-prefix">Le 04/30/19 à 17:00, Fande Kong a
écrit :<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">HI Myriam,
<div><br>
</div>
<div>We are interesting how the new algorithms perform.
So there are two new algorithms you could try.</div>
<div><br>
</div>
<div>Algorithm 1:</div>
<div><br>
</div>
<div>-matptap_via
allatonce -mat_freeintermediatedatastructures 1<br>
</div>
<div><br>
</div>
<div>Algorithm 2:</div>
<div><br>
</div>
<div>-matptap_via
allatonce_merged -mat_freeintermediatedatastructures 1<br>
</div>
<div><br>
</div>
<div><br>
</div>
<div>Note that you need to use the current petsc-master,
and also please put "-snes_view" in your script so
that we can confirm these options actually get
set.</div>
<div><br>
</div>
<div>Thanks,</div>
<div><br>
</div>
<div>Fande,</div>
<div><br>
</div>
</div>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, Apr 30, 2019 at 2:26
AM Myriam Peyrounette via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>Hi,</p>
<p>that's really good news for us, thanks! I will plot
again the memory scaling using these new options and let
you know. Next week I hope.</p>
<p>Before that, I just need to clarify the situation.
Throughout our discussions, we mentioned a number of
options concerning scalability:</p>
<p>-matptap_via scalable<br>
-inner_diag_matmatmult_via scalable<br>
-inner_offdiag_matmatmult_via scalable<br>
-mat_freeintermediatedatastructures <br>
-matptap_via allatonce<br>
-matptap_via allatonce_merged</p>
<p>Which of them are compatible? Should I use all of
them at the same time? Is there any redundancy?<br>
</p>
<p>Thanks,</p>
<p>Myriam<br>
</p>
<br>
<div class="gmail-m_3074498340659128874gmail-m_4480500180785847151gmail-m_5004975596082747442moz-cite-prefix">Le
04/25/19 à 21:47, Zhang, Hong a écrit :<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">Myriam:<br>
</div>
<div>Checking MatPtAP() in petsc-3.6.4, I
realized that it uses a different algorithm than
petsc-3.10 and later versions. petsc-3.6 uses an
outer product for C=P^T * A * P, while petsc-3.10
uses a local transpose of P. petsc-3.10
accelerates data access, but doubles the
memory of P. </div>
<div><br>
</div>
<div>Fande added two new implementations of
MatPtAP() to petsc-master which use much
less memory and scale better, with slightly
higher computing time (still faster than hypre,
though). You may use these new implementations
if you are concerned about memory scalability. The
options for these new implementations are: </div>
<div>-matptap_via allatonce<br>
</div>
<div>-matptap_via allatonce_merged<br>
</div>
<div><br>
</div>
<div>Hong</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, Apr
15, 2019 at 12:10 PM <a href="mailto:hzhang@mcs.anl.gov" target="_blank">
hzhang@mcs.anl.gov</a> <<a href="mailto:hzhang@mcs.anl.gov" target="_blank">hzhang@mcs.anl.gov</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div dir="ltr">Myriam:<br>
</div>
<div>Thank you very much for providing
these results!</div>
<div>I put effort into accelerating
execution and avoiding the use of global
sizes in PtAP; there, the algorithm that
transposes P_local and P_other
likely doubles the memory usage. I'll
try to investigate why it becomes
unscalable.</div>
<div>Hong</div>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>Hi,</p>
<p>you'll find the new scaling
attached (green line). I used
version 3.11 and the four
scalability options:<br>
-matptap_via scalable<br>
-inner_diag_matmatmult_via
scalable<br>
-inner_offdiag_matmatmult_via
scalable<br>
-mat_freeintermediatedatastructures</p>
<p>The scaling is much better! The
code even uses less memory for the
smallest cases. There is still an
increase for the largest one. <br>
</p>
<p>With regard to the time scaling,
I used KSPView and LogView on the
two previous scalings (blue and
yellow lines) but not on the last
one (green line). So we can't
really compare them, am I right?
However, we can see that the new
time scaling looks quite good. It
increases only slightly, from ~8s to
~27s. <br>
</p>
<p>Unfortunately, the computations
are expensive, so I would like to
avoid re-running them if possible.
How useful would a proper time
scaling be for you? <br>
</p>
<p>Myriam<br>
</p>
<br>
<div class="gmail-m_3074498340659128874gmail-m_4480500180785847151gmail-m_5004975596082747442gmail-m_5870970137787136754gmail-m_4593329201565690262m_-4364359315279719822gmail-m_-6245019727744503832moz-cite-prefix">Le
04/12/19 à 18:18, Zhang, Hong a
écrit :<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">Myriam :<br>
</div>
<div>Thanks for your effort. It
will help us improve PETSc.</div>
<div>Hong</div>
<div><br>
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> Hi all,<br>
<br>
I used the wrong script,
that's why it diverged...
Sorry about that. <br>
I tried again with the right
script applied to a tiny
problem (~200<br>
elements). I can see a small
difference in memory usage
(gain ~1 MB)<br>
when adding the
-mat_freeintermediatedatastructures
option. I still have to<br>
run larger cases to plot
the scaling. The
supercomputer I usually<br>
run my jobs on is really
busy at the moment, so it
takes a while. I hope<br>
to send you the results on
Monday.<br>
<br>
Thanks everyone,<br>
<br>
Myriam<br>
<br>
<br>
On 04/11/19 at 06:01, Jed
Brown wrote:<br>
> "Zhang, Hong" <<a href="mailto:hzhang@mcs.anl.gov" target="_blank">hzhang@mcs.anl.gov</a>>
writes:<br>
><br>
>> Jed:<br>
>>>> Myriam,<br>
>>>> Thanks for
the plot.
'-mat_freeintermediatedatastructures'
should not affect the solution.
It releases almost half of the
memory in C=PtAP if C is not
reused.<br>
>>> And yet if
turning it on causes
divergence, that would imply
a bug.<br>
>>> Hong, are you
able to reproduce the
experiment to see the memory<br>
>>> scaling?<br>
>> I'd like to test her
code using an ALCF machine,
but my hands are full now.
I'll try it as soon as I
find time, hopefully next
week.<br>
> I have now compiled and
run her code locally.<br>
><br>
> Myriam, thanks for your
last mail adding
configuration and removing
the<br>
> MemManager.h
dependency. I ran with and
without<br>
>
-mat_freeintermediatedatastructures
and don't see a difference
in<br>
> convergence. What
commands did you run to
observe that difference?<br>
<br>
-- <br>
Myriam Peyrounette<br>
CNRS/IDRIS - HLST<br>
--<br>
<br>
<br>
</blockquote>
</div>
</div>
</blockquote>
<br>
<pre class="gmail-m_3074498340659128874gmail-m_4480500180785847151gmail-m_5004975596082747442gmail-m_5870970137787136754gmail-m_4593329201565690262m_-4364359315279719822gmail-m_-6245019727744503832moz-signature" cols="72">--
Myriam Peyrounette
CNRS/IDRIS - HLST
--
</pre>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
<pre class="gmail-m_3074498340659128874gmail-m_4480500180785847151gmail-m_5004975596082747442moz-signature" cols="72">--
Myriam Peyrounette
CNRS/IDRIS - HLST
--
</pre>
</div>
</blockquote>
</div>
</blockquote>
<br>
<pre class="gmail-m_3074498340659128874gmail-m_4480500180785847151moz-signature" cols="72">--
Myriam Peyrounette
CNRS/IDRIS - HLST
--
</pre>
</blockquote>
<br>
<pre class="gmail-m_3074498340659128874gmail-m_4480500180785847151moz-signature" cols="72">--
Myriam Peyrounette
CNRS/IDRIS - HLST
--
</pre>
</div>
</blockquote></div>
</blockquote></div></div></div></div></div>