<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr">Hi Myriam,<div><br></div><div>We are interested in how the new algorithms perform, so there are two new algorithms you could try.</div><div><br></div><div>Algorithm 1:</div><div><br></div><div>-matptap_via allatonce -mat_freeintermediatedatastructures 1<br></div><div><br></div><div>Algorithm 2:</div><div><br></div><div>-matptap_via allatonce_merged -mat_freeintermediatedatastructures 1<br></div><div><br></div><div>Note that you need to use the current petsc-master, and please also put "-snes_view" in your script so that we can confirm these options actually get set.</div><div><br></div><div>Thanks,</div><div><br></div><div>Fande</div><div><br></div></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Apr 30, 2019 at 2:26 AM Myriam Peyrounette via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov">petsc-users@mcs.anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
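For concreteness, the two runs above could look like this. This is only a sketch: the MPI launcher invocation and the executable name ./app are placeholders, not taken from this thread.

```shell
# Algorithm 1: all-at-once PtAP, freeing intermediate data structures
mpiexec -n 4 ./app -matptap_via allatonce \
    -mat_freeintermediatedatastructures 1 -snes_view

# Algorithm 2: merged all-at-once variant
mpiexec -n 4 ./app -matptap_via allatonce_merged \
    -mat_freeintermediatedatastructures 1 -snes_view
```

The -snes_view output reports the solver configuration at runtime, which is how one can confirm the options were actually picked up.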
<div bgcolor="#FFFFFF">
<p>Hi,</p>
<p>That's really good news for us, thanks! I will plot the memory
scaling again using these new options and let you know, hopefully
next week.</p>
<p>Before that, I just need to clarify the situation. Throughout our
discussions, we mentioned a number of options concerning
scalability:</p>
<p>-matptap_via scalable<br>
-inner_diag_matmatmult_via scalable<br>
-inner_offdiag_matmatmult_via scalable<br>
-mat_freeintermediatedatastructures<br>
-matptap_via allatonce<br>
-matptap_via allatonce_merged</p>
<p>Which of these options are compatible? Should I use all of them
at the same time? Is there any redundancy?<br>
</p>
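As a general PETSc note, -matptap_via takes a single value, so "scalable", "allatonce", and "allatonce_merged" select alternative algorithms rather than being combinable; the inner matmatmult options presumably affect only the scalable path. A sketch of the four-option combination reported earlier in the thread (./app is a placeholder executable name):

```shell
# The four options used together with PETSc 3.11 (green line in the plot)
./app -matptap_via scalable \
    -inner_diag_matmatmult_via scalable \
    -inner_offdiag_matmatmult_via scalable \
    -mat_freeintermediatedatastructures
```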
<p>Thanks,</p>
<p>Myriam<br>
</p>
<br>
<div class="gmail-m_5004975596082747442moz-cite-prefix">On 04/25/19 at 21:47, Zhang, Hong
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">Myriam:<br>
</div>
<div>Checking MatPtAP() in petsc-3.6.4, I realized that it
uses a different algorithm than petsc-3.10 and later
versions. petsc-3.6 uses an outer product for C = P^T*A*P,
while petsc-3.10 uses a local transpose of P. petsc-3.10
accelerates data access, but doubles the memory of
P.</div>
<div><br>
</div>
<div>Fande added two new implementations of MatPtAP() in
petsc-master which use much less memory and scale
better, at slightly higher computing time (still
faster than hypre). You may use these new
implementations if you are concerned about memory
scalability. The options for these new implementations
are:</div>
<div>-matptap_via allatonce<br>
</div>
<div>-matptap_via allatonce_merged<br>
</div>
<div><br>
</div>
<div>Hong</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, Apr 15, 2019
at 12:10 PM <a href="mailto:hzhang@mcs.anl.gov" target="_blank">
hzhang@mcs.anl.gov</a> <<a href="mailto:hzhang@mcs.anl.gov" target="_blank">hzhang@mcs.anl.gov</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div dir="ltr">Myriam:<br>
</div>
<div>Thank you very much for providing these
results!</div>
<div>I have put effort into accelerating execution time
and avoiding the use of global sizes in PtAP; the
algorithm that transposes P_local and P_other
likely doubles the memory usage. I'll try to
investigate why it becomes unscalable.</div>
<div>Hong</div>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>Hi,</p>
<p>you'll find the new scaling attached (green
line). I used version 3.11 and the four
scalability options:<br>
-matptap_via scalable<br>
-inner_diag_matmatmult_via scalable<br>
-inner_offdiag_matmatmult_via scalable<br>
-mat_freeintermediatedatastructures</p>
<p>The scaling is much better! The code even
uses less memory for the smallest cases.
There is still an increase for the largest
case.
<br>
</p>
<p>With regard to the time scaling, I used
KSPView and LogView on the two previous
scalings (blue and yellow lines) but not on
the last one (green line), so we can't
really compare them, am I right? However,
the new time scaling looks quite good: it
increases from ~8s to ~27s. <br>
</p>
<p>Unfortunately, the computations are
expensive, so I would like to avoid
re-running them if possible. How important
would a proper time scaling be for you?
<br>
</p>
<p>Myriam<br>
</p>
<br>
<div class="gmail-m_5004975596082747442gmail-m_5870970137787136754gmail-m_4593329201565690262m_-4364359315279719822gmail-m_-6245019727744503832moz-cite-prefix">On
04/12/19 at 18:18, Zhang, Hong wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">Myriam :<br>
</div>
<div>Thanks for your effort. It will help
us improve PETSc.</div>
<div>Hong</div>
<div><br>
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Hi all,<br>
<br>
I used the wrong script; that's why it
diverged... Sorry about that.<br>
I tried again with the right script,
applied to a tiny problem (~200
elements). I can see a small
difference in memory usage (a gain of
~1 MB) when adding the
-mat_freeintermediatedatastructures
option. I still have to
run larger cases to plot the
scaling. The supercomputer I usually
run my jobs on is really busy at the
moment, so it takes a while. I hope
I'll send you the results on Monday.<br>
<br>
Thanks everyone,<br>
<br>
Myriam<br>
<br>
<br>
On 04/11/19 at 06:01, Jed Brown
wrote:<br>
> "Zhang, Hong" <<a href="mailto:hzhang@mcs.anl.gov" target="_blank">hzhang@mcs.anl.gov</a>>
writes:<br>
><br>
>> Jed:<br>
>>>> Myriam,<br>
>>>> Thanks for the plot.
'-mat_freeintermediatedatastructures'
should not affect solution. It
releases almost half of memory in
C=PtAP if C is not reused.<br>
>>> And yet if turning it on
causes divergence, that would imply a
bug.<br>
>>> Hong, are you able to
reproduce the experiment to see the
memory<br>
>>> scaling?<br>
>> I'd like to test her code on an
ALCF machine, but my hands are full
now. I'll try it as soon as I find
time, hopefully next week.<br>
> I have now compiled and run her
code locally.<br>
><br>
> Myriam, thanks for your last mail
adding configuration and removing the<br>
> MemManager.h dependency. I ran
with and without<br>
>
-mat_freeintermediatedatastructures
and don't see a difference in<br>
> convergence. What commands did
you run to observe that difference?<br>
<br>
-- <br>
Myriam Peyrounette<br>
CNRS/IDRIS - HLST<br>
--<br>
<br>
<br>
</blockquote>
</div>
</div>
</blockquote>
<br>
<pre class="gmail-m_5004975596082747442gmail-m_5870970137787136754gmail-m_4593329201565690262m_-4364359315279719822gmail-m_-6245019727744503832moz-signature" cols="72">--
Myriam Peyrounette
CNRS/IDRIS - HLST
--
</pre>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
<pre class="gmail-m_5004975596082747442moz-signature" cols="72">--
Myriam Peyrounette
CNRS/IDRIS - HLST
--
</pre>
</div>
</blockquote></div>