<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div>I have some data from my own simulations. The results do not look bad. </div><div><br></div><div>The following are results (strong scaling) of "-matptap_via allatonce -mat_freeintermediatedatastructures 1"</div><div><br></div><div>Problem 1 has 2,482,224,480 unknowns, and use 4000, 6000, 10000, and 12000 processor cores.</div><div><br></div><div>4000 processor cores: 587M</div><div>6000 processor cores: 270M</div><div>10000 processor cores: 251M</div><div>12000 processor cores: 136M</div><div dir="ltr"><br></div><div>Problem 2 has 7,446,673,440 unknowns, and use 6000, 10000, and 12000 process cores:</div><div><div>6000 processor cores: 975M</div><div>10000 processor cores: 599M</div><div>12000 processor cores: 415M</div></div><div><br></div><div>The memory is used for PtAP only, and I do not include the memory from the other part of the simulation.</div><div><br></div><div>I am sorry we did not resolve the issue for you so far. I will try to run your example you attached earlier to if we can reproduce it. If we can reproduce the problem, I will use a memory profiling tool to check where the memory comes from.</div><div><br></div><div>Thanks again for your report,</div><div><br></div><div>Fande,</div><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, May 3, 2019 at 9:26 AM Fande Kong <<a href="mailto:fdkong.jd@gmail.com">fdkong.jd@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr">Thanks for your plots. <div><br></div><div>The new algorithms should be scalable in terms of the memory usage. I am puzzled by these plots since the memory usage increases exponentially. It may come from somewhere else? How do you measure the memory? The memory is for the entire simulation or just PtAP? 
Could you measure the memory for PtAP only? Maybe several factors, not only PtAP, affect the memory usage. </div><div><br></div><div> I will grab some data from my own simulations. </div><div><br></div><div>Are you running ex43?</div><div><br></div><div>Fande,</div><div><br></div><div><br></div></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, May 3, 2019 at 8:14 AM Myriam Peyrounette <<a href="mailto:myriam.peyrounette@idris.fr" target="_blank">myriam.peyrounette@idris.fr</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>And the attached files... Sorry<br>
</p>
<br>
<div class="gmail-m_3074498340659128874gmail-m_4480500180785847151moz-cite-prefix">Le 05/03/19 à 16:11, Myriam Peyrounette
a écrit :<br>
</div>
<blockquote type="cite">
<p>Hi,</p>
<p>I plotted new scalings (memory and time) using the new
algorithms. I used the option <i>-options_left true</i> to
make sure that the options are actually used. They are. <br>
</p>
<p>I don't have access to the platform I used to run my
computations on, so I ran them on a different one. In
particular, I can't reach problem size = 1e8 and the values
might be different from the previous scalings I sent you. But
the comparison of the PETSc versions and options is still
relevant. <br>
</p>
<p>I plotted the scalings of reference: the "good" one (PETSc
3.6.4) in green, the "bad" one (PETSc 3.10.2) in blue.<br>
</p>
<p>I used the commit d330a26 (3.11.1) for all the other scalings,
adding different sets of options:</p>
<p><i>Light blue</i> -> -matptap_via
allatonce -mat_freeintermediatedatastructures 1<br>
<i>Orange</i> -> -matptap_via allatonce_<b>merged</b> -mat_freeintermediatedatastructures
1<br>
<i>Purple</i> -> -matptap_via
allatonce -mat_freeintermediatedatastructures 1 <b>-inner_diag_matmatmult_via
scalable -inner_offdiag_matmatmult_via scalable</b><br>
<i>Yellow</i> -> -matptap_via allatonce_<b>merged</b> -mat_freeintermediatedatastructures
1 <b>-inner_diag_matmatmult_via scalable
-inner_offdiag_matmatmult_via scalable</b></p>
<p>Conclusion: with regard to memory, the two algorithms bring a
similarly good improvement of the scaling. The use of the
-inner_(off)diag_matmatmult_via options is also very
interesting. The scaling is still not as good as 3.6.4 though.<br>
With regard to time, I noted a real improvement in execution
time! These runs used to take 200-300s. Now
they take 10-15s. Besides that, the "_merged" versions are more
efficient. And the -inner_(off)diag_matmatmult_via options are
slightly more expensive, but not critically so.</p>
<p>What do you think? Is it possible to match the scaling of
PETSc 3.6.4 again? Is it worth investigating further?</p>
<p>Myriam</p>
<p><br>
</p>
<div class="gmail-m_3074498340659128874gmail-m_4480500180785847151moz-cite-prefix">Le 04/30/19 à 17:00, Fande Kong a
écrit :<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">HI Myriam,
<div><br>
</div>
<div>We are interesting how the new algorithms perform.
So there are two new algorithms you could try.</div>
<div><br>
</div>
<div>Algorithm 1:</div>
<div><br>
</div>
<div>-matptap_via
allatonce -mat_freeintermediatedatastructures 1<br>
</div>
<div><br>
</div>
<div>Algorithm 2:</div>
<div><br>
</div>
<div>-matptap_via
allatonce_merged -mat_freeintermediatedatastructures 1<br>
</div>
<div><br>
</div>
<div><br>
</div>
<div>Note that you need to use the current petsc-master,
and also please put "-snes_view" in your script so
that we can confirm these options actually get
set.</div>
<div><br>
</div>
<div>Thanks,</div>
<div><br>
</div>
<div>Fande,</div>
<div><br>
</div>
</div>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, Apr 30, 2019 at 2:26
AM Myriam Peyrounette via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>Hi,</p>
<p>that's really good news for us, thanks! I will plot
again the memory scaling using these new options and let
you know. Next week I hope.</p>
<p>Before that, I just need to clarify the situation.
Throughout our discussions, we mentioned a number of
options concerning scalability:</p>
<p>-matptap_via scalable<br>
-inner_diag_matmatmult_via scalable<br>
-inner_offdiag_matmatmult_via scalable<br>
-mat_freeintermediatedatastructures <br>
-matptap_via allatonce<br>
-matptap_via allatonce_merged</p>
<p>Which of them are compatible? Should I use all of
them at the same time? Is there any redundancy?<br>
</p>
<p>Thanks,</p>
<p>Myriam<br>
</p>
<br>
<div class="gmail-m_3074498340659128874gmail-m_4480500180785847151gmail-m_5004975596082747442moz-cite-prefix">Le
04/25/19 à 21:47, Zhang, Hong a écrit :<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">Myriam:<br>
</div>
<div>Checking MatPtAP() in petsc-3.6.4, I
realized that it uses a different algorithm than
petsc-3.10 and later versions. petsc-3.6 uses an
outer product for C=P^T * A * P, while petsc-3.10
uses a local transpose of P. petsc-3.10
accelerates data access, but doubles the
memory of P. </div>
<div><br>
</div>
<div>Fande added two new implementations of
MatPtAP() to petsc-master which use much
less memory and scale better, with slightly
higher computing time (still faster than hypre,
though). You may use these new implementations
if you are concerned about memory scalability. The
options for these new implementations are: </div>
<div>-matptap_via allatonce<br>
</div>
<div>-matptap_via allatonce_merged<br>
</div>
<div><br>
</div>
<div>Hong</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, Apr
15, 2019 at 12:10 PM <a href="mailto:hzhang@mcs.anl.gov" target="_blank">
hzhang@mcs.anl.gov</a> <<a href="mailto:hzhang@mcs.anl.gov" target="_blank">hzhang@mcs.anl.gov</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div dir="ltr">Myriam:<br>
</div>
<div>Thank you very much for providing
these results!</div>
<div>I put effort into accelerating
execution and avoiding the use of global
sizes in PtAP; there, the algorithm that
transposes P_local and P_other
likely doubles the memory usage. I'll
try to investigate why it becomes
unscalable.</div>
<div>Hong</div>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>Hi,</p>
<p>you'll find the new scaling
attached (green line). I used
version 3.11 and the four
scalability options:<br>
-matptap_via scalable<br>
-inner_diag_matmatmult_via
scalable<br>
-inner_offdiag_matmatmult_via
scalable<br>
-mat_freeintermediatedatastructures</p>
<p>The scaling is much better! The
code even uses less memory for the
smallest cases. There is still an
increase for the largest one. <br>
</p>
<p>With regard to the time scaling,
I used KSPView and LogView on the
two previous scalings (blue and
yellow lines) but not on the last
one (green line). So we can't
really compare them, am I right?
However, we can see that the new
time scaling looks quite good. It
increases only slightly, from ~8s to
~27s. <br>
</p>
<p>Unfortunately, the computations
are expensive, so I would like to
avoid re-running them if possible.
How useful would a proper time
scaling be for you? <br>
</p>
<p>Myriam<br>
</p>
<br>
<div class="gmail-m_3074498340659128874gmail-m_4480500180785847151gmail-m_5004975596082747442gmail-m_5870970137787136754gmail-m_4593329201565690262m_-4364359315279719822gmail-m_-6245019727744503832moz-cite-prefix">Le
04/12/19 à 18:18, Zhang, Hong a
écrit :<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">Myriam :<br>
</div>
<div>Thanks for your effort. It
will help us improve PETSc.</div>
<div>Hong</div>
<div><br>
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> Hi all,<br>
<br>
I used the wrong script,
that's why it diverged...
Sorry about that. <br>
I tried again with the right
script applied to a tiny
problem (~200<br>
elements). I can see a small
difference in memory usage
(gain ~1 MB)<br>
when adding the
-mat_freeintermediatedatastructures
option. I still have to<br>
run larger cases to plot
the scaling. The
supercomputer I usually<br>
run my jobs on is really
busy at the moment, so it
takes a while. I hope<br>
to send you the results on
Monday.<br>
<br>
Thanks everyone,<br>
<br>
Myriam<br>
<br>
<br>
On 04/11/19 at 06:01, Jed
Brown wrote:<br>
> "Zhang, Hong" <<a href="mailto:hzhang@mcs.anl.gov" target="_blank">hzhang@mcs.anl.gov</a>>
writes:<br>
><br>
>> Jed:<br>
>>>> Myriam,<br>
>>>> Thanks for
the plot.
'-mat_freeintermediatedatastructures'
should not affect the solution.
It releases almost half of the
memory in C=PtAP if C is not
reused.<br>
>>> And yet if
turning it on causes
divergence, that would imply
a bug.<br>
>>> Hong, are you
able to reproduce the
experiment to see the memory<br>
>>> scaling?<br>
>> I'd like to test her
code using an ALCF machine,
but my hands are full now.
I'll try it as soon as I
find time, hopefully next
week.<br>
> I have now compiled and
run her code locally.<br>
><br>
> Myriam, thanks for your
last mail adding
configuration and removing
the<br>
> MemManager.h
dependency. I ran with and
without<br>
>
-mat_freeintermediatedatastructures
and don't see a difference
in<br>
> convergence. What
commands did you run to
observe that difference?<br>
<br>
-- <br>
Myriam Peyrounette<br>
CNRS/IDRIS - HLST<br>
--<br>
<br>
<br>
</blockquote>
</div>
</div>
</blockquote>
<br>
<pre class="gmail-m_3074498340659128874gmail-m_4480500180785847151gmail-m_5004975596082747442gmail-m_5870970137787136754gmail-m_4593329201565690262m_-4364359315279719822gmail-m_-6245019727744503832moz-signature" cols="72">--
Myriam Peyrounette
CNRS/IDRIS - HLST
--
</pre>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
<pre class="gmail-m_3074498340659128874gmail-m_4480500180785847151gmail-m_5004975596082747442moz-signature" cols="72">--
Myriam Peyrounette
CNRS/IDRIS - HLST
--
</pre>
</div>
</blockquote>
</div>
</blockquote>
<br>
<pre class="gmail-m_3074498340659128874gmail-m_4480500180785847151moz-signature" cols="72">--
Myriam Peyrounette
CNRS/IDRIS - HLST
--
</pre>
</blockquote>
<br>
<pre class="gmail-m_3074498340659128874gmail-m_4480500180785847151moz-signature" cols="72">--
Myriam Peyrounette
CNRS/IDRIS - HLST
--
</pre>
</div>
</blockquote></div>
</blockquote></div></div></div></div></div>