<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Hi,</p>
<p>I plotted new scalings (memory and time) using the new
algorithms. I used the option <i>-options_left true</i> to make
sure that the options are actually used. They are. <br>
</p>
<p>I don't have access to the platform I used to run my computations
on, so I ran them on a different one. In particular, I can't reach
a problem size of 1e8, and the values might differ from the
previous scalings I sent you. But the comparison of the PETSc
versions and options is still relevant. <br>
</p>
<p>I plotted the reference scalings: the "good" one (PETSc 3.6.4)
in green, the "bad" one (PETSc 3.10.2) in blue.<br>
</p>
<p>I used commit d330a26 (3.11.1) for all the other scalings,
adding different sets of options:</p>
<p><i>Light blue</i> -> -matptap_via
allatonce -mat_freeintermediatedatastructures 1<br>
<i>Orange</i> -> -matptap_via allatonce_<b>merged</b> -mat_freeintermediatedatastructures
1<br>
<i>Purple</i> -> -matptap_via
allatonce -mat_freeintermediatedatastructures 1 <b>-inner_diag_matmatmult_via
scalable -inner_offdiag_matmatmult_via scalable</b><br>
<i>Yellow</i> -> -matptap_via allatonce_<b>merged</b> -mat_freeintermediatedatastructures
1 <b>-inner_diag_matmatmult_via scalable
-inner_offdiag_matmatmult_via scalable</b></p>
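<p>The four option sets can also be written down as data, which makes
it easier to see what the "merged" and "-inner_*" variants add. This
is only a sketch in Python: the names <i>OPTION_SETS</i> and
<i>build_command</i> and the executable name <i>./my_app</i> are
hypothetical and are not taken from the actual run scripts. Only the
PETSc options themselves come from the runs listed here.</p>
<pre>
```python
# Option sets for the four 3.11.1 runs, keyed by plot colour.
BASE = ["-mat_freeintermediatedatastructures", "1"]
INNER = [
    "-inner_diag_matmatmult_via", "scalable",
    "-inner_offdiag_matmatmult_via", "scalable",
]

OPTION_SETS = {
    "light blue": ["-matptap_via", "allatonce"] + BASE,
    "orange": ["-matptap_via", "allatonce_merged"] + BASE,
    "purple": ["-matptap_via", "allatonce"] + BASE + INNER,
    "yellow": ["-matptap_via", "allatonce_merged"] + BASE + INNER,
}


def build_command(colour, executable="./my_app"):
    """Return the argument list for one run (executable name is hypothetical).

    -options_left true makes PETSc report options that were set but never used.
    """
    return [executable] + OPTION_SETS[colour] + ["-options_left", "true"]


if __name__ == "__main__":
    for colour in OPTION_SETS:
        print(colour + ": " + " ".join(build_command(colour)))
```
</pre>
<p>Keeping the shared options in one place avoids typos when the same
flag has to appear in several runs.</p>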
<p>Conclusion: with regard to memory, the two algorithms yield a
similarly good improvement in scaling. Using the
-inner_(off)diag_matmatmult_via options is also worthwhile.
The scaling is still not as good as with 3.6.4, though.<br>
With regard to time, I noted a real improvement in execution time!
I used to spend 200-300s on these executions. Now they take
10-15s. Besides that, the "_merged" versions are more efficient.
The -inner_(off)diag_matmatmult_via options are slightly
more expensive, but not critically so.</p>
<p>What do you think? Is it possible to match the scaling of
PETSc 3.6.4 again? Is it worth investigating further?</p>
<p>Myriam</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 04/30/19 at 17:00, Fande Kong
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAN5Wd-JAacDFEWmxbJ1PFY0gQWU5NJweEF=ctk+J-eCDv6BViA@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">Hi Myriam,
<div><br>
</div>
<div>We are interested in how the new algorithms perform.
There are two new algorithms you could try.</div>
<div><br>
</div>
<div>Algorithm 1:</div>
<div><br>
</div>
<div>-matptap_via
allatonce -mat_freeintermediatedatastructures 1<br>
</div>
<div><br>
</div>
<div>Algorithm 2:</div>
<div><br>
</div>
<div>-matptap_via
allatonce_merged -mat_freeintermediatedatastructures 1<br>
</div>
<div><br>
</div>
<div><br>
</div>
<div>Note that you need to use the current petsc-master,
and please also put "-snes_view" in your script so that
we can confirm these options are actually set.</div>
<div><br>
</div>
<div>Thanks,</div>
<div><br>
</div>
<div>Fande,</div>
<div><br>
</div>
</div>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, Apr 30, 2019 at 2:26
AM Myriam Peyrounette via petsc-users &lt;<a
href="mailto:petsc-users@mcs.anl.gov" moz-do-not-send="true">petsc-users@mcs.anl.gov</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>Hi,</p>
<p>That's really good news for us, thanks! I will plot
the memory scaling again using these new options and let you
know, next week I hope.</p>
<p>Before that, I just need to clarify the situation.
Throughout our discussions, we mentioned a number of
options concerning the scalability:</p>
<p>-matptap_via scalable<br>
-inner_diag_matmatmult_via scalable<br>
-inner_offdiag_matmatmult_via scalable<br>
-mat_freeintermediatedatastructures <br>
-matptap_via allatonce<br>
-matptap_via allatonce_merged</p>
<p>Which of them are compatible? Should I use all of
them at the same time? Is there any redundancy?<br>
</p>
<p>Thanks,</p>
<p>Myriam<br>
</p>
<br>
<div class="gmail-m_5004975596082747442moz-cite-prefix">On
04/25/19 at 21:47, Zhang, Hong wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">Myriam:<br>
</div>
<div>Checking MatPtAP() in petsc-3.6.4, I realized
that it uses a different algorithm than petsc-3.10
and later versions. petsc-3.6 uses an outer product
for C = P^T * A * P, while petsc-3.10 uses a local
transpose of P. petsc-3.10 accelerates data
access, but doubles the memory of P. </div>
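<div>To make the difference concrete, here is a small dense sketch in
pure Python of the two formulations of C = P^T * A * P. It is an
illustration only and not PETSc's implementation, which works on
distributed sparse matrices; the function names and the tiny matrices
are made up for the example. The first version builds an explicit
transposed copy of P, which is the extra copy that costs memory. The
second accumulates row-wise outer products and never forms P^T.</div>
<pre>
```python
def matmul(X, Y):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def ptap_with_transpose(P, A):
    """C = P^T (A P), storing an explicit transposed copy of P."""
    Pt = [list(col) for col in zip(*P)]  # second copy of P's entries
    return matmul(Pt, matmul(A, P))


def ptap_outer_product(P, A):
    """C = sum over rows i of outer(P[i, :], (A P)[i, :]); no P^T is stored."""
    AP = matmul(A, P)
    m = len(P[0])
    C = [[0] * m for _ in range(m)]
    for p_row, ap_row in zip(P, AP):
        for r, p in enumerate(p_row):
            if p != 0:  # skip zero entries, as a sparse code would
                for c, ap in enumerate(ap_row):
                    C[r][c] += p * ap
    return C


if __name__ == "__main__":
    A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
    P = [[1, 0], [1, 1], [0, 1]]
    assert ptap_with_transpose(P, A) == ptap_outer_product(P, A)
    print(ptap_outer_product(P, A))
```
</pre>
<div>Both functions return the same matrix; they differ only in
whether a transposed copy of P is held in memory.</div>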
<div><br>
</div>
<div>Fande added two new implementations of
MatPtAP() to petsc-master which use much less
memory, and scale better in memory, at a slightly higher
computing time (still faster than hypre, though). You
may use these new implementations if you are
concerned about memory scalability. The options for
these new implementations are: </div>
<div>-matptap_via allatonce<br>
</div>
<div>-matptap_via allatonce_merged<br>
</div>
<div><br>
</div>
<div>Hong</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, Apr
15, 2019 at 12:10 PM <a
href="mailto:hzhang@mcs.anl.gov"
target="_blank" moz-do-not-send="true">
hzhang@mcs.anl.gov</a> &lt;<a
href="mailto:hzhang@mcs.anl.gov"
target="_blank" moz-do-not-send="true">hzhang@mcs.anl.gov</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div dir="ltr">Myriam:<br>
</div>
<div>Thank you very much for providing these
results!</div>
<div>I have put effort into accelerating
execution time and avoiding the use of global
sizes in PtAP. The algorithm used for this, which
transposes P_local and P_other, likely
doubles the memory usage. I'll try to
investigate why it becomes unscalable.</div>
<div>Hong</div>
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>Hi,</p>
<p>You'll find the new scaling
attached (green line). I used
version 3.11 and the four
scalability options:<br>
-matptap_via scalable<br>
-inner_diag_matmatmult_via scalable<br>
-inner_offdiag_matmatmult_via
scalable<br>
-mat_freeintermediatedatastructures</p>
<p>The scaling is much better! The
code even uses less memory for the
smallest cases. There is still an
increase for the larger one. <br>
</p>
<p>With regard to the time scaling, I
used KSPView and LogView on the two
previous scalings (blue and yellow
lines) but not on the last one
(green line). So we can't really
compare them, am I right? However,
we can see that the new time scaling
looks quite good. It slightly
increases from ~8s to ~27s. <br>
</p>
<p>Unfortunately, the computations are
expensive, so I would like to avoid
re-running them if possible. How
relevant would a proper time
scaling be for you? <br>
</p>
<p>Myriam<br>
</p>
<br>
<div
class="gmail-m_5004975596082747442gmail-m_5870970137787136754gmail-m_4593329201565690262m_-4364359315279719822gmail-m_-6245019727744503832moz-cite-prefix">On
04/12/19 at 18:18, Zhang, Hong
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">Myriam :<br>
</div>
<div>Thanks for your effort. It
will help us improve PETSc.</div>
<div>Hong</div>
<div><br>
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
Hi all,<br>
<br>
I used the wrong script,
that's why it diverged...
Sorry about that. <br>
I tried again with the right
script applied to a tiny
problem (~200<br>
elements). I can see a small
difference in memory usage
(gain ~ 1 MB)<br>
when adding the
-mat_freeintermediatedatastructures
option. I still have to<br>
run larger cases to plot
the scaling. The supercomputer
I usually<br>
run my jobs on is really busy
at the moment so it takes a
while. I hope<br>
I'll send you the results on
Monday.<br>
<br>
Thanks everyone,<br>
<br>
Myriam<br>
<br>
<br>
On 04/11/19 at 06:01, Jed Brown
wrote:<br>
> "Zhang, Hong" &lt;<a
href="mailto:hzhang@mcs.anl.gov"
target="_blank"
moz-do-not-send="true">hzhang@mcs.anl.gov</a>&gt;
writes:<br>
><br>
>> Jed:<br>
>>>> Myriam,<br>
>>>> Thanks for
the plot.
'-mat_freeintermediatedatastructures'
should not affect the solution. It
releases almost half of the memory
in C=PtAP if C is not reused.<br>
>>> And yet if
turning it on causes
divergence, that would imply a
bug.<br>
>>> Hong, are you
able to reproduce the
experiment to see the memory<br>
>>> scaling?<br>
>> I would like to test her
code on an ALCF machine,
but my hands are full now.
I'll try it as soon as I find
time, hopefully next week.<br>
> I have now compiled and
run her code locally.<br>
><br>
> Myriam, thanks for your
last mail adding configuration
and removing the<br>
> MemManager.h dependency.
I ran with and without<br>
>
-mat_freeintermediatedatastructures
and don't see a difference in<br>
> convergence. What
commands did you run to
observe that difference?<br>
<br>
-- <br>
Myriam Peyrounette<br>
CNRS/IDRIS - HLST<br>
--<br>
<br>
<br>
</blockquote>
</div>
</div>
</blockquote>
<br>
<pre class="gmail-m_5004975596082747442gmail-m_5870970137787136754gmail-m_4593329201565690262m_-4364359315279719822gmail-m_-6245019727744503832moz-signature" cols="72">--
Myriam Peyrounette
CNRS/IDRIS - HLST
--
</pre>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
<pre class="gmail-m_5004975596082747442moz-signature" cols="72">--
Myriam Peyrounette
CNRS/IDRIS - HLST
--
</pre>
</div>
</blockquote>
</div>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
Myriam Peyrounette
CNRS/IDRIS - HLST
--
</pre>
</body>
</html>