<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr">Hi Myriam,<div><br></div><div>We are interested in how the new algorithms perform, so there are two new algorithms you could try.</div><div><br></div><div>Algorithm 1:</div><div><br></div><div>-matptap_via allatonce -mat_freeintermediatedatastructures 1<br></div><div><br></div><div>Algorithm 2:</div><div><br></div><div>-matptap_via allatonce_merged -mat_freeintermediatedatastructures 1<br></div><div><br></div><div>Note that you need to use the current petsc-master, and please also put "-snes_view" in your script so that we can confirm these options actually get set.</div><div><br></div><div>Thanks,</div><div><br></div><div>Fande</div><div><br></div></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Apr 30, 2019 at 2:26 AM Myriam Peyrounette via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov">petsc-users@mcs.anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
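For concreteness, the two runs above could look like this. This is only a sketch: the MPI launcher invocation and the executable name ./app are placeholders, not taken from this thread.

```shell
# Algorithm 1: all-at-once PtAP, freeing intermediate data structures
mpiexec -n 4 ./app -matptap_via allatonce \
    -mat_freeintermediatedatastructures 1 -snes_view

# Algorithm 2: merged all-at-once variant
mpiexec -n 4 ./app -matptap_via allatonce_merged \
    -mat_freeintermediatedatastructures 1 -snes_view
```

The -snes_view output reports the solver configuration at runtime, which is how one can confirm the options were actually picked up.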
<div bgcolor="#FFFFFF">
<p>Hi,</p>
<p>That's really good news for us, thanks! I will plot the memory
scaling again using these new options and let you know, hopefully
next week.</p>
<p>Before that, I just need to clarify the situation. Throughout our
discussions, we mentioned a number of options concerning
scalability:</p>
<p>-matptap_via scalable<br>
-inner_diag_matmatmult_via scalable<br>
-inner_offdiag_matmatmult_via scalable<br>
-mat_freeintermediatedatastructures<br>
-matptap_via allatonce<br>
-matptap_via allatonce_merged</p>
<p>Which of these options are compatible? Should I use all of them
at the same time? Is there any redundancy?<br>
</p>
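As a general PETSc note, -matptap_via takes a single value, so "scalable", "allatonce", and "allatonce_merged" select alternative algorithms rather than being combinable; the inner matmatmult options presumably affect only the scalable path. A sketch of the four-option combination reported earlier in the thread (./app is a placeholder executable name):

```shell
# The four options used together with PETSc 3.11 (green line in the plot)
./app -matptap_via scalable \
    -inner_diag_matmatmult_via scalable \
    -inner_offdiag_matmatmult_via scalable \
    -mat_freeintermediatedatastructures
```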
<p>Thanks,</p>
<p>Myriam<br>
</p>
<br>
<div class="gmail-m_5004975596082747442moz-cite-prefix">On 04/25/19 at 21:47, Zhang, Hong
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">Myriam:<br>
</div>
<div>Checking MatPtAP() in petsc-3.6.4, I realized that it
uses a different algorithm than petsc-3.10 and later
versions. petsc-3.6 uses an outer product for C = P^T*A*P,
while petsc-3.10 uses a local transpose of P. petsc-3.10
accelerates data access, but doubles the memory of
P.</div>
<div><br>
</div>
<div>Fande added two new implementations of MatPtAP() in
petsc-master which use much less memory and scale
better, at slightly higher computing time (still
faster than hypre). You may use these new
implementations if you are concerned about memory
scalability. The options for these new implementations
are:</div>
<div>-matptap_via allatonce<br>
</div>
<div>-matptap_via allatonce_merged<br>
</div>
<div><br>
</div>
<div>Hong</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, Apr 15, 2019
at 12:10 PM <a href="mailto:hzhang@mcs.anl.gov" target="_blank">
hzhang@mcs.anl.gov</a> <<a href="mailto:hzhang@mcs.anl.gov" target="_blank">hzhang@mcs.anl.gov</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div dir="ltr">Myriam:<br>
</div>
<div>Thank you very much for providing these
results!</div>
<div>I have put effort into accelerating execution time
and avoiding the use of global sizes in PtAP; the
algorithm that transposes P_local and P_other
likely doubles the memory usage. I'll try to
investigate why it becomes unscalable.</div>
<div>Hong</div>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>Hi,</p>
<p>you'll find the new scaling attached (green
line). I used version 3.11 and the four
scalability options:<br>
-matptap_via scalable<br>
-inner_diag_matmatmult_via scalable<br>
-inner_offdiag_matmatmult_via scalable<br>
-mat_freeintermediatedatastructures</p>
<p>The scaling is much better! The code even
uses less memory for the smallest cases.
There is still an increase for the largest
case.
<br>
</p>
<p>With regard to the time scaling, I used
KSPView and LogView on the two previous
scalings (blue and yellow lines) but not on
the last one (green line), so we can't
really compare them, am I right? However,
the new time scaling looks quite good: it
increases from ~8s to ~27s. <br>
</p>
<p>Unfortunately, the computations are
expensive, so I would like to avoid
re-running them if possible. How important
would a proper time scaling be for you?
<br>
</p>
<p>Myriam<br>
</p>
<br>
<div class="gmail-m_5004975596082747442gmail-m_5870970137787136754gmail-m_4593329201565690262m_-4364359315279719822gmail-m_-6245019727744503832moz-cite-prefix">On
04/12/19 at 18:18, Zhang, Hong wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">Myriam :<br>
</div>
<div>Thanks for your effort. It will help
us improve PETSc.</div>
<div>Hong</div>
<div><br>
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Hi all,<br>
<br>
I used the wrong script; that's why it
diverged... Sorry about that.<br>
I tried again with the right script,
applied to a tiny problem (~200
elements). I can see a small
difference in memory usage (a gain of
~1 MB) when adding the
-mat_freeintermediatedatastructures
option. I still have to
run larger cases to plot the
scaling. The supercomputer I usually
run my jobs on is really busy at the
moment, so it takes a while. I hope
I'll send you the results on Monday.<br>
<br>
Thanks everyone,<br>
<br>
Myriam<br>
<br>
<br>
On 04/11/19 at 06:01, Jed Brown
wrote:<br>
> "Zhang, Hong" <<a href="mailto:hzhang@mcs.anl.gov" target="_blank">hzhang@mcs.anl.gov</a>>
writes:<br>
><br>
>> Jed:<br>
>>>> Myriam,<br>
>>>> Thanks for the plot.
'-mat_freeintermediatedatastructures'
should not affect solution. It
releases almost half of memory in
C=PtAP if C is not reused.<br>
>>> And yet if turning it on
causes divergence, that would imply a
bug.<br>
>>> Hong, are you able to
reproduce the experiment to see the
memory<br>
>>> scaling?<br>
>> I'd like to test her code on an
ALCF machine, but my hands are full
now. I'll try it as soon as I find
time, hopefully next week.<br>
> I have now compiled and run her
code locally.<br>
><br>
> Myriam, thanks for your last mail
adding configuration and removing the<br>
> MemManager.h dependency. I ran
with and without<br>
>
-mat_freeintermediatedatastructures
and don't see a difference in<br>
> convergence. What commands did
you run to observe that difference?<br>
<br>
-- <br>
Myriam Peyrounette<br>
CNRS/IDRIS - HLST<br>
--<br>
<br>
<br>
</blockquote>
</div>
</div>
</blockquote>
<br>
<pre class="gmail-m_5004975596082747442gmail-m_5870970137787136754gmail-m_4593329201565690262m_-4364359315279719822gmail-m_-6245019727744503832moz-signature" cols="72">--
Myriam Peyrounette
CNRS/IDRIS - HLST
--
</pre>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
<pre class="gmail-m_5004975596082747442moz-signature" cols="72">--
Myriam Peyrounette
CNRS/IDRIS - HLST
--
</pre>
</div>
</blockquote></div>