<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr">Hong,<div><br></div><div>Thanks for your reply.</div></div><br><div class="gmail_quote"><div dir="ltr">On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong <<a href="mailto:hzhang@mcs.anl.gov" target="_blank">hzhang@mcs.anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

<div>

<div dir="ltr">

<div dir="ltr">Fande:<br>

</div>

<div class="gmail_quote">

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

<div dir="ltr">

<div dir="ltr">

<div>Hong,</div>

<div>Thanks for your improvements to PtAP, which is critical for MG-type algorithms. </div>

<br>

<div class="gmail_quote">

<div dir="ltr">On Wed, May 3, 2017 at 10:17 AM Hong <<a href="mailto:hzhang@mcs.anl.gov" target="_blank">hzhang@mcs.anl.gov</a>> wrote:<br>

</div>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

<div dir="ltr">Mark,

<div>Below is a copy of the email I sent you on Feb 27:</div>

<div><br>

</div>

<div><span style="font-size:12.8px">I implemented scalable MatPtAP and did comparisons of three implementations using </span><span class="m_-3645982989765171943gmail-m_2149401880200494564gmail-m_-5234260791570328833gmail-m_3433343712918822110gmail-il" style="font-size:12.8px;background-color:rgb(255,255,255)">ex56</span><span style="font-size:12.8px">.c

 on alcf </span><span class="m_-3645982989765171943gmail-m_2149401880200494564gmail-m_-5234260791570328833gmail-m_3433343712918822110gmail-il" style="font-size:12.8px;background-color:rgb(255,255,255)">cetus</span><span style="font-size:12.8px"> machine (this machine has small memory, 1GB/core):</span>

<div style="font-size:12.8px">- nonscalable PtAP: use an array of length PN to do dense axpy</div>

<div style="font-size:12.8px">- scalable PtAP:       do sparse axpy without use of PN array</div>

</div>

</div>

</blockquote>

<div><br>

</div>

<div>What does PN mean here?</div>

</div>

</div>

</div>

</blockquote>

<div>Global number of columns of P. </div>
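<div>To make the distinction between the two PETSc variants concrete, here is a minimal sketch of the two accumulation styles. It is an illustration only, not PETSc's code: the function names, the <code>terms</code> layout, and the use of a Python dict for the sparse case are assumptions made for this example.</div>

```python
def axpy_dense(terms, pn):
    """Nonscalable style: accumulate into a work array of global length PN."""
    work = [0.0] * pn        # memory grows with the global column count PN
    seen = [False] * pn
    touched = []             # column indices that received a contribution
    for alpha, cols, vals in terms:
        for j, v in zip(cols, vals):
            if not seen[j]:
                seen[j] = True
                touched.append(j)
            work[j] += alpha * v
    return {j: work[j] for j in touched}


def axpy_sparse(terms):
    """Scalable style: accumulate only the columns that actually appear."""
    acc = {}                 # memory grows with the nonzeros in the result row
    for alpha, cols, vals in terms:
        for j, v in zip(cols, vals):
            acc[j] = acc.get(j, 0.0) + alpha * v
    return acc


# Each term is (alpha, column indices, values) of one sparse row.
terms = [(2.0, [1, 5], [1.0, 3.0]), (-1.0, [5, 9], [4.0, 0.5])]
print(axpy_dense(terms, 12))
print(axpy_sparse(terms))
```

<div>Both functions return the same row; the difference is that the dense variant allocates storage proportional to PN on every process, while the sparse variant pays extra work per entry to avoid that allocation.</div>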

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

<div dir="ltr">

<div dir="ltr">

<div class="gmail_quote">

<div><br>

</div>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

<div dir="ltr">

<div>

<div style="font-size:12.8px">- hypre PtAP.</div>

<div style="font-size:12.8px"><br>

</div>

<div style="font-size:12.8px">The results are attached. Summary:</div>

<div style="font-size:12.8px">- nonscalable PtAP is 2x faster than scalable, 8x faster than hypre PtAP</div>

<div style="font-size:12.8px">- scalable PtAP is 4x faster than hypre PtAP</div>

<div style="font-size:12.8px">- hypre uses less memory (see <a href="http://job.ne399.n63.np1000.sh/" target="_blank">job.ne399.n63.np1000.sh</a>)</div>

</div>

</div>

</blockquote>

<div><br>

</div>

<div>I was wondering how much more memory PETSc PtAP uses than hypre. I am implementing an AMG algorithm based on PETSc right now, and it is working well, but we have found a bottleneck with PtAP. For the same P and A, PETSc PtAP fails to generate a coarse matrix because it runs out of memory, while hypre can still generate the coarse matrix.</div>

<div><br>

</div>

<div>I do not want to just use the HYPRE one because we would have to duplicate matrices if I used HYPRE PtAP.</div>

<div><br>

</div>

<div>It would be nice if you guys have already done some comparisons of the memory usage of these implementations.</div>

</div>

</div>

</div>

</blockquote>

<div>Do you encounter the memory issue with  <span style="font-size:12.8px">scalable PtAP? </span></div></div></div></div></blockquote><div><br></div><div>Yes, the code crashes within "MatPtAP(Mat A,Mat P,MatReuse scall,PetscReal fill,Mat *C)", and we receive signal 9 from the supercomputer. Signal 9 usually means out of memory.</div><div><br></div><div>If we decrease the problem size, the code runs just fine. BTW, we are trying to simulate a realistic case on the supercomputer with 10K cores for a problem with 4 billion unknowns.  </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div><div dir="ltr"><div class="gmail_quote"><div><span style="font-size:12.8px">Karl had a student in the summer who improved MatPtAP(). Do you use the latest version of petsc?</span></div></div></div></div></blockquote><div><br></div><div>I am using PETSc-3.9.4. I could upgrade to PETSc-3.10.x if necessary. </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div><div dir="ltr"><div class="gmail_quote">

<div>HYPRE may use less memory than PETSc because it does not save and reuse the matrices.</div></div></div></div></blockquote><div><br></div><div>What does "the matrices" refer to? Do you mean the intermediate matrices (tmp = PtA or tmp = AP)? Could we have an option to not save them? I am using SNES, but my preconditioning matrix is fixed, so the preconditioner is fixed during the entire simulation. There is no reason for me to cache the intermediate matrices, as they take memory and I do not need them any more.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div><div dir="ltr"><div class="gmail_quote"><div><span style="font-size:12.8px"><br>

</span></div>

<div><br>

</div>

<div>I do not understand why generating the coarse matrix fails due to out of memory. </div></div></div></div></blockquote><div><br></div><div>I mean that we cannot complete C = PtAP because PETSc's PtAP takes too much memory compared with HYPRE's PtAP.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div><div dir="ltr"><div class="gmail_quote"><div>Do you use a direct solver on the coarse grid?<br></div></div></div></div></blockquote><div><br></div><div>We have a 10-level AMG; the coarsest matrix is small (at most 100x100), and we use LU on it.</div><div><br></div><div><br></div><div>Fande,</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div><div dir="ltr"><div class="gmail_quote"><div>

</div>

<div>Hong</div>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

<div dir="ltr">

<div dir="ltr">

<div class="gmail_quote">

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

<div dir="ltr">

<div>

<div style="font-size:12.8px"><br>

</div>

<div style="font-size:12.8px">Based on the above observations, I set the default PtAP algorithm to 'nonscalable'. </div>

<div style="font-size:12.8px">When PN > local estimated nonzeros of C=PtAP, the default switches to 'scalable'.</div>

<div style="font-size:12.8px">Users can override the default.</div>
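<div style="font-size:12.8px">The selection rule described above can be sketched as follows. This is a hedged illustration of the stated rule only: the function and argument names are hypothetical, and the actual PETSc logic (including how the nonzero estimate is obtained) may differ.</div>

```python
def default_ptap_algorithm(pn, local_nnz_estimate):
    """Pick the default as described in the message above.

    pn                 -- global number of columns of P
    local_nnz_estimate -- estimated local nonzeros of C = PtAP (hypothetical input)
    """
    if pn > local_nnz_estimate:
        # A work array of length PN would outweigh the local result itself.
        return "scalable"
    return "nonscalable"


print(default_ptap_algorithm(pn=1_000, local_nnz_estimate=50_000))
print(default_ptap_algorithm(pn=5_000_000, local_nnz_estimate=50_000))
```

<div style="font-size:12.8px">In PETSc releases of that period, the default could, as far as I recall, be overridden at run time with the option -matptap_via (for example, -matptap_via scalable); please check the MatPtAP documentation for the version you are using, since the option name changed in later releases.</div>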

<div style="font-size:12.8px"><br>

</div>

<div style="font-size:12.8px">For the case of np=8000, ne=599 (see <a href="http://job.ne599.n500.np8000.sh/" target="_blank">job.ne599.n500.np8000.sh</a>), I get</div>

<div style="font-size:12.8px">MatPtAP                   3.6224e+01 (nonscalable for small mats, scalable for larger ones)<br>

</div>

<div style="font-size:12.8px">scalable MatPtAP     4.6129e+01<br>

</div>

<div style="font-size:12.8px">hypre                        1.9389e+02 </div>
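<div style="font-size:12.8px">For reference, the ratios implied by the three timings above can be computed directly. The numbers are copied from this message; the dictionary keys and helper name are chosen only for this example.</div>

```python
# Timings reported above for np=8000, ne=599.
times = {
    "default": 3.6224e+01,   # nonscalable for small mats, scalable for larger ones
    "scalable": 4.6129e+01,
    "hypre": 1.9389e+02,
}


def slowdown(slow, fast):
    """Return how many times longer `slow` took than `fast`."""
    return times[slow] / times[fast]


for slow, fast in [("hypre", "default"), ("hypre", "scalable"), ("scalable", "default")]:
    print(f"{slow} / {fast} = {slowdown(slow, fast):.2f}")
```

<div style="font-size:12.8px">For this run, hypre is roughly 5.4 times slower than the default and 4.2 times slower than scalable, while scalable is about 1.3 times slower than the default.</div>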

</div>

<div style="font-size:12.8px"><br>

</div>

<div style="font-size:12.8px">This work is on petsc-master. Give it a try. If you encounter any problem, let me know.</div>

<div style="font-size:12.8px"><br>

</div>

<div style="font-size:12.8px">Hong</div>

</div>

<div class="gmail_extra"><br>

<div class="gmail_quote">On Wed, May 3, 2017 at 10:01 AM, Mark Adams <span dir="ltr">

<<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

<div dir="ltr">(Hong), what is the current state of optimizing RAP for scaling?

<div><br>

</div>

<div>Nate is driving 3D elasticity problems at scale with GAMG, and we are working out performance problems. They are hitting problems at ~1.5B dof on a basic Cray (XC30, I think).</div>

<div><br>

</div>

<div>Thanks,</div>

<div>Mark</div>

</div>

</blockquote>

</div>

<br>

</div>

</blockquote>

</div>

</div>

</div>

</blockquote>

</div>

</div>

</div>

</blockquote></div></div></div></div>