<div dir="ltr">Thanks <span class="" id=":hmc.1" tabindex="-1">Hong</span>, that sounds great.<div><br></div><div>I am weary of silent optimizations like you suggest but 2x is big! and failure is very bad. So I would vote for your suggestion.</div><div><br></div><div>ex56 is an elasticity problem. It would be nice, now that you have this experimental setup in place, to compare with <span class="" id=":hmc.5" tabindex="-1">hypre</span> on a 3D scalar problem. <span class="" id=":hmc.6" tabindex="-1">Hypre</span> might not spend much effort optimizing for block matrices. 3x better than hypre seems large to me.  I have to suspect that hypre does not exploit blocked matrices as much as we do.</div><div><br></div><div>Thanks again,</div><div><br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Feb 8, 2017 at 12:07 PM, Hong <span dir="ltr"><<a href="mailto:hzhang@mcs.anl.gov" target="_blank">hzhang@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I conducted tests on MatMatMult() and MatPtAP() using petsc/src/ksp/ksp/<wbr>examples/tutorials/ex56.c (gamg) on a 8-core machine (petsc machine). The output file is attached.<div><br></div><div>Summary:</div><div>1) non-scalable MatMatMult() for mpiaij format is 2x faster than scalable version. The major difference between the two is dense-axpy vs. sparse-axpy.</div><div><br></div><div>Currently, we set non-scalable as default, which leads to problem when running large problems. </div><div>How about setting default as </div><div> - non-scalable for small to medium size matrices</div><div> - scalable for larger ones, e.g.</div><div><br></div><div><div>+    ierr = PetscOptionsEList("-<wbr>matmatmult_via","Algorithmic approach","MatMatMult",<wbr>algTypes,nalg,algTypes[1],&<wbr>alg,&flg);</div><div><br></div><div>+    if (!flg) { /* set default algorithm based on B->cmap->N */</div><div>+      PetscMPIInt size;</div><div>+      ierr = MPI_Comm_size(comm,&size);<wbr>CHKERRQ(ierr);</div><div>+      if ((PetscReal)(B->cmap->N)/size > 100000.0) alg = 0; /* scalable algorithm */</div><div>+    }</div></div><div><br></div><div>i.e., if user does NOT pick an algorithm, when ave cols per process > 100k, use scalable implementation; otherwise, non-scalable version.</div><div><br></div><div>2) We do NOT have scalable implementation for MatPtAP() yet.</div><div>We have non-scalable PtAP and interface to Hypre's PtAP. Comparing the two, </div><div>Petsc MatPtAP() is approx 3x faster than Hypre's.</div><div><br></div><div>I'm writing a scalable MatPtAP() now.</div><span class="HOEnZb"><font color="#888888"><div><br></div><div>Hong</div></font></span></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Feb 2, 2017 at 2:54 PM, Stefano Zampini <span dir="ltr"><<a href="mailto:stefano.zampini@gmail.com" target="_blank">stefano.zampini@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="auto"><span><div><br><div class="gmail_extra"><br><div class="gmail_quote">Il 02 Feb 2017 23:43, "Mark Adams" <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> ha scritto:<br type="attribution"><blockquote class="m_-1745999122120468384m_-4687822509837486539quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><div class="m_-1745999122120468384m_-4687822509837486539quoted-text">On Thu, Feb 2, 2017 at 12:02 PM, Stefano Zampini <span dir="ltr"><<a href="mailto:stefano.zampini@gmail.com" target="_blank">stefano.zampini@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">Mark,<div><br></div><div>I saw your configuration has hypre. If you could run with master, you may try -matptap_via hypre.</div></div></blockquote><div><br></div></div><div>This is worth trying. Does this even work with GAMG?</div><div></div></div></div></div></blockquote></div></div></div><div dir="auto"><br></div></span><div dir="auto">Yes, it should work, except that the block sizes, if any, are not propagated to the resulting matrix. I can add it if you need it.</div><span><div dir="auto"><br></div><div dir="auto"><br></div><div dir="auto"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="m_-1745999122120468384m_-4687822509837486539quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>Treb: try hypre anyway. It has its own RAP code.</div><div class="m_-1745999122120468384m_-4687822509837486539quoted-text"><div> </div></div></div></div></div></blockquote></div></div></div><div dir="auto"><br></div></span><div dir="auto">With that option, you will use hypre's RAP with MATAIJ</div><span><div dir="auto"><br></div><div dir="auto"><br></div><div dir="auto"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="m_-1745999122120468384m_-4687822509837486539quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div class="m_-1745999122120468384m_-4687822509837486539quoted-text"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div>It uses BoomerAMGBuildCoarseOperator directly with the AIJ matrices.</div><span class="m_-1745999122120468384m_-4687822509837486539m_-5999659458277280005HOEnZb"><font color="#888888"><div><br></div><div>Stefano</div></font></span><div><div class="m_-1745999122120468384m_-4687822509837486539m_-5999659458277280005h5"><div><br></div><div><div><blockquote type="cite"><div>On Feb 2, 2017, at 7:28 PM, Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:</div><br class="m_-1745999122120468384m_-4687822509837486539m_-5999659458277280005m_3434414329832994466Apple-interchange-newline"><div><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Feb 2, 2017 at 11:13 AM, Hong <span dir="ltr"><<a href="mailto:hzhang@mcs.anl.gov" target="_blank">hzhang@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">Mark:</div><div class="gmail_quote"><span style="font-size:12.8px">Try '-matmatmult_via scalable' first. If this works, should we set it as default?</span></div></div></div></blockquote><div><br></div><div>If it is robust I would say yes unless it is noticeably slower (say >20%) small scale problems. </div><div><br></div></div></div></div>

</div></blockquote></div><br></div></div></div></div></blockquote></div></div><br></div></div>

</blockquote></div><br></div></div></span></div>

</blockquote></div><br></div>

</div></div></blockquote></div><br></div>