Sorry, I forgot one more comment.<div><br></div><div>So basically I am comparing a "good" TAO solver (though this can be debated) with a "so-so" KSP solver, CG/Jacobi. If this "good" TAO solver cannot beat the performance of the "so-so" KSP solver, is there really any need to include the performance of the "good" KSP solver (GAMG) if my objective is focused on TAO and my methodology?<br><br>On Sunday, June 7, 2015, Justin Chang <<a href="mailto:jychang48@gmail.com">jychang48@gmail.com</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Matt (Knepley),<div><br></div><div>I see what you're saying, and it makes perfect sense. The point of my work isn't necessarily to compare CG/Jacobi with GAMG. Rather, I am trying to compare both the numerical solution and the computational performance of my "correction" methodology (through optimization) with simply solving the FEM problem the standard way. Of course this methodology is going to be more expensive, but I think it would be nice to have a "benchmark" to compare against. I have examples that show where the parallel efficiency of TAO overtakes that of CG/Jacobi, and I also have arithmetic intensity (AI) results showing that the AI of TAO is higher than that of CG/Jacobi and that both are invariant with respect to problem size. </div><div><br></div><div>I ran some (smaller) experiments with GAMG and noticed problems in which the GAMG wall-clock time is lower than that of CG/Jacobi (though not by much). The problem, however, is that I cannot seem to compute the arithmetic intensity for GAMG.</div><div><br></div>The way I see it, I have these four options:<div><br></div><div>1) Stick with what I have and acknowledge that GAMG can be better for larger problems. Since I have compared TAO with CG/Jacobi, somebody else can compare GAMG with CG/Jacobi.</div><div><br></div><div>2) Do strong scaling studies with GAMG and TAO and forget about the AI stuff. 
If I do this, then IMHO the paper will lose much of its flavor.</div><div><br></div><div>3) Use a different performance model that can handle GAMG. I imagine, though, that applying any other model to GAMG would be considerably more complex.</div><div><br></div><div>4) Simply report FLOP/s and the associated wall-clock times for each solver. Yes, this is easily gamed, but I think it can at least tell you something (i.e., if this metric drops for a given problem size, that may indicate the program is losing some efficiency).</div><div><br></div><div>Thoughts?</div><div><br></div><div>Justin<br><div><br>On Saturday, June 6, 2015, Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Sat, Jun 6, 2015 at 4:29 AM, Justin Chang <span dir="ltr"><<a>jychang48@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Matt and Mark, thank you for your responses. <br><br>I brought up GAMG because it seems to me to be the preconditioner of choice for elliptic problems. However, I am using CG/Jacobi for my larger problems, and the solver converges (with -ksp_atol and -ksp_rtol set to 1e-8). With GAMG I get roughly the same wall-clock time but significantly fewer solver iterations. <br><br>As I mentioned in another email, the ultimate purpose is to compare how this "correction" methodology using the TAO solver (with bound constraints) performs compared to the original methodology using the KSP solver (without constraints). I have the AI for BLMVM and CG/Jacobi, and they are roughly 0.3 and 0.2 respectively (do these sound about right?). 
Although the AI is higher for TAO, the ratio of achieved FLOP/s to AI × STREAM bandwidth is smaller, though I am not sure what conclusions to draw from that. This was also partly why I wanted to see what kind of metrics another KSP solver/preconditioner produces. <div><br></div><div>Point being, if I were to draw such comparisons between TAO and KSP, would I get crucified if people found out I am using CG/Jacobi and not GAMG? </div></div></blockquote><div><br></div><div>Here is what someone like me reviewing your paper would say first. I can believe that a well-conditioned problem would</div><div>converge using CG/Jacobi. However, if the highest-order derivative looks like the Laplacian, then the condition number of</div><div>the equations will be O(h^-2), and even with CG the iteration count will be O(h^-1), so the number of iterations should increase as the square root</div><div>of the problem size (in 2D), whereas for GAMG it should be constant. Thus at some size GAMG will be more efficient. I would want</div><div>to see where the crossover is for your problem. 
If you do not get the O(h^-1) dependence, I would think that there is a problem</div><div>in the formulation.</div><div><br></div><div>  Thanks,</div><div><br></div><div>     Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Thanks,<br>Justin</div></div><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jun 5, 2015 at 2:02 PM, Mark Adams <span dir="ltr"><<a>mfadams@lbl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><br></div></div></blockquote><div><br></div></span><div>The overwhelming cost of AMG is the Galerkin triple-product RAP.</div><div><br></div></div></div></div></blockquote><div> </div></span><div>That is overstating it a bit. It can be, if you have a hard 3D operator and slow coarsening is best.</div><div><br></div><div>The rule of thumb is that you spend 50% of the time in the solve and 50% in the setup, which is often mostly RAP (in 3D; 2D is much faster). That way you are within 2x of optimal, and it often works out that way anyway.</div><span><font color="#888888"><div><br></div><div>Mark </div></font></span></div></div></div>

</blockquote></div><br></div>

</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener</div>

</div></div>

</blockquote></div></div>

</blockquote></div>
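<div><br></div><div>A minimal sketch of the iteration-count argument in the thread, under stated assumptions: it uses a hypothetical model problem (the constant-coefficient 5-point Laplacian on an n-by-n interior grid of the unit square, zero Dirichlet boundary conditions, right-hand side of ones) in plain NumPy rather than PETSc, and the names apply_laplacian and pcg_jacobi are illustrative only, not PETSc API. For this stencil the diagonal is 4 everywhere, so Jacobi is just a uniform scaling and the iterates match unpreconditioned CG. Since the condition number grows like h^-2, the CG iteration count to a fixed relative tolerance should grow roughly like h^-1, i.e. like N^(1/2) in 2D, so doubling n should roughly double the iterations.</div>

```python
import numpy as np


def apply_laplacian(u):
    """Apply the 5-point discrete -Laplacian, multiplied by h^2, with zero Dirichlet BCs.

    u holds the n-by-n interior unknowns; boundary values are implicitly zero.
    """
    v = 4.0 * u
    v[1:, :] -= u[:-1, :]   # neighbor above
    v[:-1, :] -= u[1:, :]   # neighbor below
    v[:, 1:] -= u[:, :-1]   # neighbor to the left
    v[:, :-1] -= u[:, 1:]   # neighbor to the right
    return v


def pcg_jacobi(n, rtol=1e-8, maxit=10_000):
    """Jacobi-preconditioned CG on the hypothetical model problem A x = 1.

    Returns (iterations, x), stopping when ||r|| <= rtol * ||b||.
    """
    b = np.ones((n, n))
    x = np.zeros_like(b)
    r = b.copy()                 # r = b - A x with x = 0
    dinv = 0.25                  # inverse of the constant diagonal (4)
    z = dinv * r                 # z = M^{-1} r with M = diag(A)
    p = z.copy()
    rz = np.vdot(r, z)           # vdot flattens the 2D arrays
    bnorm = np.linalg.norm(b)
    for k in range(1, maxit + 1):
        ap = apply_laplacian(p)
        alpha = rz / np.vdot(p, ap)
        x += alpha * p
        r -= alpha * ap
        if np.linalg.norm(r) <= rtol * bnorm:
            return k, x
        z = dinv * r
        rz_new = np.vdot(r, z)
        p = z + (rz_new / rz) * p
        rz = rz_new
    return maxit, x


if __name__ == "__main__":
    prev = None
    for n in (16, 32, 64, 128):
        its, _ = pcg_jacobi(n)
        growth = "" if prev is None else f"  (x{its / prev:.2f})"
        print(f"n = {n:4d}  N = {n * n:6d}  iterations = {its}{growth}")
        prev = its
```

<div>On the actual FEM problem, the analogous check would be to rerun at several refinement levels with -ksp_type cg -pc_type jacobi and with -pc_type gamg, comparing iteration counts and wall-clock times to locate the crossover Matt describes.</div>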