<div dir="ltr">Thanks Barry,<div><br></div><div>1) Attached is the full log_summary of a serial run I did with ILU. I noticed that the MPI reductions/messages happen mostly in the SNESFunction/JacobianEval routines. Am I right to assume that these occur because of the required calls to Vec/MatAssemblyBegin/End at the end? </div><div><br></div><div>2) If I run this program with at least 2 cores, will the other Vec and Mat functions have these MPI reductions/messages accumulated?</div><div><br></div><div>3) I don't know everything that is happening inside BoomerAMG or ML, but do these packages perform their own Mat and Vec operations? Because if they still in part use PETSc's Vec and Mat operations, we could still somewhat quantify the corresponding MPI metrics, no?</div><div><br></div><div class="gmail_extra">4) Suppose I stick with one of these preconditioner packages (e.g., ML) and solve the same problem with two different numerical methods. Is it appropriate to infer that if both methods require the same amount of wall-clock time but one of them requires more iterations to reach the solution, then it may have more communication overall and *might* tend not to scale as well in the strong sense?<br><br></div><div class="gmail_extra">Thanks,<br></div><div class="gmail_extra">Justin<br></div><div class="gmail_extra"><br><br></div><div class="gmail_extra"><div class="gmail_quote">On Thu, Feb 18, 2016 at 4:05 PM, Barry Smith <span dir="ltr"><<a>bsmith@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span><br>

> On Feb 18, 2016, at 1:56 PM, Justin Chang <<a>jychang48@gmail.com</a>> wrote:<br>

><br>

> Hi all,<br>

><br>

> For a Poisson problem with roughly 1 million dofs (using second-order elements), I solved the problem using two different solver/preconditioner combinations: CG/ILU and CG/GAMG.<br>

><br>

> ILU takes roughly 82 solver iterations whereas GAMG takes 14 iterations (wall-clock time is roughly 15 and 46 seconds, respectively). I have seen from previous mailing threads that there is a strong correlation between solver iterations and communication (which could lead to poorer strong scaling). That makes sense to me if I use just one of these preconditioners to solve two different problems and compare the respective iteration counts, but what about solving the same problem with two different preconditioners?<br>

><br>

> If GAMG takes 14 iterations whereas ILU takes 82 iterations, does this necessarily mean GAMG has less communication?<br>

<br>

</span>  No, you can't say that at all. A single GAMG cycle will do more communication than a single block Jacobi cycle.<br>

<span><br>

> I would think that the "bandwidth" used within a single GAMG iteration would be much greater than that within a single ILU iteration. Is there a formal way to determine this?<br>

><br>

> I see from log_summary that we have this information:<br>

> MPI Messages:         5.000e+00      1.00000   5.000e+00  5.000e+00<br>

> MPI Message Lengths:  5.816e+07      1.00000   1.163e+07  5.816e+07<br>

> MPI Reductions:       2.000e+01      1.00000<br>

><br>

> Can this information be used to determine the "bandwidth"?<br>

<br>

</span>   You can certainly use this data from each run to determine which algorithm sends more messages, which has the larger total message length, etc. And if you divide by the run time, you get the rate of communication for the different algorithms.<br>
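<br>
A minimal sketch of that division, assuming the three summary lines have the layout quoted above (total in the last column for messages and message lengths, maximum in the first column for reductions). The 15-second wall-clock time is only a placeholder taken from the ILU figure mentioned earlier; it is not known which run the quoted numbers belong to, and the helper names are hypothetical.<br>

```python
import re

# Summary lines copied from the log_summary excerpt quoted in this thread.
SAMPLE = """\
MPI Messages:         5.000e+00      1.00000   5.000e+00  5.000e+00
MPI Message Lengths:  5.816e+07      1.00000   1.163e+07  5.816e+07
MPI Reductions:       2.000e+01      1.00000
"""


def parse_mpi_summary(text):
    """Return {label: [floats]} for every 'MPI ...:' summary line."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"\s*(MPI [A-Za-z ]+):\s+(.*)", line)
        if m:
            stats[m.group(1)] = [float(tok) for tok in m.group(2).split()]
    return stats


def communication_rates(stats, wall_time_s):
    """Divide the communication totals by the wall-clock time of the run."""
    messages = stats["MPI Messages"][-1]             # assumed: last column is the total
    total_length = stats["MPI Message Lengths"][-1]  # assumed: last column is the total
    reductions = stats["MPI Reductions"][0]          # assumed: first column is the max
    return {
        "messages_per_s": messages / wall_time_s,
        "length_per_s": total_length / wall_time_s,
        "reductions_per_s": reductions / wall_time_s,
        "avg_message_length": total_length / messages if messages else 0.0,
    }


if __name__ == "__main__":
    # 15.0 s is an assumed wall-clock time, used for illustration only.
    rates = communication_rates(parse_mpi_summary(SAMPLE), wall_time_s=15.0)
    for name, value in rates.items():
        print(f"{name}: {value:.3e}")
```

Running the same script on the summary of each solver/preconditioner run, each with its own measured time, gives rates that can be compared side by side.<br>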

<br>

Note that counts of messages and lengths are also given in the detailed table for each operation.<br>

<br>

There are also theoretical bounds on the number of messages that can be derived for some iterative methods applied to some problems.<br>

<span><br>

> If so, does PETSc have the ability to document this for other preconditioner packages like HYPRE's BoomerAMG or Trilinos' ML?<br>

<br>

</span>   No, because they don't log this information.<br>

><br>

> Thanks,<br>

> Justin<br>

<br>

</blockquote></div><br></div></div>