[petsc-users] Do more solver iterations = more communication?

Barry Smith bsmith at mcs.anl.gov
Fri Feb 19 11:42:19 CST 2016


> On Feb 19, 2016, at 11:18 AM, Justin Chang <jychang48 at gmail.com> wrote:
> 
> Thanks Barry,
> 
> 1) Attached is the full log_summary of a serial run I did with ILU. I noticed that the MPI reductions/messages happen mostly in the SNESFunction/JacobianEval routines. Am I right to assume that these occur because of the required calls to Vec/MatAssemblyBegin/End at the end? 

   Likely it is two places: 1) communicating the ghost points needed to compute the function and Jacobian (this usually involves no reductions) and 2) yes, in the AssemblyBegin/End.
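
   For example, in a typical user residual routine the two kinds of communication show up roughly as in the sketch below (not code from your run; FormFunction, AppCtx, and the DM usage are placeholder assumptions):

    #include <petscdm.h>
    #include <petscsnes.h>

    typedef struct { DM dm; } AppCtx;   /* hypothetical user context */

    PetscErrorCode FormFunction(SNES snes, Vec X, Vec F, void *ctx)
    {
      AppCtx        *user = (AppCtx*)ctx;
      Vec            Xloc;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      /* 1) ghost-point communication: neighbor-to-neighbor messages, normally no reductions */
      ierr = DMGetLocalVector(user->dm, &Xloc);CHKERRQ(ierr);
      ierr = DMGlobalToLocalBegin(user->dm, X, INSERT_VALUES, Xloc);CHKERRQ(ierr);
      ierr = DMGlobalToLocalEnd(user->dm, X, INSERT_VALUES, Xloc);CHKERRQ(ierr);

      /* ... compute local residual contributions from Xloc and VecSetValues() them into F ... */

      /* 2) assembly communication: contributions destined for other ranks are
         exchanged inside VecAssemblyBegin/End (MatAssemblyBegin/End plays the
         same role in the Jacobian routine) */
      ierr = VecAssemblyBegin(F);CHKERRQ(ierr);
      ierr = VecAssemblyEnd(F);CHKERRQ(ierr);

      ierr = DMRestoreLocalVector(user->dm, &Xloc);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }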

> 
> 2) If I ran this program with at least 2 cores, will the other Vec and Mat functions have these MPI reductions/messages accumulated?

  Not sure what you mean by this. The logging does count MPI reductions and messages in all PETSc operations.
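
   If you want to see how those counts break down by phase on two or more cores, you can wrap the phases in user-defined logging stages; -log_summary then reports messages and reductions for each stage separately. A minimal sketch (the stage names and helper routine are just examples, not from your code):

    #include <petscsys.h>

    PetscErrorCode LogPhasesExample(void)
    {
      PetscLogStage  stageAssembly, stageSolve;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = PetscLogStageRegister("My assembly", &stageAssembly);CHKERRQ(ierr);
      ierr = PetscLogStageRegister("My solve", &stageSolve);CHKERRQ(ierr);

      ierr = PetscLogStagePush(stageAssembly);CHKERRQ(ierr);
      /* ... MatSetValues()/VecSetValues() and Mat/VecAssemblyBegin/End() ... */
      ierr = PetscLogStagePop();CHKERRQ(ierr);

      ierr = PetscLogStagePush(stageSolve);CHKERRQ(ierr);
      /* ... SNESSolve() or KSPSolve() ... */
      ierr = PetscLogStagePop();CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }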

> 
> 3) I don't know what all is happening inside BoomerAMG or ML, but do these packages perform their own Mat and Vec operations? Because if they still in part use PETSc's Vec and Mat operations, we could still somewhat quantify the corresponding MPI metrics, no?

   BoomerAMG does everything internally, so we have no information. ML sets up the algebraic multigrid preconditioner completely internally, so we have no information on the setup, but the actual multigrid iterations are done with PETSc matrices and vectors, so we do have that information.

> 
> 4) Suppose I stick with one of these preconditioner packages (e.g., ML) and solve the same problem with two different numerical methods. Is it more appropriate to infer that if both methods require the same amount of wall-clock time but one of them requires more iterations to reach the solution, then it may have more communication overall and *might* tend not to scale as well in the strong sense?

  This is determined by seeing how the iterations scale with the number of processes, not by the absolute number of iterations. So you need to run at least three grid resolutions, record the number of iterations, and see how they grow. For example, with block Jacobi + ILU the number of iterations will grow, but with GAMG they will stay the same or grow very slowly. For a small number of processes block Jacobi will be better than GAMG, but for a large number of processes block Jacobi will be far worse.
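
   If it helps to tabulate those counts, a small routine along these lines (just a sketch; the routine name is an example) records the nonlinear and the accumulated linear iteration counts after each solve:

    #include <petscsnes.h>

    PetscErrorCode ReportIterations(SNES snes)
    {
      PetscInt       nonlinearIts, linearIts;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = SNESGetIterationNumber(snes, &nonlinearIts);CHKERRQ(ierr);
      ierr = SNESGetLinearSolveIterations(snes, &linearIts);CHKERRQ(ierr);
      ierr = PetscPrintf(PETSC_COMM_WORLD, "nonlinear its %D, total linear its %D\n",
                         nonlinearIts, linearIts);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

   Running the same code at the three resolutions with, say, -pc_type bjacobi -sub_pc_type ilu versus -pc_type gamg then shows directly whether the counts stay flat or grow.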



> 
> Thanks,
> Justin
> 
> 
> On Thu, Feb 18, 2016 at 4:05 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
> > On Feb 18, 2016, at 1:56 PM, Justin Chang <jychang48 at gmail.com> wrote:
> >
> > Hi all,
> >
> > For a Poisson problem with roughly 1 million dofs (using second-order elements), I solved the problem using two different solver/preconditioner combinations: CG/ILU and CG/GAMG.
> >
> > ILU takes roughly 82 solver iterations whereas GAMG takes 14 iterations (wall-clock time is roughly 15 and 46 seconds, respectively). I have seen from previous mailing threads that there is a strong correlation between solver iterations and communication (which could lead to poorer strong scaling). It makes sense to me if I strictly use one of these preconditioners to solve two different problems and compare the respective iteration counts, but what about solving the same problem with two different preconditioners?
> >
> > If GAMG takes 14 iterations whereas ILU takes 82 iterations, does this necessarily mean GAMG has less communication?
> 
>   No, you can't say that at all. A single GAMG cycle will do more communication than a single block Jacobi cycle.
> 
> > I would think that the "bandwidth" that happens within a single GAMG iteration would be much greater than that within a single ILU iteration. Is there a way to officially determine this?
> >
> > I see from log_summary that we have this information:
> > MPI Messages:         5.000e+00      1.00000   5.000e+00  5.000e+00
> > MPI Message Lengths:  5.816e+07      1.00000   1.163e+07  5.816e+07
> > MPI Reductions:       2.000e+01      1.00000
> >
> > Can this information be used to determine the "bandwidth"?
> 
>    You can certainly use this data for each run to determine which algorithm sends more messages, has a larger total message length, etc. And if you divide by time, it tells you the rate of communication for the different algorithms.
> 
> Note that counts of messages and lengths are also given in the detailed table for each operation.
> 
> There are also theoretical bounds on the number of messages that can be derived for some iterative methods applied to some problems.
> 
> > If so, does PETSc have the ability to document this for other preconditioner packages like HYPRE's BoomerAMG or Trilinos' ML?
> 
>    No, because they don't log this information.
> >
> > Thanks,
> > Justin
> 
> 
> <logsummary.txt>


