[petsc-users] Do more solver iterations = more communication?

Justin Chang jychang48 at gmail.com
Fri Feb 19 11:18:45 CST 2016


Thanks Barry,

1) Attached is the full log_summary of a serial run I did with ILU. I
noticed that the MPI reductions/messages happen mostly in the
SNESFunctionEval/SNESJacobianEval routines. Am I right to assume that these
occur because of the required calls to Vec/MatAssemblyBegin/End at the end
of those routines?
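To make question 1 concrete, here is a rough sketch (my own simplified
pseudocode, not my actual routine; the names are illustrative) of where I
think the logged communication would come from in a residual evaluation:

  #include <petscsnes.h>

  /* Sketch of a residual callback: the Begin/End pairs below are the
     points where PETSc logs MPI messages and reductions on >1 rank. */
  PetscErrorCode FormFunction(SNES snes, Vec X, Vec F, void *ctx)
  {
    DM             dm;
    Vec            Xloc;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = SNESGetDM(snes, &dm);CHKERRQ(ierr);
    ierr = DMGetLocalVector(dm, &Xloc);CHKERRQ(ierr);

    /* Ghost updates: neighbor-to-neighbor messages */
    ierr = DMGlobalToLocalBegin(dm, X, INSERT_VALUES, Xloc);CHKERRQ(ierr);
    ierr = DMGlobalToLocalEnd(dm, X, INSERT_VALUES, Xloc);CHKERRQ(ierr);

    /* ... element loop setting entries of F, possibly off-process ... */

    /* Off-process contributions are exchanged here; the assembly also
       does some handshaking/reductions to set up the communication
       pattern */
    ierr = VecAssemblyBegin(F);CHKERRQ(ierr);
    ierr = VecAssemblyEnd(F);CHKERRQ(ierr);

    ierr = DMRestoreLocalVector(dm, &Xloc);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }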

2) If I run this program with at least 2 cores, will the other Vec and Mat
functions also accumulate MPI reductions/messages?

3) I don't know everything that happens inside BoomerAMG or ML, but do
these packages perform their own Mat and Vec operations? If they still
partly use PETSc's Vec and Mat operations, couldn't we still quantify the
corresponding MPI metrics to some extent?

4) Suppose I stick with one of these preconditioner packages (e.g., ML)
and solve the same problem with two different numerical methods. Is it
appropriate to infer that if both methods require the same wall-clock time
but one of them needs more iterations to reach the solution, then that
method may involve more communication overall and *might* tend not to
scale as well in the strong sense?
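For that comparison, my plan (just a sketch; the stage name and the
ksp/b/x variables are placeholders from my own code) is to wrap each
method's solve in its own log stage, so that -log_summary reports the
messages, message lengths, and reductions for the solve by itself:

  PetscLogStage solveStage;

  ierr = PetscLogStageRegister("Solve", &solveStage);CHKERRQ(ierr);
  ierr = PetscLogStagePush(solveStage);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  ierr = PetscLogStagePop();CHKERRQ(ierr);

That way the mesh setup and assembly communication stays out of the
numbers I compare between the two methods.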

Thanks,
Justin


On Thu, Feb 18, 2016 at 4:05 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:

>
> > On Feb 18, 2016, at 1:56 PM, Justin Chang <jychang48 at gmail.com> wrote:
> >
> > Hi all,
> >
> > For a poisson problem with roughly 1 million dofs (using second-order
> elements), I solved the problem using two different solver/preconditioner
> combinations: CG/ILU and CG/GAMG.
> >
> > ILU takes roughly 82 solver iterations whereas GAMG takes 14
> iterations (wall-clock times are roughly 15 and 46 seconds, respectively). I
> have seen from previous mailing threads that there is a strong correlation
> between solver iterations and communication (which could lead to poorer
> strong scaling). It makes sense to me if I strictly use one of
> these preconditioners to solve two different problems and compare the
> number of respective iterations, but what about solving the same problem
> with two different preconditioners?
> >
> > If GAMG takes 14 iterations whereas ILU takes 82 iterations, does this
> necessarily mean GAMG has less communication?
>
>   No, you can't say that at all. A single GAMG cycle will do more
> communication than a single block Jacobi cycle.
>
> > I would think that the "bandwidth" consumed within a single GAMG
> iteration would be much greater than that within a single ILU iteration. Is
> there a way to determine this rigorously?
> >
> > I see from log_summary that we have this information:
> > MPI Messages:         5.000e+00      1.00000   5.000e+00  5.000e+00
> > MPI Message Lengths:  5.816e+07      1.00000   1.163e+07  5.816e+07
> > MPI Reductions:       2.000e+01      1.00000
> >
> > Can this information be used to determine the "bandwidth"?
>
>    You can certainly use this data from each run to determine which
> algorithm sends more messages, has a larger total message length, etc.
> And if you divide by time, it gives the rate of communication for the
> different algorithms.
>
> Note that counts of messages and lengths are also given in the detailed
> table for each operation.
>
> There are also theoretical bounds on messages that can be derived for some
> iterative methods applied to some problems.
>
> > If so, does PETSc have the ability to document this for other
> preconditioner packages like HYPRE's BoomerAMG or Trilinos' ML?
>
>    No, because they don't log this information.
> >
> > Thanks,
> > Justin
>
>
-------------- next part --------------
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./main on a arch-linux2-c-opt named pacotaco-xps with 1 processor, by justin Thu Feb 11 10:37:52 2016
Using Petsc Development GIT revision: v3.6.3-2084-g01047ab  GIT Date: 2016-01-25 11:32:04 -0600

                         Max       Max/Min        Avg      Total 
Time (sec):           1.548e+02      1.00000   1.548e+02
Objects:              2.930e+02      1.00000   2.930e+02
Flops:                7.338e+10      1.00000   7.338e+10  7.338e+10
Flops/sec:            4.740e+08      1.00000   4.740e+08  4.740e+08
MPI Messages:         5.000e+00      1.00000   5.000e+00  5.000e+00
MPI Message Lengths:  5.816e+07      1.00000   1.163e+07  5.816e+07
MPI Reductions:       2.000e+01      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 1.5481e+02 100.0%  7.3382e+10 100.0%  5.000e+00 100.0%  1.163e+07      100.0%  2.000e+01 100.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

CreateMesh             1 1.0 3.2756e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+01 21  0  0  0100  21  0  0  0100     0
BuildTwoSided          5 1.0 6.1989e-05 1.0 0.00e+00 0.0 1.0e+00 4.0e+00 0.0e+00  0  0 20  0  0   0  0 20  0  0     0
VecTDot              984 1.0 1.7839e+00 1.0 4.09e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1  6  0  0  0   1  6  0  0  0  2291
VecNorm              493 1.0 8.9080e-01 1.0 2.05e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1  3  0  0  0   1  3  0  0  0  2299
VecCopy                3 1.0 8.8620e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet                25 1.0 1.7894e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY              984 1.0 2.5989e+00 1.0 4.09e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  6  0  0  0   2  6  0  0  0  1573
VecAYPX              491 1.0 1.2707e+00 1.0 2.04e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1  3  0  0  0   1  3  0  0  0  1605
VecWAXPY               1 1.0 1.6074e-01 1.0 2.08e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    13
MatMult              492 1.0 2.1720e+01 1.0 2.92e+10 1.0 0.0e+00 0.0e+00 0.0e+00 14 40  0  0  0  14 40  0  0  0  1345
MatSolve             493 1.0 2.0103e+01 1.0 2.93e+10 1.0 0.0e+00 0.0e+00 0.0e+00 13 40  0  0  0  13 40  0  0  0  1457
MatLUFactorNum         1 1.0 4.6696e-01 1.0 2.85e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   610
MatILUFactorSym        1 1.0 6.9562e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       2 1.0 7.0739e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         2 1.0 2.2361e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            1 1.0 2.4049e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 6.6127e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries         1 1.0 2.5803e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
DMPlexInterp           3 1.0 5.2191e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
DMPlexStratify        11 1.0 7.3829e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  5  0  0  0  0   5  0  0  0  0     0
DMPlexPrealloc         1 1.0 1.3914e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0   9  0  0  0  0     0
DMPlexResidualFE       1 1.0 1.0185e+01 1.0 6.41e+08 1.0 0.0e+00 0.0e+00 0.0e+00  7  1  0  0  0   7  1  0  0  0    63
DMPlexJacobianFE       1 1.0 3.5721e+01 1.0 1.30e+09 1.0 0.0e+00 0.0e+00 0.0e+00 23  2  0  0  0  23  2  0  0  0    36
SFSetGraph             6 1.0 2.2913e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFBcastBegin           8 1.0 6.4482e-01 1.0 0.00e+00 0.0 4.0e+00 1.0e+07 0.0e+00  0  0 80 71  0   0  0 80 71  0     0
SFBcastEnd             8 1.0 8.5786e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFReduceBegin          1 1.0 6.2540e-03 1.0 0.00e+00 0.0 1.0e+00 1.7e+07 0.0e+00  0  0 20 29  0   0  0 20 29  0     0
SFReduceEnd            1 1.0 4.0581e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SNESFunctionEval       1 1.0 1.2475e+01 1.0 6.41e+08 1.0 2.0e+00 1.7e+07 0.0e+00  8  1 40 57  0   8  1 40 57  0    51
SNESJacobianEval       1 1.0 3.5764e+01 1.0 1.30e+09 1.0 3.0e+00 8.3e+06 0.0e+00 23  2 60 43  0  23  2 60 43  0    36
KSPSetUp               1 1.0 2.5737e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 4.9169e+01 1.0 7.11e+10 1.0 0.0e+00 0.0e+00 0.0e+00 32 97  0  0  0  32 97  0  0  0  1445
PCSetUp                1 1.0 6.7844e-01 1.0 2.85e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   420
PCApply              493 1.0 2.0104e+01 1.0 2.93e+10 1.0 0.0e+00 0.0e+00 0.0e+00 13 40  0  0  0  13 40  0  0  0  1457
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Viewer     4              2         1520     0.
              Object     7              7         4088     0.
           Container    15             15         8640     0.
              Vector    20             20   3266330608     0.
              Matrix     2              2    804148444     0.
    Distributed Mesh    29             29       135576     0.
    GraphPartitioner    11             11         6732     0.
Star Forest Bipartite Graph    62             62        50488     0.
     Discrete System    29             29        25056     0.
           Index Set    46             46    291423816     0.
   IS L to G Mapping     1              1      8701952     0.
             Section    56             54        36288     0.
                SNES     1              1         1340     0.
      SNESLineSearch     1              1         1000     0.
              DMSNES     1              1          672     0.
       Krylov Solver     1              1         1240     0.
      Preconditioner     1              1         1016     0.
        Linear Space     2              2         1296     0.
          Dual Space     2              2         1328     0.
            FE Space     2              2         1512     0.
========================================================================================================================
Average time to get PetscTime(): 0.
#PETSc Option Table entries:
-al 1
-am 0
-at 0.001
-bcloc 0,1,0,1,0,0,0,1,0,1,1,1,0,0,0,1,0,1,1,1,0,1,0,1,0,1,0,0,0,1,0,1,1,1,0,1,0.45,0.55,0.45,0.55,0.45,0.55
-bcnum 7
-bcval 0,0,0,0,0,0,1
-dim 3
-dm_refine 1
-dt 0.001
-edges 3,3
-floc 0.25,0.75,0.25,0.75,0.25,0.75
-fnum 0
-ftime 0,99
-fval 1
-ksp_atol 1.0e-8
-ksp_max_it 50000
-ksp_rtol 1.0e-8
-ksp_type cg
-log_summary
-lower 0,0
-mat_petscspace_order 0
-mesh datafiles/cube_with_hole5_mesh.dat
-mu 1
-nonneg 0
-numsteps 0
-options_left 0
-pc_type ilu
-petscpartitioner_type parmetis
-progress 1
-simplex 1
-solution_petscspace_order 1
-tao_fatol 1e-8
-tao_frtol 1e-8
-tao_max_it 50000
-tao_type blmvm
-trans datafiles/cube_with_hole5_trans.dat
-upper 1,1
-vtuname figures/cube_with_hole_5
-vtuprint 0
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --download-chaco --download-ctetgen --download-fblaslapack --download-hdf5 --download-hypre --download-metis --download-ml --download-parmetis --download-triangle --with-debugging=0 --with-mpi-dir=/usr/lib/openmpi --with-papi=/usr/local --with-shared-libraries=1 --with-valgrind=1 COPTFLAGS=-O2 CXXOPTFLAGS=-O2 FOPTFLAGS=-O2 PETSC_ARCH=arch-linux2-c-opt
-----------------------------------------
Libraries compiled on Tue Jan 26 00:25:01 2016 on pacotaco-xps 
Machine characteristics: Linux-3.13.0-76-generic-x86_64-with-Ubuntu-14.04-trusty
Using PETSc directory: /home/justin/Software/petsc-dev
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------

Using C compiler: mpicc  -fPIC  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O2  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90  -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O2   ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/home/justin/Software/petsc-dev/arch-linux2-c-opt/include -I/home/justin/Software/petsc-dev/include -I/home/justin/Software/petsc-dev/include -I/home/justin/Software/petsc-dev/arch-linux2-c-opt/include -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/justin/Software/petsc-dev/arch-linux2-c-opt/lib -L/home/justin/Software/petsc-dev/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/home/justin/Software/petsc-dev/arch-linux2-c-opt/lib -L/home/justin/Software/petsc-dev/arch-linux2-c-opt/lib -lml -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpi_cxx -lstdc++ -lHYPRE -lmpi_cxx -lstdc++ -lflapack -lfblas -lparmetis -lmetis -ltriangle -lX11 -lssl -lcrypto -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lctetgen -lchaco -lm -lmpi_f90 -lmpi_f77 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpi_cxx -lstdc++ -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -lmpi -lhwloc -lgcc_s -lpthread -ldl 
-----------------------------------------

