[petsc-users] Do more solver iterations = more communication?
Justin Chang
jychang48 at gmail.com
Fri Feb 19 11:18:45 CST 2016
Thanks Barry,
1) Attached is the full log_summary of a serial run I did with ILU. I
noticed that the MPI reductions/messages show up mostly in the
SNESFunctionEval/SNESJacobianEval routines. Am I right to assume that
these occur because of the required calls to Vec/MatAssemblyBegin/End at
the end of those routines?
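For reference, here is a minimal sketch (not my actual code, which goes
through the DMPlex FEM residual path, and with placeholder values) of the
kind of hand-written residual routine I have in mind, with the assembly
calls at the end that I assume account for the logged messages/reductions:

    #include <petscsnes.h>

    /* Hypothetical hand-written residual; the DMPlex FEM path does the
       equivalent internally, but the assembly pattern is the same.     */
    static PetscErrorCode FormFunction(SNES snes, Vec x, Vec f, void *ctx)
    {
      PetscErrorCode ierr;
      PetscInt       row = 0;     /* placeholder row index       */
      PetscScalar    val = 1.0;   /* placeholder contribution    */

      /* element contributions may land on off-process rows ...  */
      ierr = VecSetValues(f, 1, &row, &val, ADD_VALUES);CHKERRQ(ierr);
      /* ... so the assembly pair below is where stashed values are
         exchanged between ranks, i.e. where I would expect the MPI
         messages/reductions logged under SNESFunctionEval to appear */
      ierr = VecAssemblyBegin(f);CHKERRQ(ierr);
      ierr = VecAssemblyEnd(f);CHKERRQ(ierr);
      return 0;
    }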
2) If I run this program on at least 2 cores, will the other Vec and Mat
functions also accumulate MPI reductions/messages in the log?
3) I don't know everything that happens inside BoomerAMG or ML, but do these
packages perform their own Mat and Vec operations? Because if they still, at
least in part, use PETSc's Vec and Mat operations, we could still somewhat
quantify the corresponding MPI metrics, no?
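For instance, just to check that I understand the arithmetic: from the
attached serial log, 5.816e+07 bytes spread over 5 messages gives an average
message length of about 1.16e+07 bytes (which matches the reported average),
and dividing the total message length by the time of the phase that sent it
would give a communication rate in bytes per second. I assume the same
bookkeeping would apply to whatever PETSc operations the external packages
happen to call.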
4) Suppose I stick with one of these preconditioner packages (e.g., ML)
and solve the same problem with two different numerical methods. Is it then
appropriate to infer that if both methods require the same wall-clock time
but one of them needs more iterations to reach the solution, that method may
involve more communication overall and *might* tend not to scale as well in
the strong sense?
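To make that comparison concrete, I was thinking of isolating the solve in
its own log stage so the per-stage message/reduction counts can be compared
between the two methods. A minimal sketch (assuming ksp, b, and x are
already set up elsewhere; the names are placeholders):

    #include <petscksp.h>

    static PetscErrorCode SolveWithStage(KSP ksp, Vec b, Vec x)
    {
      PetscLogStage  stage;
      PetscErrorCode ierr;

      ierr = PetscLogStageRegister("Solve", &stage);CHKERRQ(ierr);
      ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
      ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);   /* or SNESSolve() */
      ierr = PetscLogStagePop();CHKERRQ(ierr);
      /* -log_summary then reports messages, message lengths, and
         reductions for the "Solve" stage separately from setup    */
      return 0;
    }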
Thanks,
Justin
On Thu, Feb 18, 2016 at 4:05 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
> > On Feb 18, 2016, at 1:56 PM, Justin Chang <jychang48 at gmail.com> wrote:
> >
> > Hi all,
> >
> > For a Poisson problem with roughly 1 million dofs (using second-order
> elements), I solved the problem using two different solver/preconditioner
> combinations: CG/ILU and CG/GAMG.
> >
> > ILU takes roughly 82 solver iterations whereas with GAMG it takes 14
> iterations (wall clock time is roughly 15 and 46 seconds respectively). I
> have seen from previous mailing threads that there is a strong correlation
> between solver iterations and communication (which can degrade
> strong-scaling performance). It makes sense to me if I strictly use one of
> these preconditioners to solve two different problems and compare the
> number of respective iterations, but what about solving the same problem
> with two different preconditioners?
> >
> > If GAMG takes 14 iterations whereas ILU takes 82 iterations, does this
> necessarily mean GAMG has less communication?
>
> No, you can't say that at all. A single GAMG cycle will do more
> communication than a single block Jacobi cycle.
>
> > I would think that the "bandwidth" used within a single GAMG
> iteration would be much greater than that within a single ILU iteration. Is
> there a way to determine this quantitatively?
> >
> > I see from log_summary that we have this information:
> > MPI Messages: 5.000e+00 1.00000 5.000e+00 5.000e+00
> > MPI Message Lengths: 5.816e+07 1.00000 1.163e+07 5.816e+07
> > MPI Reductions: 2.000e+01 1.00000
> >
> > Can this information be used to determine the "bandwidth"?
>
> You can certainly use this data for each run to determine which
> algorithm sends more messages, which has a larger total message length,
> etc. And if you divide by time, it tells you the rate of communication for
> the different algorithms.
>
> Note that counts of messages and lengths are also given in the detailed
> table for each operation.
>
> There are also theoretical bounds on messages that can be derived for some
> iterative methods applied to some problems.
>
> > If so, does PETSc have the ability to document this for other
> preconditioner packages like HYPRE's BoomerAMG or Trilinos' ML?
>
> No, because they don't log this information.
> >
> > Thanks,
> > Justin
>
>
-------------- next part --------------
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./main on a arch-linux2-c-opt named pacotaco-xps with 1 processor, by justin Thu Feb 11 10:37:52 2016
Using Petsc Development GIT revision: v3.6.3-2084-g01047ab GIT Date: 2016-01-25 11:32:04 -0600
Max Max/Min Avg Total
Time (sec): 1.548e+02 1.00000 1.548e+02
Objects: 2.930e+02 1.00000 2.930e+02
Flops: 7.338e+10 1.00000 7.338e+10 7.338e+10
Flops/sec: 4.740e+08 1.00000 4.740e+08 4.740e+08
MPI Messages: 5.000e+00 1.00000 5.000e+00 5.000e+00
MPI Message Lengths: 5.816e+07 1.00000 1.163e+07 5.816e+07
MPI Reductions: 2.000e+01 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 1.5481e+02 100.0% 7.3382e+10 100.0% 5.000e+00 100.0% 1.163e+07 100.0% 2.000e+01 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
CreateMesh 1 1.0 3.2756e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+01 21 0 0 0100 21 0 0 0100 0
BuildTwoSided 5 1.0 6.1989e-05 1.0 0.00e+00 0.0 1.0e+00 4.0e+00 0.0e+00 0 0 20 0 0 0 0 20 0 0 0
VecTDot 984 1.0 1.7839e+00 1.0 4.09e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 6 0 0 0 1 6 0 0 0 2291
VecNorm 493 1.0 8.9080e-01 1.0 2.05e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 2299
VecCopy 3 1.0 8.8620e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 25 1.0 1.7894e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 984 1.0 2.5989e+00 1.0 4.09e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 6 0 0 0 2 6 0 0 0 1573
VecAYPX 491 1.0 1.2707e+00 1.0 2.04e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 1605
VecWAXPY 1 1.0 1.6074e-01 1.0 2.08e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 13
MatMult 492 1.0 2.1720e+01 1.0 2.92e+10 1.0 0.0e+00 0.0e+00 0.0e+00 14 40 0 0 0 14 40 0 0 0 1345
MatSolve 493 1.0 2.0103e+01 1.0 2.93e+10 1.0 0.0e+00 0.0e+00 0.0e+00 13 40 0 0 0 13 40 0 0 0 1457
MatLUFactorNum 1 1.0 4.6696e-01 1.0 2.85e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 610
MatILUFactorSym 1 1.0 6.9562e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 2 1.0 7.0739e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 2 1.0 2.2361e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 1 1.0 2.4049e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 6.6127e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 1 1.0 2.5803e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
DMPlexInterp 3 1.0 5.2191e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0
DMPlexStratify 11 1.0 7.3829e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 5 0 0 0 0 0
DMPlexPrealloc 1 1.0 1.3914e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0
DMPlexResidualFE 1 1.0 1.0185e+01 1.0 6.41e+08 1.0 0.0e+00 0.0e+00 0.0e+00 7 1 0 0 0 7 1 0 0 0 63
DMPlexJacobianFE 1 1.0 3.5721e+01 1.0 1.30e+09 1.0 0.0e+00 0.0e+00 0.0e+00 23 2 0 0 0 23 2 0 0 0 36
SFSetGraph 6 1.0 2.2913e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFBcastBegin 8 1.0 6.4482e-01 1.0 0.00e+00 0.0 4.0e+00 1.0e+07 0.0e+00 0 0 80 71 0 0 0 80 71 0 0
SFBcastEnd 8 1.0 8.5786e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceBegin 1 1.0 6.2540e-03 1.0 0.00e+00 0.0 1.0e+00 1.7e+07 0.0e+00 0 0 20 29 0 0 0 20 29 0 0
SFReduceEnd 1 1.0 4.0581e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SNESFunctionEval 1 1.0 1.2475e+01 1.0 6.41e+08 1.0 2.0e+00 1.7e+07 0.0e+00 8 1 40 57 0 8 1 40 57 0 51
SNESJacobianEval 1 1.0 3.5764e+01 1.0 1.30e+09 1.0 3.0e+00 8.3e+06 0.0e+00 23 2 60 43 0 23 2 60 43 0 36
KSPSetUp 1 1.0 2.5737e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 4.9169e+01 1.0 7.11e+10 1.0 0.0e+00 0.0e+00 0.0e+00 32 97 0 0 0 32 97 0 0 0 1445
PCSetUp 1 1.0 6.7844e-01 1.0 2.85e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 420
PCApply 493 1.0 2.0104e+01 1.0 2.93e+10 1.0 0.0e+00 0.0e+00 0.0e+00 13 40 0 0 0 13 40 0 0 0 1457
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Viewer 4 2 1520 0.
Object 7 7 4088 0.
Container 15 15 8640 0.
Vector 20 20 3266330608 0.
Matrix 2 2 804148444 0.
Distributed Mesh 29 29 135576 0.
GraphPartitioner 11 11 6732 0.
Star Forest Bipartite Graph 62 62 50488 0.
Discrete System 29 29 25056 0.
Index Set 46 46 291423816 0.
IS L to G Mapping 1 1 8701952 0.
Section 56 54 36288 0.
SNES 1 1 1340 0.
SNESLineSearch 1 1 1000 0.
DMSNES 1 1 672 0.
Krylov Solver 1 1 1240 0.
Preconditioner 1 1 1016 0.
Linear Space 2 2 1296 0.
Dual Space 2 2 1328 0.
FE Space 2 2 1512 0.
========================================================================================================================
Average time to get PetscTime(): 0.
#PETSc Option Table entries:
-al 1
-am 0
-at 0.001
-bcloc 0,1,0,1,0,0,0,1,0,1,1,1,0,0,0,1,0,1,1,1,0,1,0,1,0,1,0,0,0,1,0,1,1,1,0,1,0.45,0.55,0.45,0.55,0.45,0.55
-bcnum 7
-bcval 0,0,0,0,0,0,1
-dim 3
-dm_refine 1
-dt 0.001
-edges 3,3
-floc 0.25,0.75,0.25,0.75,0.25,0.75
-fnum 0
-ftime 0,99
-fval 1
-ksp_atol 1.0e-8
-ksp_max_it 50000
-ksp_rtol 1.0e-8
-ksp_type cg
-log_summary
-lower 0,0
-mat_petscspace_order 0
-mesh datafiles/cube_with_hole5_mesh.dat
-mu 1
-nonneg 0
-numsteps 0
-options_left 0
-pc_type ilu
-petscpartitioner_type parmetis
-progress 1
-simplex 1
-solution_petscspace_order 1
-tao_fatol 1e-8
-tao_frtol 1e-8
-tao_max_it 50000
-tao_type blmvm
-trans datafiles/cube_with_hole5_trans.dat
-upper 1,1
-vtuname figures/cube_with_hole_5
-vtuprint 0
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --download-chaco --download-ctetgen --download-fblaslapack --download-hdf5 --download-hypre --download-metis --download-ml --download-parmetis --download-triangle --with-debugging=0 --with-mpi-dir=/usr/lib/openmpi --with-papi=/usr/local --with-shared-libraries=1 --with-valgrind=1 COPTFLAGS=-O2 CXXOPTFLAGS=-O2 FOPTFLAGS=-O2 PETSC_ARCH=arch-linux2-c-opt
-----------------------------------------
Libraries compiled on Tue Jan 26 00:25:01 2016 on pacotaco-xps
Machine characteristics: Linux-3.13.0-76-generic-x86_64-with-Ubuntu-14.04-trusty
Using PETSc directory: /home/justin/Software/petsc-dev
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O2 ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O2 ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/justin/Software/petsc-dev/arch-linux2-c-opt/include -I/home/justin/Software/petsc-dev/include -I/home/justin/Software/petsc-dev/include -I/home/justin/Software/petsc-dev/arch-linux2-c-opt/include -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/justin/Software/petsc-dev/arch-linux2-c-opt/lib -L/home/justin/Software/petsc-dev/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/home/justin/Software/petsc-dev/arch-linux2-c-opt/lib -L/home/justin/Software/petsc-dev/arch-linux2-c-opt/lib -lml -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpi_cxx -lstdc++ -lHYPRE -lmpi_cxx -lstdc++ -lflapack -lfblas -lparmetis -lmetis -ltriangle -lX11 -lssl -lcrypto -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lctetgen -lchaco -lm -lmpi_f90 -lmpi_f77 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpi_cxx -lstdc++ -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -lmpi -lhwloc -lgcc_s -lpthread -ldl
-----------------------------------------