[petsc-users] Poor weak scaling when solving successive linear systems

Michael Becker Michael.Becker at physik.uni-giessen.de
Thu May 24 00:24:18 CDT 2018


I added a PETSc solver class to our particle-in-cell simulation code and 
all calculations seem to be correct. However, some weak scaling tests I 
did are rather disappointing because the solver's runtime keeps 
increasing with system size although the number of cores is scaled up 
accordingly. As a result, the solver's share of the total runtime 
becomes more and more dominant and the system sizes we aim for are 

It's a simple 3D Poisson problem on a structured grid with Dirichlet 
boundaries inside the domain, for which I found the cg/gamg combo to 
work the fastest. Since KSPSolve() is called during every timestep of 
the simulation to solve the same system with a new rhs vector, 
assembling the matrix and other PETSc objects should not be a 
determining factor.

What puzzles me is that the convergence rate is actually good (the 
residual decreases by an order of magnitude for every KSP iteration) and 
the number of KSP iterations remains constant over the course of a 
simulation and is equal for all tested systems.

I even increased the (fixed) system size per processor to 30^3 unknowns 
(which is significantly more than the recommended 10,000), but runtime 
is still not even close to being constant.

This leads me to the conclusion that either I configured PETSc wrong, I 
don't call the correct PETSc-related functions, or something goes 
terribly wrong with communication.

Could you have a look at the attached log_view files and tell me if 
something is particularly odd? The system size per processor is 30^3 and 
the simulation ran over 1000 timesteps, which means KSPSolve() was 
called equally often. I introduced two new logging stages - one for the 
first solve and the final setup and one for the remaining solves.

The repeatedly called code segment is

    PetscScalar *b_array;
    VecGetArray(b, &b_array);
    VecRestoreArray(b, &b_array);


    PetscScalar *x_array;
    VecGetArray(x, &x_array);
    for (int i = 0; i < N_local; i++)
       x_array[i] = x_array_prev[i];
    VecRestoreArray(x, &x_array);


    for (int i = 0; i < N_local; i++)
       x_array_prev[i] = x_array[i];


I noticed that for every individual KSP iteration, six vector objects 
are created and destroyed (with CG, more with e.g. GMRES). This seems 
kind of wasteful; is this supposed to be like this? Could this even be 
the reason for my problems? Apart from that, everything seems quite 
normal to me (but I'm not the expert here).

Thanks in advance.


-------------- next part --------------
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

/home/ritsat/beckerm/ppp_test/plasmapic on a arch-linux-amd-opt named node2-007 with 125 processors, by beckerm Wed May 23 15:15:54 2018
Using Petsc Release Version 3.9.1, unknown 

                         Max       Max/Min        Avg      Total 
Time (sec):           2.567e+02      1.00000   2.567e+02
Objects:              2.438e+04      1.00004   2.438e+04
Flop:                 2.125e+10      1.27708   1.963e+10  2.454e+12
Flop/sec:            8.278e+07      1.27708   7.648e+07  9.560e+09
MPI Messages:         1.042e+06      3.36140   7.129e+05  8.911e+07
MPI Message Lengths:  1.344e+09      2.32209   1.439e+03  1.282e+11
MPI Reductions:       2.250e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 6.9829e+00   2.7%  0.0000e+00   0.0%  3.000e+03   0.0%  3.178e+03        0.0%  1.700e+01   0.1% 
 1:     First Solve: 2.7562e+00   1.1%  3.6885e+09   0.2%  3.549e+05   0.4%  3.736e+03        1.0%  5.500e+02   2.4% 
 2: Remaining Solves: 2.4695e+02  96.2%  2.4504e+12  99.8%  8.875e+07  99.6%  1.430e+03       99.0%  2.192e+04  97.4% 

See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
Event                Count      Time (sec)     Flop                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s

--- Event Stage 0: Main Stage

VecSet                 3 1.0 5.6386e-04 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

--- Event Stage 1: First Solve

BuildTwoSided         12 1.0 1.7128e-02 2.1 0.00e+00 0.0 8.8e+03 4.0e+00 0.0e+00  0  0  0  0  0   1  0  2  0  0     0
BuildTwoSidedF        30 1.0 3.3218e-01 3.8 0.00e+00 0.0 7.1e+03 1.0e+04 0.0e+00  0  0  0  0  0   7  0  2  5  0     0
KSPSetUp               9 1.0 3.9077e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  3     0
KSPSolve               1 1.0 2.7586e+00 1.0 3.26e+07 1.4 3.5e+05 3.7e+03 5.5e+02  1  0  0  1  2 100 100 100 100 100  1337
VecTDot                8 1.0 1.9397e-02 3.6 4.32e+05 1.0 0.0e+00 0.0e+00 8.0e+00  0  0  0  0  0   0  1  0  0  1  2784
VecNorm                6 1.0 6.3949e-03 1.6 3.24e+05 1.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  1  0  0  1  6333
VecScale              24 1.0 1.2732e-04 2.1 5.43e+04 2.4 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 40434
VecCopy                1 1.0 1.5807e-04 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               115 1.0 8.6141e-04 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY                8 1.0 1.4498e-03 2.4 4.32e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 37246
VecAYPX               28 1.0 1.3914e-03 2.2 3.58e+05 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 31519
VecAssemblyBegin       2 1.0 2.1458e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         2 1.0 2.1458e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin      103 1.0 6.5608e-03 3.1 0.00e+00 0.0 8.9e+04 1.4e+03 0.0e+00  0  0  0  0  0   0  0 25  9  0     0
VecScatterEnd        103 1.0 6.5023e-02 3.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
MatMult               29 1.0 4.5694e-02 1.8 6.14e+06 1.2 3.0e+04 2.1e+03 0.0e+00  0  0  0  0  0   1 19  8  5  0 15687
MatMultAdd            24 1.0 2.1485e-02 3.6 1.37e+06 1.6 1.6e+04 6.5e+02 0.0e+00  0  0  0  0  0   1  4  5  1  0  7032
MatMultTranspose      24 1.0 1.6713e-02 2.6 1.37e+06 1.6 1.6e+04 6.5e+02 0.0e+00  0  0  0  0  0   0  4  5  1  0  9040
MatSolve               4 0.0 2.2173e-05 0.0 2.64e+02 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    12
MatSOR                48 1.0 8.0235e-02 1.8 1.09e+07 1.3 2.7e+04 1.5e+03 8.0e+00  0  0  0  0  0   3 34  8  3  1 15626
MatLUFactorSym         1 1.0 5.4121e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         1 1.0 1.4782e-05 5.2 1.29e+02 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     9
MatResidual           24 1.0 3.5083e-02 2.0 4.55e+06 1.3 2.7e+04 1.5e+03 0.0e+00  0  0  0  0  0   1 14  8  3  0 14794
MatAssemblyBegin      94 1.0 3.3449e-01 3.5 0.00e+00 0.0 7.1e+03 1.0e+04 0.0e+00  0  0  0  0  0   7  0  2  5  0     0
MatAssemblyEnd        94 1.0 1.4880e-01 1.1 0.00e+00 0.0 6.3e+04 2.1e+02 2.3e+02  0  0  0  0  1   5  0 18  1 42     0
MatGetRow        3102093 1.3 4.6411e-01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  15  0  0  0  0     0
MatGetRowIJ            1 0.0 7.8678e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCreateSubMats       6 1.0 4.4495e-01 2.1 0.00e+00 0.0 5.5e+04 1.7e+04 1.2e+01  0  0  0  1  0  12  0 15 71  2     0
MatCreateSubMat        4 1.0 3.7476e-02 1.0 0.00e+00 0.0 2.9e+03 2.7e+02 6.4e+01  0  0  0  0  0   1  0  1  0 12     0
MatGetOrdering         1 0.0 1.3685e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatIncreaseOvrlp       6 1.0 6.0548e-02 1.2 0.00e+00 0.0 2.7e+04 1.0e+03 1.2e+01  0  0  0  0  0   2  0  8  2  2     0
MatCoarsen             6 1.0 3.5690e-02 1.1 0.00e+00 0.0 5.3e+04 5.8e+02 3.3e+01  0  0  0  0  0   1  0 15  2  6     0
MatZeroEntries         6 1.0 3.4430e-03 7.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatPtAP                6 1.0 2.8406e-01 1.0 1.13e+07 1.6 6.3e+04 2.6e+03 9.2e+01  0  0  0  0  0  10 33 18 13 17  4307
MatPtAPSymbolic        6 1.0 1.6401e-01 1.0 0.00e+00 0.0 3.4e+04 2.7e+03 4.2e+01  0  0  0  0  0   6  0 10  7  8     0
MatPtAPNumeric         6 1.0 1.2070e-01 1.0 1.13e+07 1.6 2.9e+04 2.6e+03 4.8e+01  0  0  0  0  0   4 33  8  6  9 10136
MatGetLocalMat         6 1.0 4.4053e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol          6 1.0 1.0330e-02 1.9 0.00e+00 0.0 2.0e+04 3.5e+03 0.0e+00  0  0  0  0  0   0  0  6  5  0     0
SFSetGraph            12 1.0 1.5497e-05 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp               12 1.0 2.5882e-02 1.5 0.00e+00 0.0 2.6e+04 6.2e+02 0.0e+00  0  0  0  0  0   1  0  7  1  0     0
SFBcastBegin          45 1.0 2.1088e-03 2.5 0.00e+00 0.0 5.4e+04 6.9e+02 0.0e+00  0  0  0  0  0   0  0 15  3  0     0
SFBcastEnd            45 1.0 2.0310e-02 4.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
GAMG: createProl       6 1.0 2.2022e+00 1.0 0.00e+00 0.0 2.0e+05 5.2e+03 2.8e+02  1  0  0  1  1  80  0 56 78 52     0
GAMG: partLevel        6 1.0 3.2547e-01 1.0 1.13e+07 1.6 6.6e+04 2.5e+03 1.9e+02  0  0  0  0  1  12 33 19 13 35  3759
  repartition          2 1.0 1.2660e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  2     0
  Invert-Sort          2 1.0 9.5701e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00  0  0  0  0  0   0  0  0  0  1     0
  Move A               2 1.0 2.2409e-02 1.0 0.00e+00 0.0 1.4e+03 5.3e+02 3.4e+01  0  0  0  0  0   1  0  0  0  6     0
  Move P               2 1.0 1.6271e-02 1.0 0.00e+00 0.0 1.4e+03 1.3e+01 3.4e+01  0  0  0  0  0   1  0  0  0  6     0
PCSetUp                2 1.0 2.5381e+00 1.0 1.13e+07 1.6 2.7e+05 4.5e+03 5.1e+02  1  0  0  1  2  92 33 75 90 93   482
PCSetUpOnBlocks        4 1.0 3.4523e-04 1.9 1.29e+02 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply                4 1.0 1.2703e-01 1.1 1.82e+07 1.3 8.6e+04 1.2e+03 8.0e+00  0  0  0  0  0   4 56 24  8  1 16335

--- Event Stage 2: Remaining Solves

KSPSolve             999 1.0 1.2762e+02 1.0 2.12e+10 1.3 8.8e+07 1.4e+03 2.2e+04 48 100 99 97 97  50 100 99 98 100 19200
VecTDot             7968 1.0 1.0869e+01 6.1 4.30e+08 1.0 0.0e+00 0.0e+00 8.0e+03  2  2  0  0 35   2  2  0  0 36  4948
VecNorm             5982 1.0 4.5561e+00 3.6 3.23e+08 1.0 0.0e+00 0.0e+00 6.0e+03  1  2  0  0 27   1  2  0  0 27  8863
VecScale           23904 1.0 1.1319e-01 2.2 5.40e+07 2.4 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 45298
VecCopy              999 1.0 1.6182e-01 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet             83664 1.0 8.0856e-01 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY             7968 1.0 1.3577e+00 2.3 4.30e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0 39613
VecAYPX            27888 1.0 1.3048e+00 2.2 3.56e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0 33468
VecScatterBegin   100599 1.0 6.5181e+00 3.4 0.00e+00 0.0 8.8e+07 1.4e+03 0.0e+00  2  0 99 97  0   2  0 99 98  0     0
VecScatterEnd     100599 1.0 5.5370e+01 4.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0   9  0  0  0  0     0
MatMult            28887 1.0 4.2860e+01 1.8 6.12e+09 1.2 3.0e+07 2.1e+03 0.0e+00 11 29 33 49  0  12 29 33 49  0 16661
MatMultAdd         23904 1.0 1.4803e+01 2.6 1.37e+09 1.6 1.6e+07 6.5e+02 0.0e+00  4  6 18  8  0   4  6 18  8  0 10166
MatMultTranspose   23904 1.0 1.5364e+01 2.4 1.37e+09 1.6 1.6e+07 6.5e+02 0.0e+00  4  6 18  8  0   4  6 18  8  0  9795
MatSolve            3984 0.0 1.9884e-02 0.0 2.63e+05 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    13
MatSOR             47808 1.0 6.8888e+01 1.7 1.08e+10 1.3 2.7e+07 1.5e+03 8.0e+03 25 51 30 32 35  26 51 30 32 36 18054
MatResidual        23904 1.0 3.1872e+01 1.9 4.54e+09 1.3 2.7e+07 1.5e+03 0.0e+00  8 21 30 32  0   8 21 30 32  0 16219
PCSetUpOnBlocks     3984 1.0 4.9551e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             3984 1.0 1.0819e+02 1.1 1.81e+10 1.3 8.5e+07 1.2e+03 8.0e+03 42 84 96 80 35  43 84 96 81 36 19056

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

       Krylov Solver     1              9        11424     0.
     DMKSP interface     1              0            0     0.
              Vector     5             52      2371496     0.
              Matrix     0             72     14138216     0.
    Distributed Mesh     1              0            0     0.
           Index Set     2             12       133768     0.
   IS L to G Mapping     1              0            0     0.
   Star Forest Graph     2              0            0     0.
     Discrete System     1              0            0     0.
         Vec Scatter     1             13        16016     0.
      Preconditioner     1              9         9676     0.
              Viewer     1              0            0     0.

--- Event Stage 1: First Solve

       Krylov Solver     8              0            0     0.
              Vector   140             92      2204792     0.
              Matrix   140             68     21738552     0.
      Matrix Coarsen     6              6         3816     0.
           Index Set   110            100       543240     0.
   Star Forest Graph    12             12        10368     0.
         Vec Scatter    31             18        22176     0.
      Preconditioner     8              0            0     0.

--- Event Stage 2: Remaining Solves

              Vector 23904          23904   1295501184     0.
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 2.26021e-05
Average time for zero size MPI_Send(): 1.52473e-05
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-ksp_norm_type unpreconditioned
-ksp_type cg
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-pc_gamg_type classical
-pc_type gamg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-memcmp-ok=1 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 --known-mpi-c-double-complex=1 --known-has-attribute-aligned=1 PETSC_ARCH=arch-linux-amd-opt --download-f2cblaslapack --with-mpi-dir=/cm/shared/apps/mvapich2/intel-17.0.1/2.0 --download-hypre --download-ml --with-fc=0 --with-debugging=0 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 --with-batch --with-x --known-mpi-shared-libraries=1 --known-64-bit-blas-indices=4
Libraries compiled on 2018-05-03 16:11:18 on node52-021 
Machine characteristics: Linux-2.6.32-696.18.7.el6.x86_64-x86_64-with-redhat-6.6-Carbon
Using PETSc directory: /home/ritsat/beckerm/petsc
Using PETSc arch: arch-linux-amd-opt

Using C compiler: /cm/shared/apps/mvapich2/intel-17.0.1/2.0/bin/mpicc  -fPIC  -wd1572 -O3  

Using include paths: -I/home/ritsat/beckerm/petsc/include -I/home/ritsat/beckerm/petsc/arch-linux-amd-opt/include -I/cm/shared/apps/mvapich2/intel-17.0.1/2.0/include

Using C linker: /cm/shared/apps/mvapich2/intel-17.0.1/2.0/bin/mpicc
Using libraries: -Wl,-rpath,/home/ritsat/beckerm/petsc/arch-linux-amd-opt/lib -L/home/ritsat/beckerm/petsc/arch-linux-amd-opt/lib -lpetsc -Wl,-rpath,/home/ritsat/beckerm/petsc/arch-linux-amd-opt/lib -L/home/ritsat/beckerm/petsc/arch-linux-amd-opt/lib -lHYPRE -lml -lf2clapack -lf2cblas -lX11 -ldl
-------------- next part --------------
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

/home/ritsat/beckerm/ppp_test/plasmapic on a arch-linux-amd-opt named node1-017 with 1000 processors, by beckerm Wed May 23 23:30:46 2018
Using Petsc Release Version 3.9.1, unknown 

                         Max       Max/Min        Avg      Total 
Time (sec):           2.915e+02      1.00000   2.915e+02
Objects:              2.127e+04      1.00005   2.127e+04
Flop:                 1.922e+10      1.26227   1.851e+10  1.851e+13
Flop/sec:            6.595e+07      1.26227   6.349e+07  6.349e+10
MPI Messages:         1.075e+06      3.98874   7.375e+05  7.375e+08
MPI Message Lengths:  1.175e+09      2.32017   1.403e+03  1.034e+12
MPI Reductions:       1.199e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 2.5171e+01   8.6%  0.0000e+00   0.0%  2.700e+04   0.0%  3.178e+03        0.0%  1.700e+01   0.1% 
 1:     First Solve: 3.3123e+00   1.1%  3.1911e+10   0.2%  3.675e+06   0.5%  3.508e+03        1.2%  6.090e+02   5.1% 
 2: Remaining Solves: 2.6301e+02  90.2%  1.8475e+13  99.8%  7.338e+08  99.5%  1.392e+03       98.7%  1.135e+04  94.7% 

See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
Event                Count      Time (sec)     Flop                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s

--- Event Stage 0: Main Stage

VecSet                 3 1.0 4.4584e-04 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

--- Event Stage 1: First Solve

BuildTwoSided         12 1.0 2.2965e-02 1.3 0.00e+00 0.0 8.9e+04 4.0e+00 0.0e+00  0  0  0  0  0   1  0  2  0  0     0
BuildTwoSidedF        30 1.0 4.6278e-01 3.1 0.00e+00 0.0 6.5e+04 1.0e+04 0.0e+00  0  0  0  0  0   9  0  2  5  0     0
KSPSetUp               9 1.0 2.1761e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+01  0  0  0  0  0   1  0  0  0  2     0
KSPSolve               1 1.0 3.3169e+00 1.0 3.35e+07 1.4 3.7e+06 3.5e+03 6.1e+02  1  0  0  1  5 100 100 100 100 100  9621
VecDotNorm2            4 1.0 4.2942e-03 3.1 4.32e+05 1.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  1  0  0  1 100602
VecMDot                3 1.0 3.0557e-02 22.9 3.24e+05 1.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   1  1  0  0  0 10603
VecNorm                6 1.0 2.8319e-03 2.3 3.24e+05 1.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  1  0  0  1 114409
VecScale              32 1.0 6.2370e-04 1.4 2.70e+05 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 422668
VecSet               124 1.0 5.1708e-03 7.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY                8 1.0 1.8489e-03 2.6 4.32e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 233648
VecAYPX               25 1.0 3.2370e-03 6.2 1.96e+05 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 59376
VecMAXPY               6 1.0 1.8075e-03 1.9 6.48e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  2  0  0  0 358516
VecAssemblyBegin       3 1.0 4.0531e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         3 1.0 1.0014e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin      108 1.0 1.7932e-02 6.6 0.00e+00 0.0 8.4e+05 1.4e+03 0.0e+00  0  0  0  0  0   0  0 23  9  0     0
VecScatterEnd        108 1.0 8.1686e-02 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   2  0  0  0  0     0
MatMult               29 1.0 6.4687e-02 2.6 6.14e+06 1.2 2.8e+05 2.0e+03 0.0e+00  0  0  0  0  0   1 19  8  4  0 91734
MatMultAdd            24 1.0 5.0761e-02 6.0 1.37e+06 1.6 1.5e+05 6.5e+02 0.0e+00  0  0  0  0  0   1  4  4  1  0 25391
MatMultTranspose      24 1.0 2.7226e-02 4.8 1.37e+06 1.6 1.5e+05 6.5e+02 0.0e+00  0  0  0  0  0   0  4  4  1  0 47340
MatSolve               4 0.0 4.7922e-05 0.0 1.10e+04 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   229
MatSOR                48 1.0 9.3626e-02 1.9 1.09e+07 1.3 2.6e+05 1.5e+03 0.0e+00  0  0  0  0  0   2 33  7  3  0 111703
MatLUFactorSym         1 1.0 9.6083e-05 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         1 1.0 7.1049e-05 37.2 3.29e+04 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   463
MatResidual           24 1.0 4.1369e-02 2.3 4.55e+06 1.3 2.6e+05 1.5e+03 0.0e+00  0  0  0  0  0   1 14  7  3  0 105139
MatAssemblyBegin     102 1.0 4.6537e-01 2.9 0.00e+00 0.0 6.5e+04 1.0e+04 0.0e+00  0  0  0  0  0   9  0  2  5  0     0
MatAssemblyEnd       102 1.0 1.4218e-01 1.1 0.00e+00 0.0 6.2e+05 2.0e+02 2.5e+02  0  0  0  0  2   4  0 17  1 41     0
MatGetRow        3102093 1.3 5.4764e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  13  0  0  0  0     0
MatGetRowIJ            1 0.0 1.5974e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCreateSubMats       6 1.0 4.6659e-01 2.1 0.00e+00 0.0 5.7e+05 1.6e+04 1.2e+01  0  0  0  1  0  10  0 15 72  2     0
MatCreateSubMat        6 1.0 2.8245e-02 1.0 0.00e+00 0.0 2.2e+04 3.3e+02 9.4e+01  0  0  0  0  1   1  0  1  0 15     0
MatGetOrdering         1 0.0 1.4687e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatIncreaseOvrlp       6 1.0 1.1661e-01 1.1 0.00e+00 0.0 2.6e+05 9.9e+02 1.2e+01  0  0  0  0  0   3  0  7  2  2     0
MatCoarsen             6 1.0 5.6789e-02 1.0 0.00e+00 0.0 7.1e+05 4.4e+02 5.6e+01  0  0  0  0  0   2  0 19  2  9     0
MatZeroEntries         6 1.0 3.5298e-03 4.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatPtAP                6 1.0 3.7699e-01 1.0 1.11e+07 1.6 6.3e+05 2.5e+03 9.2e+01  0  0  0  0  1  11 33 17 12 15 27514
MatPtAPSymbolic        6 1.0 2.2081e-01 1.0 0.00e+00 0.0 3.2e+05 2.7e+03 4.2e+01  0  0  0  0  0   7  0  9  7  7     0
MatPtAPNumeric         6 1.0 1.5378e-01 1.0 1.11e+07 1.6 3.0e+05 2.3e+03 4.8e+01  0  0  0  0  0   5 33  8  6  8 67450
MatGetLocalMat         6 1.0 4.8461e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol          6 1.0 1.6591e-02 2.4 0.00e+00 0.0 1.9e+05 3.4e+03 0.0e+00  0  0  0  0  0   0  0  5  5  0     0
SFSetGraph            12 1.0 4.1962e-05 8.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp               12 1.0 3.2055e-02 1.2 0.00e+00 0.0 2.7e+05 5.8e+02 0.0e+00  0  0  0  0  0   1  0  7  1  0     0
SFBcastBegin          68 1.0 2.7685e-03 2.8 0.00e+00 0.0 7.2e+05 5.1e+02 0.0e+00  0  0  0  0  0   0  0 20  3  0     0
SFBcastEnd            68 1.0 3.0165e-02 3.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
GAMG: createProl       6 1.0 2.5855e+00 1.0 0.00e+00 0.0 2.2e+06 4.7e+03 3.1e+02  1  0  0  1  3  78  0 59 79 51     0
GAMG: partLevel        6 1.0 4.1722e-01 1.0 1.11e+07 1.6 6.5e+05 2.4e+03 2.4e+02  0  0  0  0  2  13 33 18 12 40 24861
  repartition          3 1.0 3.8280e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  3     0
  Invert-Sort          3 1.0 3.2971e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  2     0
  Move A               3 1.0 1.6580e-02 1.1 0.00e+00 0.0 9.5e+03 7.4e+02 5.0e+01  0  0  0  0  0   0  0  0  0  8     0
  Move P               3 1.0 1.4499e-02 1.1 0.00e+00 0.0 1.3e+04 1.3e+01 5.0e+01  0  0  0  0  0   0  0  0  0  8     0
PCSetUp                2 1.0 3.0173e+00 1.0 1.11e+07 1.6 2.8e+06 4.2e+03 5.8e+02  1  0  0  1  5  91 33 77 91 96  3438
PCSetUpOnBlocks        4 1.0 4.0102e-04 2.7 3.29e+04 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    82
PCApply                4 1.0 1.6476e-01 1.3 1.82e+07 1.3 8.2e+05 1.2e+03 0.0e+00  0  0  0  0  0   4 54 22  7  0 105522

--- Event Stage 2: Remaining Solves

KSPSolve             999 1.0 1.3831e+02 1.1 1.92e+10 1.3 7.3e+08 1.4e+03 1.1e+04 46 100 99 97 95  51 100 99 98 100 133578
VecDotNorm2         3450 1.0 5.0804e+00 2.2 3.73e+08 1.0 0.0e+00 0.0e+00 3.4e+03  1  2  0  0 29   1  2  0  0 30 73340
VecMDot             2451 1.0 9.5447e+00 3.6 2.35e+08 1.0 0.0e+00 0.0e+00 2.5e+03  2  1  0  0 20   2  1  0  0 22 24644
VecNorm             5448 1.0 8.6350e+00 3.0 2.94e+08 1.0 0.0e+00 0.0e+00 5.4e+03  2  2  0  0 45   2  2  0  0 48 34070
VecScale           27600 1.0 5.1987e-01 1.4 2.33e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 437362
VecSet             72450 1.0 8.8635e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY             6900 1.0 8.0184e-01 1.4 3.73e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0 464680
VecAYPX            21699 1.0 1.0895e+00 2.3 1.72e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 155541
VecMAXPY            4902 1.0 1.0245e+00 1.4 4.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0 459209
VecScatterBegin    87249 1.0 6.4859e+00 2.9 0.00e+00 0.0 7.3e+08 1.4e+03 0.0e+00  2  0 99 97  0   2  0 99 98  0     0
VecScatterEnd      87249 1.0 5.9416e+01 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12  0  0  0  0  13  0  0  0  0     0
MatMult            25149 1.0 3.5269e+01 1.6 5.34e+09 1.2 2.5e+08 2.0e+03 0.0e+00  9 28 33 48  0  10 28 34 49  0 146467
MatMultAdd         20700 1.0 2.7336e+01 3.9 1.19e+09 1.6 1.3e+08 6.5e+02 0.0e+00  7  6 18  8  0   8  6 18  8  0 40666
MatMultTranspose   20700 1.0 1.6038e+01 3.0 1.19e+09 1.6 1.3e+08 6.5e+02 0.0e+00  3  6 18  8  0   3  6 18  8  0 69313
MatSolve            3450 0.0 3.8933e-02 0.0 9.47e+06 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   243
MatSOR             41400 1.0 6.3412e+01 1.6 9.37e+09 1.3 2.2e+08 1.5e+03 0.0e+00 20 49 30 32  0  22 49 30 32  0 141687
MatResidual        20700 1.0 2.5046e+01 1.6 3.93e+09 1.3 2.2e+08 1.5e+03 0.0e+00  6 20 30 32  0   7 20 30 32  0 149782
PCSetUpOnBlocks     3450 1.0 5.4419e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             3450 1.0 1.1329e+02 1.1 1.57e+10 1.3 7.0e+08 1.2e+03 0.0e+00 38 81 96 80  0  42 81 96 81  0 132043

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

       Krylov Solver     1              9        11416     0.
     DMKSP interface     1              0            0     0.
              Vector     5            110     15006256     0.
              Matrix     0             65     14780672     0.
    Distributed Mesh     1              0            0     0.
           Index Set     2             18       171852     0.
   IS L to G Mapping     1              0            0     0.
   Star Forest Graph     2              0            0     0.
     Discrete System     1              0            0     0.
         Vec Scatter     1             13        16016     0.
      Preconditioner     1              9         9676     0.
              Viewer     1              0            0     0.

--- Event Stage 1: First Solve

       Krylov Solver     8              0            0     0.
              Vector   210            104      2238504     0.
              Matrix   148             83     22951356     0.
      Matrix Coarsen     6              6         3816     0.
           Index Set   128            112       590828     0.
   Star Forest Graph    12             12        10368     0.
         Vec Scatter    34             21        25872     0.
      Preconditioner     8              0            0     0.

--- Event Stage 2: Remaining Solves

              Vector 20700          20700   1128260400     0.
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 3.46184e-05
Average time for zero size MPI_Send(): 1.66161e-05
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-ksp_norm_type unpreconditioned
-ksp_type gcr
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-pc_gamg_type classical
-pc_type gamg
#End of PETSc Option Table entries
