[petsc-users] analysing petsc log information and standard execution time for Stokes like equations

Bishesh Khanal bisheshkh at gmail.com
Wed Oct 29 11:18:50 CDT 2014


Dear all,
The computer cluster I'm using to run my PETSc-based code (a Stokes-like
equation solver) shows a large variation in execution times for almost
identical problems.
Looking at -log_summary, in particular the time taken and the Mflop/s rates
of the PETSc routines, it appears that the issue lies with the cluster.
I'd like to confirm this before digging into cluster-related issues.
I've provided below (after the questions) the relevant -log_summary outputs
for the two problems.

The two problems, P1 and P2, solve a Stokes-like equation of the same size
with the same combination of KSP and PC, using 64 processors (8 nodes with
8 processes per node).
P1 solves the equation 6 times while P2 solves it only once.
The operators differ slightly because of slightly different boundaries, but
since the number of iterations reported by -ksp_monitor was almost the same,
I don't think this is the issue.

The problem is that the P2 run was much slower than P1. In other
experiments too, execution is sometimes quite fast but slow most of the
time.
I can see in -log_summary that various PETSc routines run much slower for
P2 and achieve much lower Mflop/s rates.
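
To localize where the time goes, I could also wrap the assembly and solve
phases in separate log stages so that -log_summary reports them separately
instead of lumping everything into the Main Stage. A minimal sketch
(hypothetical stage names; my actual code differs, and error checking is
trimmed to the essentials):

#include <petscksp.h>

PetscErrorCode SolveWithStages(KSP ksp, Mat A, Vec b, Vec x)
{
  PetscLogStage  assemblyStage, solveStage;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = PetscLogStageRegister("MyAssembly", &assemblyStage);CHKERRQ(ierr);
  ierr = PetscLogStageRegister("MySolve", &solveStage);CHKERRQ(ierr);

  ierr = PetscLogStagePush(assemblyStage);CHKERRQ(ierr);
  /* ... MatSetValues()/MatAssemblyBegin()/MatAssemblyEnd() calls go here ... */
  ierr = PetscLogStagePop();CHKERRQ(ierr);

  ierr = PetscLogStagePush(solveStage);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  ierr = PetscLogStagePop();CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

That would let me compare the per-stage times of a fast run and a slow run
directly.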

My two questions:
1. Do these outputs confirm that the issue is with the cluster and not with
my code? If so, what kinds of things should I focus on or learn about when
submitting jobs to the cluster? Any pointers would be helpful.

2. Suppose I set up the equations for constant viscosity as:
div(grad(u)) - grad(p) = f
div(u) + kp = g
with k = 1 in some regions and 0 in most other regions, and with f and g
being spatially varying functions,
and solve the system on 64 to 128 processors using ksp and pc options:
-pc_fieldsplit_type schur -pc_fieldsplit_schur_precondition self
-pc_fieldsplit_dm_splits 0 -pc_fieldsplit_0_fields 0,1,2
-pc_fieldsplit_1_fields 3 -fieldsplit_0_pc_type hypre

What order of magnitude of execution time should I target as reasonable for
solving this system with, say, around 128 processors?
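
For reference, here is a minimal sketch of setting the same options from
code rather than the command line (the helper name is hypothetical, and
-pc_type fieldsplit is assumed to be set as well); KSPSetFromOptions() then
applies them:

#include <petscksp.h>

PetscErrorCode ConfigureStokesSolver(KSP ksp)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = PetscOptionsSetValue("-pc_type", "fieldsplit");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue("-pc_fieldsplit_type", "schur");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue("-pc_fieldsplit_schur_precondition", "self");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue("-pc_fieldsplit_dm_splits", "0");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue("-pc_fieldsplit_0_fields", "0,1,2");CHKERRQ(ierr); /* velocity components */
  ierr = PetscOptionsSetValue("-pc_fieldsplit_1_fields", "3");CHKERRQ(ierr);     /* pressure */
  ierr = PetscOptionsSetValue("-fieldsplit_0_pc_type", "hypre");CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}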

-log_summary output for P1
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

/epi/asclepios2/bkhanal/works/AdLemModel/build/src/AdLemMain on a arch-linux2-cxx-opt named nef017 with 64 processors, by bkhanal Tue Oct 28 05:52:24 2014
Using Petsc Release Version 3.4.3, Oct, 15, 2013

                         Max       Max/Min        Avg      Total
Time (sec):           4.221e+04      1.00780   4.201e+04
Objects:              9.980e+02      1.00000   9.980e+02
Flops:                2.159e+11      1.08499   2.106e+11  1.348e+13
Flops/sec:            5.154e+06      1.08499   5.013e+06  3.208e+08
MPI Messages:         1.316e+05      3.69736   7.413e+04  4.744e+06
MPI Message Lengths:  1.986e+09      2.61387   1.581e+04  7.502e+10
MPI Reductions:       8.128e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 4.2010e+04 100.0%  1.3477e+13 100.0%  4.744e+06 100.0%  1.581e+04      100.0%  8.127e+03 100.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecView                6 1.0 3.8704e+01424.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
VecMDot             3228 1.0 1.0614e+01 1.7 5.93e+09 1.1 0.0e+00 0.0e+00 3.2e+03  0  3  0  0 40   0  3  0  0 40 35184
VecNorm             4383 1.0 4.0579e+01 9.8 2.73e+09 1.1 0.0e+00 0.0e+00 4.4e+03  0  1  0  0 54   0  1  0  0 54  4239
VecScale            4680 1.0 1.2393e+00 1.3 1.39e+09 1.1 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 70518
VecCopy             1494 1.0 1.1592e+00 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              5477 1.0 9.7614e+02288.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY             1113 1.0 7.9877e-01 2.3 6.17e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 48645
VecAYPX              375 1.0 7.9671e-02 1.4 4.99e+07 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 39491
VecMAXPY            4317 1.0 7.3185e+00 2.1 8.89e+09 1.1 0.0e+00 0.0e+00 0.0e+00  0  4  0  0  0   0  4  0  0  0 76569
VecScatterBegin     5133 1.0 6.7937e+00 2.4 0.00e+00 0.0 4.7e+06 1.5e+04 1.2e+01  0  0100 96  0   0  0100 96  0     0
VecScatterEnd       5121 1.0 2.5840e+02113.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize        3984 1.0 7.2473e+00 1.6 3.92e+09 1.1 0.0e+00 0.0e+00 4.0e+03  0  2  0  0 49   0  2  0  0 49 34118
MatMult              910 1.0 7.5977e+03 1.0 2.11e+11 1.1 4.7e+06 1.5e+04 6.2e+03 18 98 99 92 76  18 98 99 92 76  1736
MatMultAdd           702 1.0 5.5198e+01 6.3 4.98e+09 1.1 6.6e+05 6.2e+03 0.0e+00  0  2 14  5  0   0  2 14  5  0  5624
MatConvert             6 1.0 3.3739e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatAssemblyBegin      55 1.0 5.7320e+0319.7 0.00e+00 0.0 0.0e+00 0.0e+00 6.2e+01 11  0  0  0  1  11  0  0  0  1     0
MatAssemblyEnd        55 1.0 1.6179e+00 1.3 0.00e+00 0.0 9.4e+03 3.6e+03 4.0e+01  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ           12 1.0 1.3590e-05 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries        20 1.0 6.9117e-01 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView               18 1.0 4.7298e-02 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.6e+01  0  0  0  0  0   0  0  0  0  0     0
KSPGMRESOrthog      3228 1.0 1.4727e+01 1.7 1.19e+10 1.1 0.0e+00 0.0e+00 3.2e+03  0  6  0  0 40   0  6  0  0 40 50718
KSPSetUp              18 1.0 5.4130e-02 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               6 1.0 3.2363e+04 1.0 2.15e+11 1.1 4.7e+06 1.5e+04 7.7e+03 77100 99 92 95  77100 99 92 95   415
PCSetUp               18 1.0 2.5724e+04 1.0 0.00e+00 0.0 7.7e+03 3.3e+04 2.0e+02 61  0  0  0  2  61  0  0  0  2     0
PCApply               18 1.0 3.2248e+04 1.0 2.12e+11 1.1 4.7e+06 1.5e+04 7.6e+03 77 98 99 91 93  77 98 99 91 93   411
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector   859            849   1446200816     0
      Vector Scatter    26             16        16352     0
              Matrix    20             20   1832984916     0
    Distributed Mesh     3              3      5457592     0
     Bipartite Graph     6              6         4848     0
           Index Set    61             61      1882648     0
   IS L to G Mapping     5              5      3961468     0
       Krylov Solver     5              5        57888     0
     DMKSP interface     1              1          656     0
      Preconditioner     5              5         4440     0
              Viewer     7              6         4272     0
========================================================================================================================
Average time to get PetscTime(): 0
Average time for MPI_Barrier(): 1.27792e-05
Average time for zero size MPI_Send(): 5.20423e-06
#PETSc Option Table entries:




-log_summary output for P2:
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

/epi/asclepios2/bkhanal/works/AdLemModel/build/src/AdLemMain on a arch-linux2-cxx-opt named nef001 with 64 processors, by bkhanal Wed Oct 29 14:24:36 2014
Using Petsc Release Version 3.4.3, Oct, 15, 2013

                         Max       Max/Min        Avg      Total
Time (sec):           1.958e+04      1.00194   1.955e+04
Objects:              3.190e+02      1.00000   3.190e+02
Flops:                3.638e+10      1.08499   3.548e+10  2.271e+12
Flops/sec:            1.861e+06      1.08676   1.815e+06  1.161e+08
MPI Messages:         2.253e+04      3.68455   1.270e+04  8.131e+05
MPI Message Lengths:  3.403e+08      2.51345   1.616e+04  1.314e+10
MPI Reductions:       1.544e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 1.9554e+04 100.0%  2.2709e+12 100.0%  8.131e+05 100.0%  1.616e+04      100.0%  1.543e+03  99.9%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecView                1 1.0 4.4869e+02189.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecMDot              544 1.0 1.8271e+01 2.2 1.00e+09 1.1 0.0e+00 0.0e+00 5.4e+02  0  3  0  0 35   0  3  0  0 35  3456
VecNorm              738 1.0 2.0433e+0218.1 4.60e+08 1.1 0.0e+00 0.0e+00 7.4e+02  1  1  0  0 48   1  1  0  0 48   142
VecScale             788 1.0 4.1195e+00 9.0 2.34e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  3573
VecCopy              251 1.0 7.6140e+0046.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               926 1.0 3.9087e+0141.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              187 1.0 6.0848e+0032.3 1.04e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1073
VecAYPX               63 1.0 4.6702e-0116.2 8.38e+06 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1131
VecMAXPY             727 1.0 1.0997e+01 4.9 1.50e+09 1.1 0.0e+00 0.0e+00 0.0e+00  0  4  0  0  0   0  4  0  0  0  8610
VecScatterBegin      864 1.0 2.0978e+0234.1 0.00e+00 0.0 8.0e+05 1.5e+04 2.0e+00  0  0 98 92  0   0  0 98 92  0     0
VecScatterEnd        862 1.0 5.4781e+02114.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecNormalize         671 1.0 1.6922e+01 2.2 6.61e+08 1.1 0.0e+00 0.0e+00 6.7e+02  0  2  0  0 43   0  2  0  0 43  2461
MatMult              152 1.0 6.3271e+03 1.0 3.56e+10 1.1 7.9e+05 1.5e+04 1.0e+03 32 98 98 89 68  32 98 98 89 68   351
MatMultAdd           118 1.0 4.5234e+02183.7 8.36e+08 1.1 1.1e+05 6.2e+03 0.0e+00  1  2 14  5  0   1  2 14  5  0   115
MatConvert             1 1.0 3.6065e+02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
MatAssemblyBegin      10 1.0 1.0849e+03 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  5  0  0  0  1   5  0  0  0  1     0
MatAssemblyEnd        10 1.0 1.3957e+01 1.1 0.00e+00 0.0 9.4e+03 3.6e+03 4.0e+01  0  0  1  0  3   0  0  1  0  3     0
MatGetRowIJ            2 1.0 2.2221e-03582.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView                3 1.0 3.7378e-01 9.8 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPGMRESOrthog       544 1.0 2.0370e+01 1.7 2.00e+09 1.1 0.0e+00 0.0e+00 5.4e+02  0  6  0  0 35   0  6  0  0 35  6200
KSPSetUp               3 1.0 4.2598e+01 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01  0  0  0  0  1   0  0  0  0  1     0
KSPSolve               1 1.0 1.5113e+04 1.0 3.63e+10 1.1 7.9e+05 1.5e+04 1.3e+03 77100 98 89 84  77100 98 89 84   150
PCSetUp                3 1.0 1.1794e+04 1.0 0.00e+00 0.0 7.7e+03 3.3e+04 1.3e+02 60  0  1  2  8  60  0  1  2  8     0
PCApply                3 1.0 1.4940e+04 1.0 3.58e+10 1.1 7.9e+05 1.5e+04 1.3e+03 76 98 97 88 83  76 98 97 88 83   149
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector   215            215    743212688     0
      Vector Scatter    16             16        16352     0
              Matrix    20             20   1832984916     0
    Distributed Mesh     3              3      5457592     0
     Bipartite Graph     6              6         4848     0
           Index Set    41             41      1867368     0
   IS L to G Mapping     5              5      3961468     0
       Krylov Solver     5              5        57888     0
     DMKSP interface     1              1          656     0
      Preconditioner     5              5         4440     0
              Viewer     2              1          712     0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 0.000274992
Average time for zero size MPI_Send(): 1.67042e-05
#PETSc Option Table entries:
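
As a side note on the cluster question: the fast run (P1, on nef017) reports
an average MPI_Barrier() time of about 1.3e-05 s and a zero-size MPI_Send()
of about 5.2e-06 s, while the slow run (P2, on nef001) reports about
2.7e-04 s and 1.7e-05 s, i.e. the barrier is roughly 20 times slower. If it
helps, I could run a quick stand-alone check like the following on a node
allocation before a long job (a rough sketch, not a substitute for a proper
benchmark suite such as the OSU micro-benchmarks):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int    rank, size, i, nreps = 1000;
  double t0, tbarrier, tsend = 0.0;
  char   dummy = 0;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /* average MPI_Barrier() time */
  MPI_Barrier(MPI_COMM_WORLD);
  t0 = MPI_Wtime();
  for (i = 0; i < nreps; i++) MPI_Barrier(MPI_COMM_WORLD);
  tbarrier = (MPI_Wtime() - t0) / nreps;

  /* zero-size ping-pong between rank 0 and the last rank */
  if (size > 1 && rank == 0) {
    t0 = MPI_Wtime();
    for (i = 0; i < nreps; i++) {
      MPI_Send(&dummy, 0, MPI_CHAR, size - 1, 0, MPI_COMM_WORLD);
      MPI_Recv(&dummy, 0, MPI_CHAR, size - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    tsend = (MPI_Wtime() - t0) / (2.0 * nreps);
    printf("avg MPI_Barrier: %g s, avg zero-size send: %g s\n", tbarrier, tsend);
  } else if (size > 1 && rank == size - 1) {
    for (i = 0; i < nreps; i++) {
      MPI_Recv(&dummy, 0, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      MPI_Send(&dummy, 0, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    }
  }

  MPI_Finalize();
  return 0;
}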