[petsc-users] analysing PETSc log information and standard execution time for Stokes-like equations
Bishesh Khanal
bisheshkh at gmail.com
Wed Oct 29 11:18:50 CDT 2014
Dear all,
The computer cluster I'm using to run my PETSc-based code (a Stokes-like
equation solver) shows large variations in execution time for almost
identical problems.
Looking at -log_summary, in particular the time taken and the Mflop/s rates
of the PETSc routines, it looks like the issue is with the cluster.
I'd like to confirm this before looking into cluster-related issues.
I've provided the relevant -log_summary outputs for the two problems
further below.
The two problems, P1 and P2, solve a Stokes-like equation of the same size
with the same combination of KSP and PC on 64 processors (8 nodes with 8
processes per node).
P1 solves the equation 6 times, while P2 solves it only once.
The operators differ slightly because of slightly different boundaries, but
since the number of iterations reported by -ksp_monitor was almost the same,
I don't think this is an issue.
The problem is that the P2 run was much slower than P1. In other
experiments too, execution is sometimes quite fast but slow most of the
time.
I can see in the -log_summary output that several PETSc routines run much
slower for P2 and achieve lower Mflop/s rates.
My two questions:
1. Do these outputs confirm that the issue is with the cluster and not with
my code? If so, what kinds of things should I focus on or learn about when
submitting jobs to the cluster? Any pointers would be helpful (see the first
sketch after the questions).
2. If I set up the equations, for constant viscosity, as:
div(grad(u)) - grad(p) = f
div(u) + kp = g
with k = 1 in some regions and 0 in most other regions, and with f and g
spatially varying functions,
and solve the system with 64 to 128 processors using KSP and PC options as
follows (see the second sketch after the questions):
-pc_fieldsplit_type schur -pc_fieldsplit_schur_precondition self
-pc_fieldsplit_dm_splits 0 -pc_fieldsplit_0_fields 0,1,2
-pc_fieldsplit_1_fields 3 -fieldsplit_0_pc_type hypre
what order of execution time for solving this system should I target as
reasonable with, say, around 128 processors?
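
First sketch (for question 1): a minimal, self-contained example against the
PETSc 3.4 C API showing how a solve can be wrapped in its own log stage, so
that -log_summary reports it separately from assembly and setup. This is not
the actual AdLemMain code; it assembles a small 1D Laplacian only so there is
something to solve, and error checking (CHKERRQ) is omitted for brevity.

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat           A;
  Vec           x, b;
  KSP           ksp;
  PetscLogStage solve_stage;
  PetscInt      i, ncols, Istart, Iend, col[3], n = 100;
  PetscScalar   v[3];

  PetscInitialize(&argc, &argv, NULL, NULL);

  /* Assemble a small 1D Laplacian just so there is a system to solve. */
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
  MatSetFromOptions(A);
  MatSetUp(A);
  MatGetOwnershipRange(A, &Istart, &Iend);
  for (i = Istart; i < Iend; i++) {
    ncols = 0;
    if (i > 0)     { col[ncols] = i - 1; v[ncols] = -1.0; ncols++; }
    col[ncols] = i; v[ncols] = 2.0; ncols++;
    if (i < n - 1) { col[ncols] = i + 1; v[ncols] = -1.0; ncols++; }
    MatSetValues(A, 1, &i, ncols, col, v, INSERT_VALUES);
  }
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  MatGetVecs(A, &x, &b);   /* renamed MatCreateVecs() in later PETSc versions */
  VecSet(b, 1.0);

  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, A, DIFFERENT_NONZERO_PATTERN);  /* 4-argument form used by PETSc 3.4 */
  KSPSetFromOptions(ksp);

  /* Everything between Push and Pop is reported as a separate stage by
     -log_summary, so the solve can be compared between runs independently
     of assembly and setup. */
  PetscLogStageRegister("Solve", &solve_stage);
  PetscLogStagePush(solve_stage);
  KSPSolve(ksp, b, x);
  PetscLogStagePop();

  KSPDestroy(&ksp);
  VecDestroy(&x);
  VecDestroy(&b);
  MatDestroy(&A);
  PetscFinalize();
  return 0;
}

With the stage in place, the "Solve" stage gets its own block in -log_summary,
so runs can be compared event by event; identical flop counts with very
different MatMult or VecScatterEnd times would suggest the machine or network
rather than the code.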
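
Second sketch (for question 2): the same fieldsplit/Schur options as above,
set programmatically instead of on the command line. This is only a fragment,
meant to slot in just before the KSPSetFromOptions(ksp) call of a program like
the first sketch; it assumes PETSc 3.4's two-argument PetscOptionsSetValue()
and that -pc_type fieldsplit is either set here or already set elsewhere in
the code.

  /* Fragment: goes just before KSPSetFromOptions(ksp) in a program like the first sketch. */
  PetscOptionsSetValue("-pc_type", "fieldsplit");            /* may already be set elsewhere in the code */
  PetscOptionsSetValue("-pc_fieldsplit_type", "schur");
  PetscOptionsSetValue("-pc_fieldsplit_schur_precondition", "self");
  PetscOptionsSetValue("-pc_fieldsplit_dm_splits", "0");
  PetscOptionsSetValue("-pc_fieldsplit_0_fields", "0,1,2");
  PetscOptionsSetValue("-pc_fieldsplit_1_fields", "3");
  PetscOptionsSetValue("-fieldsplit_0_pc_type", "hypre");
  /* -log_summary and -ksp_monitor are still passed on the command line as before. */
  KSPSetFromOptions(ksp);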
-log_summary output for P1:
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/epi/asclepios2/bkhanal/works/AdLemModel/build/src/AdLemMain on a arch-linux2-cxx-opt named nef017 with 64 processors, by bkhanal Tue Oct 28 05:52:24 2014
Using Petsc Release Version 3.4.3, Oct, 15, 2013
Max Max/Min Avg Total
Time (sec): 4.221e+04 1.00780 4.201e+04
Objects: 9.980e+02 1.00000 9.980e+02
Flops: 2.159e+11 1.08499 2.106e+11 1.348e+13
Flops/sec: 5.154e+06 1.08499 5.013e+06 3.208e+08
MPI Messages: 1.316e+05 3.69736 7.413e+04 4.744e+06
MPI Message Lengths: 1.986e+09 2.61387 1.581e+04 7.502e+10
MPI Reductions: 8.128e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 4.2010e+04 100.0% 1.3477e+13 100.0% 4.744e+06 100.0% 1.581e+04 100.0% 8.127e+03 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecView 6 1.0 3.8704e+01424.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 0 0 0 0 0 0 0
VecMDot 3228 1.0 1.0614e+01 1.7 5.93e+09 1.1 0.0e+00 0.0e+00 3.2e+03 0 3 0 0 40 0 3 0 0 40 35184
VecNorm 4383 1.0 4.0579e+01 9.8 2.73e+09 1.1 0.0e+00 0.0e+00 4.4e+03 0 1 0 0 54 0 1 0 0 54 4239
VecScale 4680 1.0 1.2393e+00 1.3 1.39e+09 1.1 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 70518
VecCopy 1494 1.0 1.1592e+00 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 5477 1.0 9.7614e+02288.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 1113 1.0 7.9877e-01 2.3 6.17e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 48645
VecAYPX 375 1.0 7.9671e-02 1.4 4.99e+07 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 39491
VecMAXPY 4317 1.0 7.3185e+00 2.1 8.89e+09 1.1 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 76569
VecScatterBegin 5133 1.0 6.7937e+00 2.4 0.00e+00 0.0 4.7e+06 1.5e+04 1.2e+01 0 0100 96 0 0 0100 96 0 0
VecScatterEnd 5121 1.0 2.5840e+02113.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 3984 1.0 7.2473e+00 1.6 3.92e+09 1.1 0.0e+00 0.0e+00 4.0e+03 0 2 0 0 49 0 2 0 0 49 34118
MatMult 910 1.0 7.5977e+03 1.0 2.11e+11 1.1 4.7e+06 1.5e+04 6.2e+03 18 98 99 92 76 18 98 99 92 76 1736
MatMultAdd 702 1.0 5.5198e+01 6.3 4.98e+09 1.1 6.6e+05 6.2e+03 0.0e+00 0 2 14 5 0 0 2 14 5 0 5624
MatConvert 6 1.0 3.3739e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
MatAssemblyBegin 55 1.0 5.7320e+0319.7 0.00e+00 0.0 0.0e+00 0.0e+00 6.2e+01 11 0 0 0 1 11 0 0 0 1 0
MatAssemblyEnd 55 1.0 1.6179e+00 1.3 0.00e+00 0.0 9.4e+03 3.6e+03 4.0e+01 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 12 1.0 1.3590e-05 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 20 1.0 6.9117e-01 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 18 1.0 4.7298e-02 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.6e+01 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 3228 1.0 1.4727e+01 1.7 1.19e+10 1.1 0.0e+00 0.0e+00 3.2e+03 0 6 0 0 40 0 6 0 0 40 50718
KSPSetUp 18 1.0 5.4130e-02 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 6 1.0 3.2363e+04 1.0 2.15e+11 1.1 4.7e+06 1.5e+04 7.7e+03 77100 99 92 95 77100 99 92 95 415
PCSetUp 18 1.0 2.5724e+04 1.0 0.00e+00 0.0 7.7e+03 3.3e+04 2.0e+02 61 0 0 0 2 61 0 0 0 2 0
PCApply 18 1.0 3.2248e+04 1.0 2.12e+11 1.1 4.7e+06 1.5e+04 7.6e+03 77 98 99 91 93 77 98 99 91 93 411
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 859 849 1446200816 0
Vector Scatter 26 16 16352 0
Matrix 20 20 1832984916 0
Distributed Mesh 3 3 5457592 0
Bipartite Graph 6 6 4848 0
Index Set 61 61 1882648 0
IS L to G Mapping 5 5 3961468 0
Krylov Solver 5 5 57888 0
DMKSP interface 1 1 656 0
Preconditioner 5 5 4440 0
Viewer 7 6 4272 0
========================================================================================================================
Average time to get PetscTime(): 0
Average time for MPI_Barrier(): 1.27792e-05
Average time for zero size MPI_Send(): 5.20423e-06
#PETSc Option Table entries:
...
-log_summary output for P2:
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/epi/asclepios2/bkhanal/works/AdLemModel/build/src/AdLemMain on a arch-linux2-cxx-opt named nef001 with 64 processors, by bkhanal Wed Oct 29 14:24:36 2014
Using Petsc Release Version 3.4.3, Oct, 15, 2013
Max Max/Min Avg Total
Time (sec): 1.958e+04 1.00194 1.955e+04
Objects: 3.190e+02 1.00000 3.190e+02
Flops: 3.638e+10 1.08499 3.548e+10 2.271e+12
Flops/sec: 1.861e+06 1.08676 1.815e+06 1.161e+08
MPI Messages: 2.253e+04 3.68455 1.270e+04 8.131e+05
MPI Message Lengths: 3.403e+08 2.51345 1.616e+04 1.314e+10
MPI Reductions: 1.544e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 1.9554e+04 100.0% 2.2709e+12 100.0% 8.131e+05 100.0% 1.616e+04 100.0% 1.543e+03 99.9%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecView 1 1.0 4.4869e+02189.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecMDot 544 1.0 1.8271e+01 2.2 1.00e+09 1.1 0.0e+00 0.0e+00 5.4e+02 0 3 0 0 35 0 3 0 0 35 3456
VecNorm 738 1.0 2.0433e+0218.1 4.60e+08 1.1 0.0e+00 0.0e+00 7.4e+02 1 1 0 0 48 1 1 0 0 48 142
VecScale 788 1.0 4.1195e+00 9.0 2.34e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 3573
VecCopy 251 1.0 7.6140e+0046.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 926 1.0 3.9087e+0141.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 187 1.0 6.0848e+0032.3 1.04e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1073
VecAYPX 63 1.0 4.6702e-0116.2 8.38e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1131
VecMAXPY 727 1.0 1.0997e+01 4.9 1.50e+09 1.1 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 8610
VecScatterBegin 864 1.0 2.0978e+0234.1 0.00e+00 0.0 8.0e+05 1.5e+04 2.0e+00 0 0 98 92 0 0 0 98 92 0 0
VecScatterEnd 862 1.0 5.4781e+02114.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecNormalize 671 1.0 1.6922e+01 2.2 6.61e+08 1.1 0.0e+00 0.0e+00 6.7e+02 0 2 0 0 43 0 2 0 0 43 2461
MatMult 152 1.0 6.3271e+03 1.0 3.56e+10 1.1 7.9e+05 1.5e+04 1.0e+03 32 98 98 89 68 32 98 98 89 68 351
MatMultAdd 118 1.0 4.5234e+02183.7 8.36e+08 1.1 1.1e+05 6.2e+03 0.0e+00 1 2 14 5 0 1 2 14 5 0 115
MatConvert 1 1.0 3.6065e+02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatAssemblyBegin 10 1.0 1.0849e+03 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 5 0 0 0 1 5 0 0 0 1 0
MatAssemblyEnd 10 1.0 1.3957e+01 1.1 0.00e+00 0.0 9.4e+03 3.6e+03 4.0e+01 0 0 1 0 3 0 0 1 0 3 0
MatGetRowIJ 2 1.0 2.2221e-03582.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 3 1.0 3.7378e-01 9.8 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 544 1.0 2.0370e+01 1.7 2.00e+09 1.1 0.0e+00 0.0e+00 5.4e+02 0 6 0 0 35 0 6 0 0 35 6200
KSPSetUp 3 1.0 4.2598e+01 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 1 0 0 0 0 1 0
KSPSolve 1 1.0 1.5113e+04 1.0 3.63e+10 1.1 7.9e+05 1.5e+04 1.3e+03 77100 98 89 84 77100 98 89 84 150
PCSetUp 3 1.0 1.1794e+04 1.0 0.00e+00 0.0 7.7e+03 3.3e+04 1.3e+02 60 0 1 2 8 60 0 1 2 8 0
PCApply 3 1.0 1.4940e+04 1.0 3.58e+10 1.1 7.9e+05 1.5e+04 1.3e+03 76 98 97 88 83 76 98 97 88 83 149
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 215 215 743212688 0
Vector Scatter 16 16 16352 0
Matrix 20 20 1832984916 0
Distributed Mesh 3 3 5457592 0
Bipartite Graph 6 6 4848 0
Index Set 41 41 1867368 0
IS L to G Mapping 5 5 3961468 0
Krylov Solver 5 5 57888 0
DMKSP interface 1 1 656 0
Preconditioner 5 5 4440 0
Viewer 2 1 712 0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 0.000274992
Average time for zero size MPI_Send(): 1.67042e-05
#PETSc Option Table entries:
...