[petsc-users] Petsc Profiling for Each Function
Sharp Stone
thronesf at gmail.com
Wed Nov 23 18:04:20 CST 2016
Thanks, Barry, for the reply. When I run with the option -log_view, I get the
profiling of the PETSc API functions shown further below. But is there any way
for me to get performance/execution profiling of the functions in my own code,
similar to what Xcode's profiler provides, so that I know which parts need
special optimization? I would greatly appreciate your instructions. Thank you
very much in advance!
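In particular, would registering my own log events around the routines I want
timed be the right approach? Below is a minimal sketch of what I have in mind
(the routine name MyComputeRoutine and the event name "MyCompute" are just
placeholders for my own code):

#include <petscsys.h>

static PetscLogEvent USER_COMPUTE_EVENT;

/* Placeholder for one of my own routines that I would like timed separately */
static PetscErrorCode MyComputeRoutine(void)
{
  PetscErrorCode ierr;

  ierr = PetscLogEventBegin(USER_COMPUTE_EVENT,0,0,0,0);CHKERRQ(ierr);
  /* ... the actual work of the routine ... */
  ierr = PetscLogEventEnd(USER_COMPUTE_EVENT,0,0,0,0);CHKERRQ(ierr);
  return 0;
}

int main(int argc,char **argv)
{
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);CHKERRQ(ierr);
  /* Register the event once; it should then appear as its own row in the -log_view table */
  ierr = PetscLogEventRegister("MyCompute",0,&USER_COMPUTE_EVENT);CHKERRQ(ierr);
  ierr = MyComputeRoutine();CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

Here is the -log_view output I currently get: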
---------------------------------------------- PETSc Performance Summary:
----------------------------------------------
./pStreamer on a arch-darwin-c-opt named rcc-dhcp-249-233.sr.unh.edu with 4
processors, by GPL Wed Nov 23 18:49:59 2016
Using Petsc Release Version 3.5.4, May, 23, 2015
                         Max       Max/Min        Avg      Total
Time (sec):           1.026e+02      1.00004   1.026e+02
Objects:              3.420e+02      1.00000   3.420e+02
Flops:                2.509e+10      1.00200   2.506e+10  1.002e+11
Flops/sec:            2.445e+08      1.00202   2.441e+08  9.764e+08
MPI Messages:         1.387e+04      1.99971   1.041e+04  4.163e+04
MPI Message Lengths:  2.783e+07      1.99971   2.006e+03  8.350e+07
MPI Reductions:       1.425e+04      1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                          e.g., VecAXPY() for real vectors of length N --> 2N flops
                          and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages:  ----- Time ------   ----- Flops -----   --- Messages ---   -- Message Lengths --   -- Reductions --
                      Avg     %Total      Avg     %Total     counts   %Total     Avg        %Total      counts   %Total
 0:  Main Stage:  1.0264e+02  100.0%   1.0022e+11  100.0%   4.163e+04  100.0%   2.006e+03    100.0%    1.425e+04  100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on
interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
   %T - percent time in this phase           %F - percent flops in this phase
   %M - percent messages in this phase       %L - percent message lengths in this phase
   %R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecDot 6778 1.0 5.0633e+00 1.1 1.70e+09 1.0 0.0e+00 0.0e+00
6.8e+03 5 7 0 0 48 5 7 0 0 48 1345
VecDotNorm2 3389 1.0 3.1751e+00 1.3 1.70e+09 1.0 0.0e+00 0.0e+00
3.4e+03 3 7 0 0 24 3 7 0 0 24 2144
VecNorm 3571 1.0 2.1366e+00 1.1 8.98e+08 1.0 0.0e+00 0.0e+00
3.6e+03 2 4 0 0 25 2 4 0 0 25 1679
VecScale 49 1.0 1.2606e-02 1.2 6.16e+06 1.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 1952
VecCopy 157 1.0 4.5638e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 7140 1.0 5.1525e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 146 1.0 7.0036e-02 1.2 3.97e+07 1.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 2266
VecAXPBYCZ 6778 1.0 4.9426e+00 1.2 3.41e+09 1.0 0.0e+00 0.0e+00
0.0e+00 4 14 0 0 0 4 14 0 0 0 2755
VecWAXPY 6785 1.0 3.9687e+00 1.0 1.71e+09 1.0 0.0e+00 0.0e+00
0.0e+00 4 7 0 0 0 4 7 0 0 0 1716
VecScatterBegin 6918 1.0 3.4263e-01 1.7 0.00e+00 0.0 4.2e+04 2.0e+03
0.0e+00 0 0100100 0 0 0100100 0 0
VecScatterEnd 6918 1.0 8.6652e+00 29.7 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 4 0 0 0 0 4 0 0 0 0 0
MatMult 6869 1.0 2.6045e+01 1.5 7.76e+09 1.0 4.1e+04 2.0e+03
0.0e+00 21 31 99 99 0 21 31 99 99 0 1190
MatSolve 6960 1.0 1.7961e+01 1.0 7.86e+09 1.0 0.0e+00 0.0e+00
0.0e+00 17 31 0 0 0 17 31 0 0 0 1747
MatLUFactorNum 7 1.0 5.3501e-02 1.1 9.62e+06 1.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 719
MatILUFactorSym 7 1.0 5.1423e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 14 1.0 3.7722e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00
2.8e+01 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 14 1.0 6.4600e-02 1.0 0.00e+00 0.0 8.4e+01 5.0e+02
5.6e+01 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 7 1.0 3.4150e-06 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 7 1.0 5.8947e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 14 1.0 3.9437e+00 132.6 0.00e+00 0.0 0.0e+00 0.0e+00
8.4e+01 2 0 0 0 1 2 0 0 0 1 0
KSPSolve 91 1.0 6.6894e+01 1.2 2.51e+10 1.0 4.1e+04 2.0e+03
1.4e+04 59100 99 99 98 59100 99 99 98 1497
PCSetUp 14 1.0 1.1119e-01 1.1 9.62e+06 1.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 346
PCSetUpOnBlocks 91 1.0 1.1087e-01 1.1 9.62e+06 1.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 347
PCApply 6960 1.0 1.8604e+01 1.0 7.86e+09 1.0 0.0e+00 0.0e+00
0.0e+00 18 31 0 0 0 18 31 0 0 0 1687
TSStep 6 1.0 7.0114e+01 1.0 2.01e+10 1.0 3.3e+04 2.0e+03
1.1e+04 68 80 80 80 78 68 80 80 80 78 1146
TSFunctionEval 6 1.0 7.0102e+01 1.0 2.01e+10 1.0 3.3e+04 2.0e+03
1.1e+04 68 80 80 80 78 68 80 80 80 78 1146
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 212 207 180438384 0
Vector Scatter 9 8 1014488 0
Matrix 28 28 147732508 0
Distributed Mesh 4 3 13776 0
Star Forest Bipartite Graph 8 6 4800 0
Discrete System 4 3 2376 0
Index Set 39 39 8087676 0
IS L to G Mapping 2 1 504596 0
Krylov Solver 15 15 17456 0
Preconditioner 15 15 14208 0
TSAdapt 1 1 1208 0
TS 1 1 1168 0
DMTS 1 1 712 0
SNES 1 1 1336 0
DMSNES 1 1 672 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 3.82e-08
Average time for MPI_Barrier(): 6.2797e-05
Average time for zero size MPI_Send(): 1.98967e-05
#PETSc Option Table entries:
-log_view
#End of PETSc Option Table entries
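On a related note, the phase summary above mentions PetscLogStagePush() and
PetscLogStagePop(). Would splitting my run into named stages, as in the sketch
below (the stage names are only illustrative), make -log_view report a separate
event table per phase of the code, so I can see where each function's time goes
stage by stage?

#include <petscsys.h>

int main(int argc,char **argv)
{
  PetscErrorCode ierr;
  PetscLogStage  stage_setup,stage_solve;

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);CHKERRQ(ierr);
  ierr = PetscLogStageRegister("My Setup",&stage_setup);CHKERRQ(ierr);
  ierr = PetscLogStageRegister("My Solve",&stage_solve);CHKERRQ(ierr);

  /* Everything logged between push and pop is attributed to that stage in -log_view */
  ierr = PetscLogStagePush(stage_setup);CHKERRQ(ierr);
  /* ... mesh and matrix assembly would go here ... */
  ierr = PetscLogStagePop();CHKERRQ(ierr);

  ierr = PetscLogStagePush(stage_solve);CHKERRQ(ierr);
  /* ... the KSP/TS solve calls would go here ... */
  ierr = PetscLogStagePop();CHKERRQ(ierr);

  ierr = PetscFinalize();
  return 0;
}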
On Wed, Oct 19, 2016 at 8:33 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>    To start, you should just run the code (configured with the ./configure
> option --with-debugging=0) with the option -log_view; this will give you the
> high-level view of where it is spending the time. Feel free to email back the
> output. From that you can focus on which parts are both taking a lot of time
> AND running slowly, and that gives a good idea of what needs to be optimized.
>
> Barry
>
> > On Oct 19, 2016, at 3:15 PM, Sharp Stone <thronesf at gmail.com> wrote:
> >
> > Dear all,
> >
> > Now I'm using a PETSc code which needs to be optimized. But after trying,
> > I still don't know how to get profiling for each function on each process.
> > I mean, for each process, how should I find the execution time of each
> > function?
> >
> > Thanks!
> >
> > --
> > Best regards,
> >
> > Feng
>
>
--
Best regards,
Feng