[petsc-users] Petsc Profiling for Each Function

Sharp Stone thronesf at gmail.com
Wed Nov 23 18:04:20 CST 2016


Thanks, Barry, for the reply. When I run with the option -log_view, I get the
profiling of the PETSc API functions posted below. But is there any way for me
to get the performance/execution profiling of the functions in my own code,
similar to what Xcode's profiler gives, so that I know which parts need
special optimization? I would greatly appreciate your instructions. Thank you
very much in advance!
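
To make my question more concrete: is the intended way to do this to register
my own log events around my functions, so that they show up as extra rows in
the -log_view table with their own per-process Max/Ratio timings? Below is a
minimal sketch of what I have in mind; the event name "UserComputeRHS" and the
function it wraps are just placeholders on my side, so please correct me if
this is not the right approach.

#include <petscsys.h>

static PetscLogEvent USER_RHS_EVENT;   /* handle for my own timed section */

/* placeholder for one of my functions whose per-process time I want logged */
static PetscErrorCode MyComputeRHS(void)
{
  PetscErrorCode ierr;

  ierr = PetscLogEventBegin(USER_RHS_EVENT,0,0,0,0);CHKERRQ(ierr);
  /* ... the actual work of the function ... */
  ierr = PetscLogEventEnd(USER_RHS_EVENT,0,0,0,0);CHKERRQ(ierr);
  return 0;
}

int main(int argc,char **argv)
{
  PetscErrorCode ierr;
  PetscClassId   classid;

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
  /* register the event once; it then appears as its own row under -log_view */
  ierr = PetscClassIdRegister("User",&classid);CHKERRQ(ierr);
  ierr = PetscLogEventRegister("UserComputeRHS",classid,&USER_RHS_EVENT);CHKERRQ(ierr);

  ierr = MyComputeRHS();CHKERRQ(ierr);

  ierr = PetscFinalize();
  return ierr;
}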


---------------------------------------------- PETSc Performance Summary:
----------------------------------------------

./pStreamer on a arch-darwin-c-opt named rcc-dhcp-249-233.sr.unh.edu with 4
processors, by GPL Wed Nov 23 18:49:59 2016
Using Petsc Release Version 3.5.4, May, 23, 2015

                         Max       Max/Min        Avg      Total
Time (sec):           1.026e+02      1.00004   1.026e+02
Objects:              3.420e+02      1.00000   3.420e+02
Flops:                2.509e+10      1.00200   2.506e+10  1.002e+11
Flops/sec:            2.445e+08      1.00202   2.441e+08  9.764e+08
MPI Messages:         1.387e+04      1.99971   1.041e+04  4.163e+04
MPI Message Lengths:  2.783e+07      1.99971   2.006e+03  8.350e+07
MPI Reductions:       1.425e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 1.0264e+02 100.0%  1.0022e+11 100.0%  4.163e+04 100.0%  2.006e+03      100.0%  1.425e+04 100.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on
interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecDot              6778 1.0 5.0633e+00 1.1 1.70e+09 1.0 0.0e+00 0.0e+00 6.8e+03  5  7  0  0 48   5  7  0  0 48  1345
VecDotNorm2         3389 1.0 3.1751e+00 1.3 1.70e+09 1.0 0.0e+00 0.0e+00 3.4e+03  3  7  0  0 24   3  7  0  0 24  2144
VecNorm             3571 1.0 2.1366e+00 1.1 8.98e+08 1.0 0.0e+00 0.0e+00 3.6e+03  2  4  0  0 25   2  4  0  0 25  1679
VecScale              49 1.0 1.2606e-02 1.2 6.16e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1952
VecCopy              157 1.0 4.5638e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              7140 1.0 5.1525e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              146 1.0 7.0036e-02 1.2 3.97e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2266
VecAXPBYCZ          6778 1.0 4.9426e+00 1.2 3.41e+09 1.0 0.0e+00 0.0e+00 0.0e+00  4 14  0  0  0   4 14  0  0  0  2755
VecWAXPY            6785 1.0 3.9687e+00 1.0 1.71e+09 1.0 0.0e+00 0.0e+00 0.0e+00  4  7  0  0  0   4  7  0  0  0  1716
VecScatterBegin     6918 1.0 3.4263e-01 1.7 0.00e+00 0.0 4.2e+04 2.0e+03 0.0e+00  0  0100100  0   0  0100100  0     0
VecScatterEnd       6918 1.0 8.6652e+0029.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
MatMult             6869 1.0 2.6045e+01 1.5 7.76e+09 1.0 4.1e+04 2.0e+03 0.0e+00 21 31 99 99  0  21 31 99 99  0  1190
MatSolve            6960 1.0 1.7961e+01 1.0 7.86e+09 1.0 0.0e+00 0.0e+00 0.0e+00 17 31  0  0  0  17 31  0  0  0  1747
MatLUFactorNum         7 1.0 5.3501e-02 1.1 9.62e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   719
MatILUFactorSym        7 1.0 5.1423e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin      14 1.0 3.7722e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 2.8e+01  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd        14 1.0 6.4600e-02 1.0 0.00e+00 0.0 8.4e+01 5.0e+02 5.6e+01  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            7 1.0 3.4150e-06 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         7 1.0 5.8947e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp              14 1.0 3.9437e+00132.6 0.00e+00 0.0 0.0e+00 0.0e+00 8.4e+01  2  0  0  0  1   2  0  0  0  1     0
KSPSolve              91 1.0 6.6894e+01 1.2 2.51e+10 1.0 4.1e+04 2.0e+03 1.4e+04 59100 99 99 98  59100 99 99 98  1497
PCSetUp               14 1.0 1.1119e-01 1.1 9.62e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   346
PCSetUpOnBlocks       91 1.0 1.1087e-01 1.1 9.62e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   347
PCApply             6960 1.0 1.8604e+01 1.0 7.86e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 31  0  0  0  18 31  0  0  0  1687
TSStep                 6 1.0 7.0114e+01 1.0 2.01e+10 1.0 3.3e+04 2.0e+03 1.1e+04 68 80 80 80 78  68 80 80 80 78  1146
TSFunctionEval         6 1.0 7.0102e+01 1.0 2.01e+10 1.0 3.3e+04 2.0e+03 1.1e+04 68 80 80 80 78  68 80 80 80 78  1146
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

                     Vector   212            207    180438384     0
             Vector Scatter     9              8      1014488     0
                     Matrix    28             28    147732508     0
           Distributed Mesh     4              3        13776     0
Star Forest Bipartite Graph     8              6         4800     0
            Discrete System     4              3         2376     0
                  Index Set    39             39      8087676     0
          IS L to G Mapping     2              1       504596     0
              Krylov Solver    15             15        17456     0
             Preconditioner    15             15        14208     0
                    TSAdapt     1              1         1208     0
                         TS     1              1         1168     0
                       DMTS     1              1          712     0
                       SNES     1              1         1336     0
                     DMSNES     1              1          672     0
                     Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 3.82e-08
Average time for MPI_Barrier(): 6.2797e-05
Average time for zero size MPI_Send(): 1.98967e-05
#PETSc Option Table entries:
-log_view
#End of PETSc Option Table entries
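
Also, since everything above shows up under the single "Main Stage", I am
guessing I should split my code into separate logging stages so that -log_view
breaks the summary down per phase. A minimal sketch of what I have in mind is
below; the stage names "My Setup" and "My Solve" are just illustrative
placeholders for the phases of my own solver.

#include <petscsys.h>

int main(int argc,char **argv)
{
  PetscErrorCode ierr;
  PetscLogStage  setup_stage,solve_stage;   /* handles for my own stages */

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
  ierr = PetscLogStageRegister("My Setup",&setup_stage);CHKERRQ(ierr);
  ierr = PetscLogStageRegister("My Solve",&solve_stage);CHKERRQ(ierr);

  /* events logged while a stage is pushed are reported under that stage */
  ierr = PetscLogStagePush(setup_stage);CHKERRQ(ierr);
  /* ... grid generation, matrix assembly, etc. ... */
  ierr = PetscLogStagePop();CHKERRQ(ierr);

  ierr = PetscLogStagePush(solve_stage);CHKERRQ(ierr);
  /* ... the time-stepping / KSPSolve part ... */
  ierr = PetscLogStagePop();CHKERRQ(ierr);

  ierr = PetscFinalize();
  return ierr;
}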

On Wed, Oct 19, 2016 at 8:33 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:

>
>    To start, you should just build the code with the ./configure option
> --with-debugging=0 and run it with the option -log_view; this will give you
> the high-level view of where it is spending the time. Feel free to email back
> the output. From that you can focus on the parts that are both taking a lot
> of time AND running slowly, and that gives a good idea of what needs to be
> optimized.
>
>    Barry
>
> > On Oct 19, 2016, at 3:15 PM, Sharp Stone <thronesf at gmail.com> wrote:
> >
> > Dear all,
> >
> > Now I'm using a PETSc code which needs to be optimized. But after
> > trying, I still don't know how to get the profiling for each of the
> > functions for each process. I mean, for each process, how should I know the
> > execution time of each function?
> >
> > Thanks!
> >
> > --
> > Best regards,
> >
> > Feng
>
>


-- 
Best regards,

Feng

