[petsc-users] HPCToolKit/HPCViewer on OS X

Bhalla, Amneet Pal S amneetb at live.unc.edu
Thu Jan 14 18:36:14 CST 2016


And here is the PETSc log summary (-log_summary output) for comparison:

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./main2d on a linux-opt named aorta with 1 processor, by amneetb Thu Jan 14 19:34:38 2016
Using Petsc Development GIT revision: v3.6.3-3098-ga3ecda2  GIT Date: 2016-01-13 21:30:26 -0600

                         Max       Max/Min        Avg      Total
Time (sec):           6.223e-01      1.00000   6.223e-01
Objects:              2.618e+03      1.00000   2.618e+03
Flops:                1.948e+08      1.00000   1.948e+08  1.948e+08
Flops/sec:            3.129e+08      1.00000   3.129e+08  3.129e+08
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00      0.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 6.2232e-01 100.0%  1.9476e+08 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 1.0e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecDot                 4 1.0 2.9087e-05 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1139
VecDotNorm2          180 1.0 1.0626e-03 1.0 3.17e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  2983
VecMDot              288 1.0 3.8970e-03 1.0 2.38e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0   611
VecNorm              113 1.0 9.6560e-04 1.0 1.36e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   140
VecScale              66 1.0 4.0913e-04 1.0 1.21e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   295
VecCopy               24 1.0 3.8338e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet             12855 1.0 1.0173e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
VecAXPY              607 1.0 2.9583e-03 1.0 4.97e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0  1680
VecAYPX              169 1.0 8.6975e-04 1.0 6.41e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   737
VecAXPBYCZ            34 1.0 1.1325e-04 1.0 2.77e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2443
VecWAXPY              54 1.0 1.4043e-04 1.0 2.30e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1637
VecMAXPY             301 1.0 5.6567e-03 1.0 2.38e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0   421
VecSwap              103 1.0 3.2711e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyBegin     561 1.0 3.5629e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAssemblyEnd       561 1.0 6.4468e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin    18427 1.0 1.5277e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
BuildTwoSidedF       554 1.0 1.9150e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMult              361 1.0 6.2765e-02 1.0 5.80e+07 1.0 0.0e+00 0.0e+00 0.0e+00 10 30  0  0  0  10 30  0  0  0   924
MatSolve            6108 1.0 6.3529e-02 1.0 9.53e+07 1.0 0.0e+00 0.0e+00 0.0e+00 10 49  0  0  0  10 49  0  0  0  1500
MatLUFactorSym        85 1.0 2.0353e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
MatLUFactorNum        85 1.0 2.2882e-02 1.0 2.28e+07 1.0 0.0e+00 0.0e+00 0.0e+00  4 12  0  0  0   4 12  0  0  0   995
MatScale               4 1.0 2.2912e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1096
MatAssemblyBegin     108 1.0 6.7949e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd       108 1.0 2.9209e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRow          33120 1.0 2.0407e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
MatGetRowIJ           85 1.0 1.2467e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSubMatrice       4 1.0 8.2304e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatGetOrdering        85 1.0 7.8776e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatAXPY                4 1.0 4.9517e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  8  0  0  0  0   8  0  0  0  0     0
MatPtAP                4 1.0 4.4372e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00  7  3  0  0  0   7  3  0  0  0   112
MatPtAPSymbolic        4 1.0 2.7586e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
MatPtAPNumeric         4 1.0 1.6756e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00  3  3  0  0  0   3  3  0  0  0   298
MatGetSymTrans         4 1.0 3.6120e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPGMRESOrthog        12 1.0 4.9458e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
KSPSetUp              90 1.0 5.6815e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 3.8819e-01 1.0 1.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 62 97  0  0  0  62 97  0  0  0   488
PCSetUp               90 1.0 6.4402e-02 1.0 2.28e+07 1.0 0.0e+00 0.0e+00 0.0e+00 10 12  0  0  0  10 12  0  0  0   354
PCSetUpOnBlocks       84 1.0 5.2499e-02 1.0 2.28e+07 1.0 0.0e+00 0.0e+00 0.0e+00  8 12  0  0  0   8 12  0  0  0   434
PCApply               12 1.0 3.4369e-01 1.0 1.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 55 97  0  0  0  55 97  0  0  0   549
SNESSolve              1 1.0 3.9208e-01 1.0 1.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 63 97  0  0  0  63 97  0  0  0   483
SNESFunctionEval       2 1.0 3.2527e-03 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0    14
SNESJacobianEval       1 1.0 4.6706e-04 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     9
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector   739            639      9087400     0.
      Vector Scatter   290            289       189584     0.
           Index Set  1086            738       885136     0.
   IS L to G Mapping   110            109      2156656     0.
   Application Order     6              6        99952     0.
             MatMFFD     1              1          776     0.
              Matrix   189            189     19106368     0.
   Matrix Null Space     4              4         2432     0.
       Krylov Solver    90             90       122720     0.
     DMKSP interface     1              1          648     0.
      Preconditioner    90             90        89864     0.
                SNES     1              1         1328     0.
      SNESLineSearch     1              1          984     0.
              DMSNES     1              1          664     0.
    Distributed Mesh     2              2         9168     0.
Star Forest Bipartite Graph     4              4         3168     0.
     Discrete System     2              2         1712     0.
              Viewer     1              0            0     0.
========================================================================================================================
Average time to get PetscTime(): 7.15256e-07
#PETSc Option Table entries:
-ib_ksp_converged_reason
-ib_ksp_monitor_true_residual
-ib_snes_type ksponly
-log_summary
-stokes_ib_pc_level_ksp_richardson_self_scale
-stokes_ib_pc_level_ksp_type richardson
-stokes_ib_pc_level_pc_asm_local_type additive
-stokes_ib_pc_level_pc_asm_type interpolate
-stokes_ib_pc_level_sub_pc_factor_shift_type nonzero
-stokes_ib_pc_level_sub_pc_type lu
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 --with-default-arch=0 --PETSC_ARCH=linux-opt --with-debugging=0 --with-c++-support=1 --with-hypre=1 --download-hypre=1 --with-hdf5=yes --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3
-----------------------------------------
Libraries compiled on Thu Jan 14 01:29:56 2016 on aorta
Machine characteristics: Linux-3.13.0-63-generic-x86_64-with-Ubuntu-14.04-trusty
Using PETSc directory: /not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc
Using PETSc arch: linux-opt
-----------------------------------------

Using C compiler: mpicc  -fPIC  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -Qunused-arguments -O3  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90  -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3   ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------

Using include paths: -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/softwares/MPICH/include
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lpetsc -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lHYPRE -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpicxx -lstdc++ -llapack -lblas -lpthread -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lX11 -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++ -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -lmpi -lgcc_s -ldl
-----------------------------------------
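
A side note on the "Main Stage" label above: all events land in PETSc's default stage unless the code registers its own logging stages and brackets regions with PetscLogStagePush()/PetscLogStagePop(), as the legend mentions. A minimal sketch of that usage follows; the stage name "Stokes solve" and the comment marking where the solver calls would go are placeholders for illustration, not taken from the actual application code:

    #include <petscsys.h>

    int main(int argc, char **argv)
    {
      PetscErrorCode ierr;
      PetscLogStage  stokes_stage;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);

      /* Register a named stage once; it appears as its own section in the
         -log_summary output, alongside the default Main Stage. */
      ierr = PetscLogStageRegister("Stokes solve", &stokes_stage);CHKERRQ(ierr);

      ierr = PetscLogStagePush(stokes_stage);CHKERRQ(ierr);
      /* ... the SNES/KSP solves to be attributed to this stage go here ... */
      ierr = PetscLogStagePop();CHKERRQ(ierr);

      ierr = PetscFinalize();
      return ierr;
    }

With stages like this in place, -log_summary prints a separate event-stage section for the bracketed region, which makes it easier to line up the PETSc timings against the HPCToolkit attribution.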


On Jan 14, 2016, at 4:31 PM, Bhalla, Amneet Pal Singh <amneetb at ad.unc.edu> wrote:



On Jan 14, 2016, at 11:24 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:

Also, getting the results with Instruments or HPCToolkit would be useful (so long as we don't need to install HPCToolkit ourselves to see the results).

@Barry — Attached is the <main2dProfiling.numbers> output from the HPCToolkit profiler for all the operations done in solving one timestep of the Stokes+IB simulation.

