[petsc-users] HPCToolKit/HPCViewer on OS X

Matthew Knepley knepley at gmail.com
Thu Jan 14 07:24:54 CST 2016


On Wed, Jan 13, 2016 at 11:12 PM, Bhalla, Amneet Pal S <amneetb at live.unc.edu> wrote:

>
>
> On Jan 13, 2016, at 6:22 PM, Matthew Knepley <knepley at gmail.com> wrote:
>
> Can you mail us a -log_summary for a rough cut? Sometimes it's hard
> to interpret the data avalanche from one of those tools without a simple
> map.
>
>
> Does this indicate some hot spots?
>

1) There is a misspelled option, -stokes_ib_pc_level_ksp_richardson_self_scae
(presumably meant to be -stokes_ib_pc_level_ksp_richardson_self_scale).

You can catch this by running with -options_left, which reports any options
that were set but never used.

2) Are you using any custom code during the solve? There is a gaping hole
in the timing: PCApply() takes 9s, but everything we time underneath it
adds up to only about 1s.

Since this run is serial, we can also use something like kcachegrind (on
callgrind output from valgrind) to look at performance, which should at
least tell us what is eating this time so we can put a PETSc event on it.
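
For reference, here is a minimal sketch of how a user-defined event can be
registered and logged around custom code so it shows up as its own line in
-log_summary (RegisterUserEvents, TimedIBApply, and applyIBOperator are
hypothetical placeholders, not names from your application):

    #include <petscsys.h>

    static PetscLogEvent USER_IB_APPLY;

    /* Call once after PetscInitialize() to create the event. */
    PetscErrorCode RegisterUserEvents(void)
    {
      PetscClassId   classid;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = PetscClassIdRegister("UserIB",&classid);CHKERRQ(ierr);
      ierr = PetscLogEventRegister("UserIBApply",classid,&USER_IB_APPLY);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

    /* Wrap the custom work done inside the preconditioner apply;
       the time between begin/end is reported under "UserIBApply". */
    PetscErrorCode TimedIBApply(void)
    {
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = PetscLogEventBegin(USER_IB_APPLY,0,0,0,0);CHKERRQ(ierr);
      /* ... applyIBOperator(...) or other custom code here ... */
      ierr = PetscLogEventEnd(USER_IB_APPLY,0,0,0,0);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }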

  Thanks,

     Matt


>
> ************************************************************************************************************************
> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r
> -fCourier9' to print this document            ***
>
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary:
> ----------------------------------------------
>
> ./main2d on a darwin-dbg named Amneets-MBP.attlocal.net with 1 processor,
> by Taylor Wed Jan 13 21:07:43 2016
> Using Petsc Development GIT revision: v3.6.1-2556-g6721a46  GIT Date:
> 2015-11-16 13:07:08 -0600
>
>                          Max       Max/Min        Avg      Total
> Time (sec):           1.039e+01      1.00000   1.039e+01
> Objects:              2.834e+03      1.00000   2.834e+03
> Flops:                3.552e+08      1.00000   3.552e+08  3.552e+08
> Flops/sec:            3.418e+07      1.00000   3.418e+07  3.418e+07
> Memory:               3.949e+07      1.00000              3.949e+07
> MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
> MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
> MPI Reductions:       0.000e+00      0.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)
>                             e.g., VecAXPY() for real vectors of length N
> --> 2N flops
>                             and VecAXPY() for complex vectors of length N
> --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages
> ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts
> %Total     Avg         %Total   counts   %Total
>  0:      Main Stage: 1.0391e+01 100.0%  3.5520e+08 100.0%  0.000e+00
> 0.0%  0.000e+00        0.0%  0.000e+00   0.0%
>
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on
> interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flops: Max - maximum over all processors
>                    Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    Avg. len: average message length (bytes)
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and
> PetscLogStagePop().
>       %T - percent time in this phase         %F - percent flops in this
> phase
>       %M - percent messages in this phase     %L - percent message lengths
> in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
> over all processors)
>
> ------------------------------------------------------------------------------------------------------------------------
>
>
>       ##########################################################
>       #                                                        #
>       #                          WARNING!!!                    #
>       #                                                        #
>       #   This code was compiled with a debugging option,      #
>       #   To get timing results run ./configure                #
>       #   using --with-debugging=no, the performance will      #
>       #   be generally two or three times faster.              #
>       #                                                        #
>       ##########################################################
>
>
> Event                Count      Time (sec)     Flops
>       --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len
> Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> VecDot                 4 1.0 9.0525e-04 1.0 3.31e+04 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0    37
> VecMDot              533 1.0 1.5936e-02 1.0 5.97e+06 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  2  0  0  0   0  2  0  0  0   375
> VecNorm              412 1.0 9.2107e-03 1.0 3.57e+06 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  1  0  0  0   0  1  0  0  0   388
> VecScale             331 1.0 5.8195e-01 1.0 1.41e+06 1.0 0.0e+00 0.0e+00
> 0.0e+00  6  0  0  0  0   6  0  0  0  0     2
> VecCopy              116 1.0 1.9983e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet             18362 1.0 1.5249e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
> VecAXPY              254 1.0 4.3961e-01 1.0 1.95e+06 1.0 0.0e+00 0.0e+00
> 0.0e+00  4  1  0  0  0   4  1  0  0  0     4
> VecAYPX               92 1.0 2.5167e-03 1.0 2.66e+05 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0   106
> VecAXPBYCZ            36 1.0 8.6242e-04 1.0 2.94e+05 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0   341
> VecWAXPY              58 1.0 1.2539e-03 1.0 2.47e+05 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0   197
> VecMAXPY             638 1.0 2.3439e-02 1.0 7.68e+06 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  2  0  0  0   0  2  0  0  0   328
> VecSwap              111 1.0 1.9721e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
> VecAssemblyBegin     607 1.0 3.8150e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAssemblyEnd       607 1.0 8.3705e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecScatterBegin    26434 1.0 3.0096e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
> VecNormalize         260 1.0 4.9754e-01 1.0 3.84e+06 1.0 0.0e+00 0.0e+00
> 0.0e+00  5  1  0  0  0   5  1  0  0  0     8
> BuildTwoSidedF       600 1.0 1.8942e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatMult              365 1.0 6.0306e-01 1.0 6.26e+07 1.0 0.0e+00 0.0e+00
> 0.0e+00  6 18  0  0  0   6 18  0  0  0   104
> MatSolve            8775 1.0 6.8506e-01 1.0 2.25e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  7 63  0  0  0   7 63  0  0  0   328
> MatLUFactorSym        85 1.0 1.0664e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
> MatLUFactorNum        85 1.0 1.2066e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00
> 0.0e+00  1 12  0  0  0   1 12  0  0  0   350
> MatScale               4 1.0 4.0145e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0   625
> MatAssemblyBegin     108 1.0 4.8849e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyEnd       108 1.0 9.8455e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetRow          33120 1.0 1.4157e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
> MatGetRowIJ           85 1.0 2.6060e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetSubMatrice       4 1.0 4.2922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetOrdering        85 1.0 3.1230e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAXPY                4 1.0 4.0459e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
> MatPtAP                4 1.0 1.1362e-01 1.0 4.99e+06 1.0 0.0e+00 0.0e+00
> 0.0e+00  1  1  0  0  0   1  1  0  0  0    44
> MatPtAPSymbolic        4 1.0 6.4973e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
> MatPtAPNumeric         4 1.0 4.8521e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  1  0  0  0   0  1  0  0  0   103
> MatGetSymTrans         4 1.0 5.9780e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPGMRESOrthog       182 1.0 2.0538e-02 1.0 5.11e+06 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  1  0  0  0   0  1  0  0  0   249
> KSPSetUp              90 1.0 2.1210e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve               1 1.0 9.5567e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 92 98  0  0  0  92 98  0  0  0    37
> PCSetUp               90 1.0 4.0597e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00
> 0.0e+00  4 12  0  0  0   4 12  0  0  0   104
> PCSetUpOnBlocks       91 1.0 2.9886e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00
> 0.0e+00  3 12  0  0  0   3 12  0  0  0   141
> PCApply               13 1.0 9.0558e+00 1.0 3.49e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 87 98  0  0  0  87 98  0  0  0    39
> SNESSolve              1 1.0 9.5729e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 92 98  0  0  0  92 98  0  0  0    37
> SNESFunctionEval       2 1.0 1.3347e-02 1.0 4.68e+04 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     4
> SNESJacobianEval       1 1.0 2.4613e-03 1.0 4.26e+03 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     2
>
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions     Memory  Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
>               Vector   870            762     13314200     0.
>       Vector Scatter   290            289       189584     0.
>            Index Set  1171            823       951096     0.
>    IS L to G Mapping   110            109      2156656     0.
>    Application Order     6              6        99952     0.
>              MatMFFD     1              1          776     0.
>               Matrix   189            189     24202324     0.
>    Matrix Null Space     4              4         2432     0.
>        Krylov Solver    90             90       190080     0.
>      DMKSP interface     1              1          648     0.
>       Preconditioner    90             90        89128     0.
>                 SNES     1              1         1328     0.
>       SNESLineSearch     1              1          856     0.
>               DMSNES     1              1          664     0.
>     Distributed Mesh     2              2         9024     0.
> Star Forest Bipartite Graph     4              4         3168     0.
>      Discrete System     2              2         1696     0.
>               Viewer     1              0            0     0.
>
> ========================================================================================================================
> Average time to get PetscTime(): 4.74e-08
> #PETSc Option Table entries:
> -ib_ksp_converged_reason
> -ib_ksp_monitor_true_residual
> -ib_snes_type ksponly
> -log_summary
> -stokes_ib_pc_level_ksp_richardson_self_scae
> -stokes_ib_pc_level_ksp_type gmres
> -stokes_ib_pc_level_pc_asm_local_type additive
> -stokes_ib_pc_level_pc_asm_type interpolate
> -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal
> -stokes_ib_pc_level_sub_pc_type lu
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90
> --PETSC_ARCH=darwin-dbg --with-debugging=1 --with-c++-support=1
> --with-hypre=1 --download-hypre=1 --with-hdf5=yes
> --with-hdf5-dir=/Users/Taylor/Documents/SOFTWARES/HDF5/
> -----------------------------------------
> Libraries compiled on Mon Nov 16 15:11:21 2015 on d209.math.ucdavis.edu
> Machine characteristics: Darwin-14.5.0-x86_64-i386-64bit
> Using PETSc directory:
> /Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc
> Using PETSc arch: darwin-dbg
> -----------------------------------------
>
> Using C compiler: mpicc    -g  ${COPTFLAGS} ${CFLAGS}
> Using Fortran compiler: mpif90   -g   ${FOPTFLAGS} ${FFLAGS}
> -----------------------------------------
>
> Using include paths:
> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include
> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include
> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include
> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include
> -I/opt/X11/include -I/Users/Taylor/Documents/SOFTWARES/HDF5/include
> -I/opt/local/include -I/Users/Taylor/Documents/SOFTWARES/MPICH/include
> -----------------------------------------
>
> Using C linker: mpicc
> Using Fortran linker: mpif90
> Using libraries:
> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib
> -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib
> -lpetsc
> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib
> -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib
> -lHYPRE -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib
> -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib
> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin
> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin
> -lclang_rt.osx -lmpicxx -lc++
> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin
> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin
> -lclang_rt.osx -llapack -lblas -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib -lX11
> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/HDF5/lib
> -L/Users/Taylor/Documents/SOFTWARES/HDF5/lib -lhdf5_hl -lhdf5 -lssl
> -lcrypto -lmpifort -lgfortran
> -Wl,-rpath,/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1
> -L/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1
> -Wl,-rpath,/opt/local/lib/gcc49 -L/opt/local/lib/gcc49 -lgfortran
> -lgcc_ext.10.5 -lquadmath -lm -lclang_rt.osx -lmpicxx -lc++ -lclang_rt.osx
> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib
> -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib -ldl -lmpi -lpmpi -lSystem
> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin
> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin
> -lclang_rt.osx -ldl
> -----------------------------------------
>
>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

