[petsc-users] HPCToolKit/HPCViewer on OS X

Dave May dave.mayhem23 at gmail.com
Thu Jan 14 07:37:11 CST 2016


On 14 January 2016 at 14:24, Matthew Knepley <knepley at gmail.com> wrote:

> On Wed, Jan 13, 2016 at 11:12 PM, Bhalla, Amneet Pal S <
> amneetb at live.unc.edu> wrote:
>
>>
>>
>> On Jan 13, 2016, at 6:22 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>
> >> Can you mail us a -log_summary for a rough cut? Sometimes it's hard
>> to interpret the data avalanche from one of those tools without a simple
>> map.
>>
>>
>> Does this indicate some hot spots?
>>
>
> 1) There is a misspelled option: -stokes_ib_pc_level_ksp_richardson_self_scae
>
> You can try to avoid this by giving -options_left
>
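As an aside, -options_left makes PETSc report at PetscFinalize() every option that was
set but never queried, which is how misspellings like the one above normally surface.
A minimal sketch of such a run, using the executable name from the log below and an
illustrative subset of the options:

  ./main2d -stokes_ib_pc_level_ksp_type gmres \
           -stokes_ib_pc_level_sub_pc_type lu \
           -log_summary -options_left
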
> 2) Are you using any custom code during the solve? There is a gaping hole
> in the timing: PCApply() takes 9s, but everything we time underneath it
> adds up to only about 1s.
>


You are looking at the timing from a debug build.
The results from the optimized build don't have such a gaping hole.
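
If it helps, here is a sketch of an optimized build alongside the debug one, reusing the
configure options reported in the log below; the arch name darwin-opt and the -O3 flags
are illustrative choices, not something taken from the original report:

  ./configure --CC=mpicc --CXX=mpicxx --FC=mpif90 \
      --PETSC_ARCH=darwin-opt --with-debugging=0 \
      COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 \
      --with-c++-support=1 --with-hypre=1 --download-hypre=1 \
      --with-hdf5=yes --with-hdf5-dir=/Users/Taylor/Documents/SOFTWARES/HDF5/

Timings from a -log_summary of that build are the ones worth reading closely; the debug
numbers include a lot of checking overhead.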



>
> Since this is serial, we can use something like kcachegrind to look at
> performance as well, which should
> at least tell us what is sucking up this time so we can put a PETSc event
> on it.
>
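
On the "PETSc event" part, here is a minimal sketch of what that looks like in user code,
with the event name UserIBApply and its placement made up purely for illustration;
PetscClassIdRegister(), PetscLogEventRegister() and PetscLogEventBegin()/PetscLogEventEnd()
are the standard logging calls, and the registered event then appears as its own line in
-log_summary:

#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscErrorCode ierr;
  PetscClassId   classid;
  PetscLogEvent  USER_APPLY;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);

  /* register a class id and an event once, after PetscInitialize() */
  ierr = PetscClassIdRegister("User", &classid);CHKERRQ(ierr);
  ierr = PetscLogEventRegister("UserIBApply", classid, &USER_APPLY);CHKERRQ(ierr);

  /* wrap whatever custom code runs inside the solve / PCApply */
  ierr = PetscLogEventBegin(USER_APPLY, 0, 0, 0, 0);CHKERRQ(ierr);
  /* ... the user's custom code would go here ... */
  ierr = PetscLogEventEnd(USER_APPLY, 0, 0, 0, 0);CHKERRQ(ierr);

  ierr = PetscFinalize();
  return 0;
}

On the kcachegrind side, something like "valgrind --tool=callgrind ./main2d <usual options>",
followed by opening the resulting callgrind.out.* file in kcachegrind (or qcachegrind on OS X),
gives the same call-level picture, assuming valgrind runs on this OS X/clang setup.
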
>   Thanks,
>
>      Matt
>
>
>
>>
>> ************************************************************************************************************************
>> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r
>> -fCourier9' to print this document            ***
>>
>> ************************************************************************************************************************
>>
>> ---------------------------------------------- PETSc Performance Summary:
>> ----------------------------------------------
>>
>> ./main2d on a darwin-dbg named Amneets-MBP.attlocal.net with 1
>> processor, by Taylor Wed Jan 13 21:07:43 2016
>> Using Petsc Development GIT revision: v3.6.1-2556-g6721a46  GIT Date:
>> 2015-11-16 13:07:08 -0600
>>
>>                          Max       Max/Min        Avg      Total
>> Time (sec):           1.039e+01      1.00000   1.039e+01
>> Objects:              2.834e+03      1.00000   2.834e+03
>> Flops:                3.552e+08      1.00000   3.552e+08  3.552e+08
>> Flops/sec:            3.418e+07      1.00000   3.418e+07  3.418e+07
>> Memory:               3.949e+07      1.00000              3.949e+07
>> MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
>> MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
>> MPI Reductions:       0.000e+00      0.00000
>>
>> Flop counting convention: 1 flop = 1 real number operation of type
>> (multiply/divide/add/subtract)
>>                             e.g., VecAXPY() for real vectors of length N
>> --> 2N flops
>>                             and VecAXPY() for complex vectors of length N
>> --> 8N flops
>>
>> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages
>> ---  -- Message Lengths --  -- Reductions --
>>                         Avg     %Total     Avg     %Total   counts
>> %Total     Avg         %Total   counts   %Total
>>  0:      Main Stage: 1.0391e+01 100.0%  3.5520e+08 100.0%  0.000e+00
>> 0.0%  0.000e+00        0.0%  0.000e+00   0.0%
>>
>>
>> ------------------------------------------------------------------------------------------------------------------------
>> See the 'Profiling' chapter of the users' manual for details on
>> interpreting output.
>> Phase summary info:
>>    Count: number of times phase was executed
>>    Time and Flops: Max - maximum over all processors
>>                    Ratio - ratio of maximum to minimum over all processors
>>    Mess: number of messages sent
>>    Avg. len: average message length (bytes)
>>    Reduct: number of global reductions
>>    Global: entire computation
>>    Stage: stages of a computation. Set stages with PetscLogStagePush()
>> and PetscLogStagePop().
>>       %T - percent time in this phase         %F - percent flops in this
>> phase
>>       %M - percent messages in this phase     %L - percent message
>> lengths in this phase
>>       %R - percent reductions in this phase
>>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
>> over all processors)
>>
>> ------------------------------------------------------------------------------------------------------------------------
>>
>>
>>       ##########################################################
>>       #                                                        #
>>       #                          WARNING!!!                    #
>>       #                                                        #
>>       #   This code was compiled with a debugging option,      #
>>       #   To get timing results run ./configure                #
>>       #   using --with-debugging=no, the performance will      #
>>       #   be generally two or three times faster.              #
>>       #                                                        #
>>       ##########################################################
>>
>>
>> Event                Count      Time (sec)     Flops
>>         --- Global ---  --- Stage ---   Total
>>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len
>> Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>>
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> --- Event Stage 0: Main Stage
>>
>> VecDot                 4 1.0 9.0525e-04 1.0 3.31e+04 1.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0    37
>> VecMDot              533 1.0 1.5936e-02 1.0 5.97e+06 1.0 0.0e+00 0.0e+00
>> 0.0e+00  0  2  0  0  0   0  2  0  0  0   375
>> VecNorm              412 1.0 9.2107e-03 1.0 3.57e+06 1.0 0.0e+00 0.0e+00
>> 0.0e+00  0  1  0  0  0   0  1  0  0  0   388
>> VecScale             331 1.0 5.8195e-01 1.0 1.41e+06 1.0 0.0e+00 0.0e+00
>> 0.0e+00  6  0  0  0  0   6  0  0  0  0     2
>> VecCopy              116 1.0 1.9983e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecSet             18362 1.0 1.5249e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>> VecAXPY              254 1.0 4.3961e-01 1.0 1.95e+06 1.0 0.0e+00 0.0e+00
>> 0.0e+00  4  1  0  0  0   4  1  0  0  0     4
>> VecAYPX               92 1.0 2.5167e-03 1.0 2.66e+05 1.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0   106
>> VecAXPBYCZ            36 1.0 8.6242e-04 1.0 2.94e+05 1.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0   341
>> VecWAXPY              58 1.0 1.2539e-03 1.0 2.47e+05 1.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0   197
>> VecMAXPY             638 1.0 2.3439e-02 1.0 7.68e+06 1.0 0.0e+00 0.0e+00
>> 0.0e+00  0  2  0  0  0   0  2  0  0  0   328
>> VecSwap              111 1.0 1.9721e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
>> VecAssemblyBegin     607 1.0 3.8150e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecAssemblyEnd       607 1.0 8.3705e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecScatterBegin    26434 1.0 3.0096e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
>> VecNormalize         260 1.0 4.9754e-01 1.0 3.84e+06 1.0 0.0e+00 0.0e+00
>> 0.0e+00  5  1  0  0  0   5  1  0  0  0     8
>> BuildTwoSidedF       600 1.0 1.8942e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatMult              365 1.0 6.0306e-01 1.0 6.26e+07 1.0 0.0e+00 0.0e+00
>> 0.0e+00  6 18  0  0  0   6 18  0  0  0   104
>> MatSolve            8775 1.0 6.8506e-01 1.0 2.25e+08 1.0 0.0e+00 0.0e+00
>> 0.0e+00  7 63  0  0  0   7 63  0  0  0   328
>> MatLUFactorSym        85 1.0 1.0664e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>> MatLUFactorNum        85 1.0 1.2066e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00
>> 0.0e+00  1 12  0  0  0   1 12  0  0  0   350
>> MatScale               4 1.0 4.0145e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0   625
>> MatAssemblyBegin     108 1.0 4.8849e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatAssemblyEnd       108 1.0 9.8455e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatGetRow          33120 1.0 1.4157e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>> MatGetRowIJ           85 1.0 2.6060e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatGetSubMatrice       4 1.0 4.2922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatGetOrdering        85 1.0 3.1230e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatAXPY                4 1.0 4.0459e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
>> MatPtAP                4 1.0 1.1362e-01 1.0 4.99e+06 1.0 0.0e+00 0.0e+00
>> 0.0e+00  1  1  0  0  0   1  1  0  0  0    44
>> MatPtAPSymbolic        4 1.0 6.4973e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>> MatPtAPNumeric         4 1.0 4.8521e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00
>> 0.0e+00  0  1  0  0  0   0  1  0  0  0   103
>> MatGetSymTrans         4 1.0 5.9780e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> KSPGMRESOrthog       182 1.0 2.0538e-02 1.0 5.11e+06 1.0 0.0e+00 0.0e+00
>> 0.0e+00  0  1  0  0  0   0  1  0  0  0   249
>> KSPSetUp              90 1.0 2.1210e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> KSPSolve               1 1.0 9.5567e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00
>> 0.0e+00 92 98  0  0  0  92 98  0  0  0    37
>> PCSetUp               90 1.0 4.0597e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00
>> 0.0e+00  4 12  0  0  0   4 12  0  0  0   104
>> PCSetUpOnBlocks       91 1.0 2.9886e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00
>> 0.0e+00  3 12  0  0  0   3 12  0  0  0   141
>> PCApply               13 1.0 9.0558e+00 1.0 3.49e+08 1.0 0.0e+00 0.0e+00
>> 0.0e+00 87 98  0  0  0  87 98  0  0  0    39
>> SNESSolve              1 1.0 9.5729e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00
>> 0.0e+00 92 98  0  0  0  92 98  0  0  0    37
>> SNESFunctionEval       2 1.0 1.3347e-02 1.0 4.68e+04 1.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     4
>> SNESJacobianEval       1 1.0 2.4613e-03 1.0 4.26e+03 1.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     2
>>
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> Memory usage is given in bytes:
>>
>> Object Type          Creations   Destructions     Memory  Descendants'
>> Mem.
>> Reports information only for process 0.
>>
>> --- Event Stage 0: Main Stage
>>
>>               Vector   870            762     13314200     0.
>>       Vector Scatter   290            289       189584     0.
>>            Index Set  1171            823       951096     0.
>>    IS L to G Mapping   110            109      2156656     0.
>>    Application Order     6              6        99952     0.
>>              MatMFFD     1              1          776     0.
>>               Matrix   189            189     24202324     0.
>>    Matrix Null Space     4              4         2432     0.
>>        Krylov Solver    90             90       190080     0.
>>      DMKSP interface     1              1          648     0.
>>       Preconditioner    90             90        89128     0.
>>                 SNES     1              1         1328     0.
>>       SNESLineSearch     1              1          856     0.
>>               DMSNES     1              1          664     0.
>>     Distributed Mesh     2              2         9024     0.
>> Star Forest Bipartite Graph     4              4         3168     0.
>>      Discrete System     2              2         1696     0.
>>               Viewer     1              0            0     0.
>>
>> ========================================================================================================================
>> Average time to get PetscTime(): 4.74e-08
>> #PETSc Option Table entries:
>> -ib_ksp_converged_reason
>> -ib_ksp_monitor_true_residual
>> -ib_snes_type ksponly
>> -log_summary
>> -stokes_ib_pc_level_ksp_richardson_self_scae
>> -stokes_ib_pc_level_ksp_type gmres
>> -stokes_ib_pc_level_pc_asm_local_type additive
>> -stokes_ib_pc_level_pc_asm_type interpolate
>> -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal
>> -stokes_ib_pc_level_sub_pc_type lu
>> #End of PETSc Option Table entries
>> Compiled without FORTRAN kernels
>> Compiled with full precision matrices (default)
>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
>> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
>> Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90
>> --PETSC_ARCH=darwin-dbg --with-debugging=1 --with-c++-support=1
>> --with-hypre=1 --download-hypre=1 --with-hdf5=yes
>> --with-hdf5-dir=/Users/Taylor/Documents/SOFTWARES/HDF5/
>> -----------------------------------------
>> Libraries compiled on Mon Nov 16 15:11:21 2015 on d209.math.ucdavis.edu
>> Machine characteristics: Darwin-14.5.0-x86_64-i386-64bit
>> Using PETSc directory:
>> /Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc
>> Using PETSc arch: darwin-dbg
>> -----------------------------------------
>>
>> Using C compiler: mpicc    -g  ${COPTFLAGS} ${CFLAGS}
>> Using Fortran compiler: mpif90   -g   ${FOPTFLAGS} ${FFLAGS}
>> -----------------------------------------
>>
>> Using include paths:
>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include
>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include
>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include
>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include
>> -I/opt/X11/include -I/Users/Taylor/Documents/SOFTWARES/HDF5/include
>> -I/opt/local/include -I/Users/Taylor/Documents/SOFTWARES/MPICH/include
>> -----------------------------------------
>>
>> Using C linker: mpicc
>> Using Fortran linker: mpif90
>> Using libraries:
>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib
>> -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib
>> -lpetsc
>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib
>> -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib
>> -lHYPRE -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib
>> -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib
>> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin
>> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin
>> -lclang_rt.osx -lmpicxx -lc++
>> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin
>> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin
>> -lclang_rt.osx -llapack -lblas -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib -lX11
>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/HDF5/lib
>> -L/Users/Taylor/Documents/SOFTWARES/HDF5/lib -lhdf5_hl -lhdf5 -lssl
>> -lcrypto -lmpifort -lgfortran
>> -Wl,-rpath,/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1
>> -L/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1
>> -Wl,-rpath,/opt/local/lib/gcc49 -L/opt/local/lib/gcc49 -lgfortran
>> -lgcc_ext.10.5 -lquadmath -lm -lclang_rt.osx -lmpicxx -lc++ -lclang_rt.osx
>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib
>> -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib -ldl -lmpi -lpmpi -lSystem
>> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin
>> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin
>> -lclang_rt.osx -ldl
>> -----------------------------------------
>>
>>
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>