[petsc-users] HPCToolKit/HPCViewer on OS X

Matthew Knepley knepley at gmail.com
Thu Jan 14 07:44:47 CST 2016


On Thu, Jan 14, 2016 at 7:37 AM, Dave May <dave.mayhem23 at gmail.com> wrote:

>
>
> On 14 January 2016 at 14:24, Matthew Knepley <knepley at gmail.com> wrote:
>
>> On Wed, Jan 13, 2016 at 11:12 PM, Bhalla, Amneet Pal S <
>> amneetb at live.unc.edu> wrote:
>>
>>>
>>>
>>> On Jan 13, 2016, at 6:22 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>>
>>> Can you mail us a -log_summary for a rough cut? Sometimes it's hard
>>> to interpret the data avalanche from one of those tools without a simple
>>> map.
>>>
>>>
>>> Does this indicate some hot spots?
>>>
>>
>> 1) There is a misspelled option -stokes_ib_pc_level_ksp_richardson_self_scae
>>
>> You can try to avoid this by giving -options_left
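>>
>> (Presumably -stokes_ib_pc_level_ksp_richardson_self_scale was intended. A
>> sketch of the check, reusing the executable and options from the log below:
>>
>>   ./main2d <the options from the run> -options_left
>>
>> at the end of the run PETSc then lists any options that were set but never
>> queried, which flags misspellings like this one.)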
>>
>> 2) Are you using any custom code during the solve? There is a gaping
>> hole in the timing. It takes 9s to do PCApply(), but something like a
>> collective 1s to do everything we time under that.
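>>
>> (From the -log_summary below: PCApply() is 9.0558e+00 s of the 1.0391e+01 s
>> run, while the largest single event recorded inside it, MatSolve, is only
>> 6.8506e-01 s.)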
>>
>
>
> You are looking at the timing from a debug build.
> The results from the optimized build don't have such a gaping hole.
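>
> (For reference, an optimized arch can live alongside darwin-dbg using the same
> options; a sketch, with the other configure options as in the log below:
>
>   ./configure --PETSC_ARCH=darwin-opt --with-debugging=0 <other options as in the log below>
>
> and the code rebuilt against PETSC_ARCH=darwin-opt before timing.)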
>

It still looks like 50% of the runtime to me.

   Matt


>
>> Since this is serial, we can use something like kcachegrind to look at
>> performance as well, which should at least tell us what is sucking up
>> this time so we can put a PETSc event on it.
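>>
>> One way to do the kcachegrind part (a sketch; program name and options as in
>> the run below):
>>
>>   valgrind --tool=callgrind ./main2d <your options>
>>   kcachegrind callgrind.out.<pid>
>>
>> And a minimal sketch of wrapping custom code in a PETSc logging event so it
>> shows up in -log_summary (the class and event names here are made up):
>>
>>   PetscClassId  classid;
>>   PetscLogEvent USER_EVENT;
>>   PetscClassIdRegister("User code", &classid);                 /* hypothetical class name */
>>   PetscLogEventRegister("UserApplyIB", classid, &USER_EVENT);  /* hypothetical event name */
>>   PetscLogEventBegin(USER_EVENT, 0, 0, 0, 0);
>>   /* ... custom code executed inside the solve ... */
>>   PetscLogEventEnd(USER_EVENT, 0, 0, 0, 0);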
>>
>>   Thanks,
>>
>>      Matt
>>
>>
>>
>>>
>>> ************************************************************************************************************************
>>> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
>>>
>>> ************************************************************************************************************************
>>>
>>> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>>>
>>> ./main2d on a darwin-dbg named Amneets-MBP.attlocal.net with 1 processor, by Taylor Wed Jan 13 21:07:43 2016
>>> Using Petsc Development GIT revision: v3.6.1-2556-g6721a46  GIT Date: 2015-11-16 13:07:08 -0600
>>>
>>>                          Max       Max/Min        Avg      Total
>>> Time (sec):           1.039e+01      1.00000   1.039e+01
>>> Objects:              2.834e+03      1.00000   2.834e+03
>>> Flops:                3.552e+08      1.00000   3.552e+08  3.552e+08
>>> Flops/sec:            3.418e+07      1.00000   3.418e+07  3.418e+07
>>> Memory:               3.949e+07      1.00000              3.949e+07
>>> MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
>>> MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
>>> MPI Reductions:       0.000e+00      0.00000
>>>
>>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>>>                             e.g., VecAXPY() for real vectors of length N --> 2N flops
>>>                             and VecAXPY() for complex vectors of length N --> 8N flops
>>>
>>> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>>>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>>>  0:      Main Stage: 1.0391e+01 100.0%  3.5520e+08 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
>>>
>>>
>>> ------------------------------------------------------------------------------------------------------------------------
>>> See the 'Profiling' chapter of the users' manual for details on interpreting output.
>>> Phase summary info:
>>>    Count: number of times phase was executed
>>>    Time and Flops: Max - maximum over all processors
>>>                    Ratio - ratio of maximum to minimum over all processors
>>>    Mess: number of messages sent
>>>    Avg. len: average message length (bytes)
>>>    Reduct: number of global reductions
>>>    Global: entire computation
>>>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>>>       %T - percent time in this phase         %F - percent flops in this phase
>>>       %M - percent messages in this phase     %L - percent message lengths in this phase
>>>       %R - percent reductions in this phase
>>>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
>>>
>>> ------------------------------------------------------------------------------------------------------------------------
>>>
>>>
>>>       ##########################################################
>>>       #                                                        #
>>>       #                          WARNING!!!                    #
>>>       #                                                        #
>>>       #   This code was compiled with a debugging option,      #
>>>       #   To get timing results run ./configure                #
>>>       #   using --with-debugging=no, the performance will      #
>>>       #   be generally two or three times faster.              #
>>>       #                                                        #
>>>       ##########################################################
>>>
>>>
>>> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>>>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>>>
>>> ------------------------------------------------------------------------------------------------------------------------
>>>
>>> --- Event Stage 0: Main Stage
>>>
>>> VecDot                 4 1.0 9.0525e-04 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    37
>>> VecMDot              533 1.0 1.5936e-02 1.0 5.97e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0   375
>>> VecNorm              412 1.0 9.2107e-03 1.0 3.57e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   388
>>> VecScale             331 1.0 5.8195e-01 1.0 1.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00  6  0  0  0  0   6  0  0  0  0     2
>>> VecCopy              116 1.0 1.9983e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecSet             18362 1.0 1.5249e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>>> VecAXPY              254 1.0 4.3961e-01 1.0 1.95e+06 1.0 0.0e+00 0.0e+00 0.0e+00  4  1  0  0  0   4  1  0  0  0     4
>>> VecAYPX               92 1.0 2.5167e-03 1.0 2.66e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   106
>>> VecAXPBYCZ            36 1.0 8.6242e-04 1.0 2.94e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   341
>>> VecWAXPY              58 1.0 1.2539e-03 1.0 2.47e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   197
>>> VecMAXPY             638 1.0 2.3439e-02 1.0 7.68e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0   328
>>> VecSwap              111 1.0 1.9721e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
>>> VecAssemblyBegin     607 1.0 3.8150e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecAssemblyEnd       607 1.0 8.3705e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecScatterBegin    26434 1.0 3.0096e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
>>> VecNormalize         260 1.0 4.9754e-01 1.0 3.84e+06 1.0 0.0e+00 0.0e+00 0.0e+00  5  1  0  0  0   5  1  0  0  0     8
>>> BuildTwoSidedF       600 1.0 1.8942e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatMult              365 1.0 6.0306e-01 1.0 6.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00  6 18  0  0  0   6 18  0  0  0   104
>>> MatSolve            8775 1.0 6.8506e-01 1.0 2.25e+08 1.0 0.0e+00 0.0e+00 0.0e+00  7 63  0  0  0   7 63  0  0  0   328
>>> MatLUFactorSym        85 1.0 1.0664e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>>> MatLUFactorNum        85 1.0 1.2066e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1 12  0  0  0   1 12  0  0  0   350
>>> MatScale               4 1.0 4.0145e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   625
>>> MatAssemblyBegin     108 1.0 4.8849e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatAssemblyEnd       108 1.0 9.8455e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatGetRow          33120 1.0 1.4157e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>>> MatGetRowIJ           85 1.0 2.6060e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatGetSubMatrice       4 1.0 4.2922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatGetOrdering        85 1.0 3.1230e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatAXPY                4 1.0 4.0459e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
>>> MatPtAP                4 1.0 1.1362e-01 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0    44
>>> MatPtAPSymbolic        4 1.0 6.4973e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>>> MatPtAPNumeric         4 1.0 4.8521e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   103
>>> MatGetSymTrans         4 1.0 5.9780e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> KSPGMRESOrthog       182 1.0 2.0538e-02 1.0 5.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   249
>>> KSPSetUp              90 1.0 2.1210e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> KSPSolve               1 1.0 9.5567e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 0.0e+00 92 98  0  0  0  92 98  0  0  0    37
>>> PCSetUp               90 1.0 4.0597e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00  4 12  0  0  0   4 12  0  0  0   104
>>> PCSetUpOnBlocks       91 1.0 2.9886e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00  3 12  0  0  0   3 12  0  0  0   141
>>> PCApply               13 1.0 9.0558e+00 1.0 3.49e+08 1.0 0.0e+00 0.0e+00 0.0e+00 87 98  0  0  0  87 98  0  0  0    39
>>> SNESSolve              1 1.0 9.5729e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 0.0e+00 92 98  0  0  0  92 98  0  0  0    37
>>> SNESFunctionEval       2 1.0 1.3347e-02 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     4
>>> SNESJacobianEval       1 1.0 2.4613e-03 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     2
>>>
>>> ------------------------------------------------------------------------------------------------------------------------
>>>
>>> Memory usage is given in bytes:
>>>
>>> Object Type          Creations   Destructions     Memory  Descendants' Mem.
>>> Reports information only for process 0.
>>>
>>> --- Event Stage 0: Main Stage
>>>
>>>               Vector   870            762     13314200     0.
>>>       Vector Scatter   290            289       189584     0.
>>>            Index Set  1171            823       951096     0.
>>>    IS L to G Mapping   110            109      2156656     0.
>>>    Application Order     6              6        99952     0.
>>>              MatMFFD     1              1          776     0.
>>>               Matrix   189            189     24202324     0.
>>>    Matrix Null Space     4              4         2432     0.
>>>        Krylov Solver    90             90       190080     0.
>>>      DMKSP interface     1              1          648     0.
>>>       Preconditioner    90             90        89128     0.
>>>                 SNES     1              1         1328     0.
>>>       SNESLineSearch     1              1          856     0.
>>>               DMSNES     1              1          664     0.
>>>     Distributed Mesh     2              2         9024     0.
>>> Star Forest Bipartite Graph     4              4         3168     0.
>>>      Discrete System     2              2         1696     0.
>>>               Viewer     1              0            0     0.
>>>
>>> ========================================================================================================================
>>> Average time to get PetscTime(): 4.74e-08
>>> #PETSc Option Table entries:
>>> -ib_ksp_converged_reason
>>> -ib_ksp_monitor_true_residual
>>> -ib_snes_type ksponly
>>> -log_summary
>>> -stokes_ib_pc_level_ksp_richardson_self_scae
>>> -stokes_ib_pc_level_ksp_type gmres
>>> -stokes_ib_pc_level_pc_asm_local_type additive
>>> -stokes_ib_pc_level_pc_asm_type interpolate
>>> -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal
>>> -stokes_ib_pc_level_sub_pc_type lu
>>> #End of PETSc Option Table entries
>>> Compiled without FORTRAN kernels
>>> Compiled with full precision matrices (default)
>>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
>>> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
>>> Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90
>>> --PETSC_ARCH=darwin-dbg --with-debugging=1 --with-c++-support=1
>>> --with-hypre=1 --download-hypre=1 --with-hdf5=yes
>>> --with-hdf5-dir=/Users/Taylor/Documents/SOFTWARES/HDF5/
>>> -----------------------------------------
>>> Libraries compiled on Mon Nov 16 15:11:21 2015 on d209.math.ucdavis.edu
>>> Machine characteristics: Darwin-14.5.0-x86_64-i386-64bit
>>> Using PETSc directory:
>>> /Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc
>>> Using PETSc arch: darwin-dbg
>>> -----------------------------------------
>>>
>>> Using C compiler: mpicc    -g  ${COPTFLAGS} ${CFLAGS}
>>> Using Fortran compiler: mpif90   -g   ${FOPTFLAGS} ${FFLAGS}
>>> -----------------------------------------
>>>
>>> Using include paths:
>>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include
>>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include
>>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include
>>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include
>>> -I/opt/X11/include -I/Users/Taylor/Documents/SOFTWARES/HDF5/include
>>> -I/opt/local/include -I/Users/Taylor/Documents/SOFTWARES/MPICH/include
>>> -----------------------------------------
>>>
>>> Using C linker: mpicc
>>> Using Fortran linker: mpif90
>>> Using libraries:
>>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib
>>> -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib
>>> -lpetsc
>>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib
>>> -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib
>>> -lHYPRE -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib
>>> -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib
>>> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin
>>> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin
>>> -lclang_rt.osx -lmpicxx -lc++
>>> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin
>>> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin
>>> -lclang_rt.osx -llapack -lblas -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib -lX11
>>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/HDF5/lib
>>> -L/Users/Taylor/Documents/SOFTWARES/HDF5/lib -lhdf5_hl -lhdf5 -lssl
>>> -lcrypto -lmpifort -lgfortran
>>> -Wl,-rpath,/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1
>>> -L/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1
>>> -Wl,-rpath,/opt/local/lib/gcc49 -L/opt/local/lib/gcc49 -lgfortran
>>> -lgcc_ext.10.5 -lquadmath -lm -lclang_rt.osx -lmpicxx -lc++ -lclang_rt.osx
>>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib
>>> -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib -ldl -lmpi -lpmpi -lSystem
>>> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin
>>> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin
>>> -lclang_rt.osx -ldl
>>> -----------------------------------------
>>>
>>>
>>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener