[petsc-users] HPCToolKit/HPCViewer on OS X
Matthew Knepley
knepley at gmail.com
Thu Jan 14 07:44:47 CST 2016
On Thu, Jan 14, 2016 at 7:37 AM, Dave May <dave.mayhem23 at gmail.com> wrote:
>
>
> On 14 January 2016 at 14:24, Matthew Knepley <knepley at gmail.com> wrote:
>
>> On Wed, Jan 13, 2016 at 11:12 PM, Bhalla, Amneet Pal S <
>> amneetb at live.unc.edu> wrote:
>>
>>>
>>>
>>> On Jan 13, 2016, at 6:22 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>>
>>> Can you mail us a -log_summary for a rough cut? Sometimes it's hard
>>> to interpret the data avalanche from one of those tools without a simple
>>> map.
>>>
>>>
>>> Does this indicate some hot spots?
>>>
>>
>> 1) There is a misspelled option: -stokes_ib_pc_level_ksp_richardson_self_scae
>>
>> You can catch this by giving -options_left
>>
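>> For example (a sketch, reusing the executable name from the -log_summary
>> below; the other options are whatever you normally pass):
>>
>>   ./main2d -options_left <your usual options>
>>
>> At the end of the run PETSc then reports any options that were set but
>> never queried, which flags typos like the one above.
>>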
>> 2) Are you using any custom code during the solve? There is a gaping
>> hole in the timing: PCApply() takes 9s, but everything we time underneath
>> it adds up to only about 1s.
>>
>
>
> You are looking at the timing from a debug build.
> The results from the optimized build don't have such a gaping hole.
>
It still looks like 50% of the runtime to me.
Matt
>
>> Since this is serial, we can also use something like kcachegrind to look
>> at performance, which should at least tell us what is sucking up this time
>> so we can put a PETSc event on it.
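>>
>> A rough sketch of adding such an event (using the standard PETSc logging
>> calls; the event name and where it goes are just placeholders for whatever
>> custom code runs inside PCApply()):
>>
>>   PetscClassId  classid;
>>   PetscLogEvent USER_EVENT;
>>
>>   /* once, during setup (error checking omitted) */
>>   PetscClassIdRegister("User code", &classid);
>>   PetscLogEventRegister("UserApply", classid, &USER_EVENT);
>>
>>   /* around the custom code called from the solve */
>>   PetscLogEventBegin(USER_EVENT, 0, 0, 0, 0);
>>   /* ... custom kernel ... */
>>   PetscLogEventEnd(USER_EVENT, 0, 0, 0, 0);
>>
>> The event then shows up as its own line in -log_summary, so the missing
>> time under PCApply() gets attributed to something visible.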
>>
>> Thanks,
>>
>> Matt
>>
>>
>>
>>>
>>> ************************************************************************************************************************
>>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r
>>> -fCourier9' to print this document ***
>>>
>>> ************************************************************************************************************************
>>>
>>> ---------------------------------------------- PETSc Performance
>>> Summary: ----------------------------------------------
>>>
>>> ./main2d on a darwin-dbg named Amneets-MBP.attlocal.net with 1
>>> processor, by Taylor Wed Jan 13 21:07:43 2016
>>> Using Petsc Development GIT revision: v3.6.1-2556-g6721a46 GIT Date:
>>> 2015-11-16 13:07:08 -0600
>>>
>>> Max Max/Min Avg Total
>>> Time (sec): 1.039e+01 1.00000 1.039e+01
>>> Objects: 2.834e+03 1.00000 2.834e+03
>>> Flops: 3.552e+08 1.00000 3.552e+08 3.552e+08
>>> Flops/sec: 3.418e+07 1.00000 3.418e+07 3.418e+07
>>> Memory: 3.949e+07 1.00000 3.949e+07
>>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
>>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
>>> MPI Reductions: 0.000e+00 0.00000
>>>
>>> Flop counting convention: 1 flop = 1 real number operation of type
>>> (multiply/divide/add/subtract)
>>> e.g., VecAXPY() for real vectors of length N
>>> --> 2N flops
>>> and VecAXPY() for complex vectors of length
>>> N --> 8N flops
>>>
>>> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages
>>> --- -- Message Lengths -- -- Reductions --
>>> Avg %Total Avg %Total counts
>>> %Total Avg %Total counts %Total
>>> 0: Main Stage: 1.0391e+01 100.0% 3.5520e+08 100.0% 0.000e+00
>>> 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
>>>
>>>
>>> ------------------------------------------------------------------------------------------------------------------------
>>> See the 'Profiling' chapter of the users' manual for details on
>>> interpreting output.
>>> Phase summary info:
>>> Count: number of times phase was executed
>>> Time and Flops: Max - maximum over all processors
>>> Ratio - ratio of maximum to minimum over all
>>> processors
>>> Mess: number of messages sent
>>> Avg. len: average message length (bytes)
>>> Reduct: number of global reductions
>>> Global: entire computation
>>> Stage: stages of a computation. Set stages with PetscLogStagePush()
>>> and PetscLogStagePop().
>>> %T - percent time in this phase %F - percent flops in this
>>> phase
>>> %M - percent messages in this phase %L - percent message
>>> lengths in this phase
>>> %R - percent reductions in this phase
>>> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
>>> over all processors)
>>>
>>> ------------------------------------------------------------------------------------------------------------------------
>>>
>>>
>>> ##########################################################
>>> # #
>>> # WARNING!!! #
>>> # #
>>> # This code was compiled with a debugging option, #
>>> # To get timing results run ./configure #
>>> # using --with-debugging=no, the performance will #
>>> # be generally two or three times faster. #
>>> # #
>>> ##########################################################
>>>
>>>
>>> Event Count Time (sec) Flops
>>> --- Global --- --- Stage --- Total
>>> Max Ratio Max Ratio Max Ratio Mess Avg len
>>> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
>>>
>>> ------------------------------------------------------------------------------------------------------------------------
>>>
>>> --- Event Stage 0: Main Stage
>>>
>>> VecDot 4 1.0 9.0525e-04 1.0 3.31e+04 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 37
>>> VecMDot 533 1.0 1.5936e-02 1.0 5.97e+06 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 2 0 0 0 0 2 0 0 0 375
>>> VecNorm 412 1.0 9.2107e-03 1.0 3.57e+06 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 1 0 0 0 0 1 0 0 0 388
>>> VecScale 331 1.0 5.8195e-01 1.0 1.41e+06 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 6 0 0 0 0 6 0 0 0 0 2
>>> VecCopy 116 1.0 1.9983e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> VecSet 18362 1.0 1.5249e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
>>> VecAXPY 254 1.0 4.3961e-01 1.0 1.95e+06 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 4 1 0 0 0 4 1 0 0 0 4
>>> VecAYPX 92 1.0 2.5167e-03 1.0 2.66e+05 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 106
>>> VecAXPBYCZ 36 1.0 8.6242e-04 1.0 2.94e+05 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 341
>>> VecWAXPY 58 1.0 1.2539e-03 1.0 2.47e+05 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 197
>>> VecMAXPY 638 1.0 2.3439e-02 1.0 7.68e+06 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 2 0 0 0 0 2 0 0 0 328
>>> VecSwap 111 1.0 1.9721e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
>>> VecAssemblyBegin 607 1.0 3.8150e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> VecAssemblyEnd 607 1.0 8.3705e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> VecScatterBegin 26434 1.0 3.0096e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 3 0 0 0 0 3 0 0 0 0 0
>>> VecNormalize 260 1.0 4.9754e-01 1.0 3.84e+06 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 5 1 0 0 0 5 1 0 0 0 8
>>> BuildTwoSidedF 600 1.0 1.8942e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> MatMult 365 1.0 6.0306e-01 1.0 6.26e+07 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 6 18 0 0 0 6 18 0 0 0 104
>>> MatSolve 8775 1.0 6.8506e-01 1.0 2.25e+08 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 7 63 0 0 0 7 63 0 0 0 328
>>> MatLUFactorSym 85 1.0 1.0664e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
>>> MatLUFactorNum 85 1.0 1.2066e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 1 12 0 0 0 1 12 0 0 0 350
>>> MatScale 4 1.0 4.0145e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 625
>>> MatAssemblyBegin 108 1.0 4.8849e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> MatAssemblyEnd 108 1.0 9.8455e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> MatGetRow 33120 1.0 1.4157e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
>>> MatGetRowIJ 85 1.0 2.6060e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> MatGetSubMatrice 4 1.0 4.2922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> MatGetOrdering 85 1.0 3.1230e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> MatAXPY 4 1.0 4.0459e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 4 0 0 0 0 4 0 0 0 0 0
>>> MatPtAP 4 1.0 1.1362e-01 1.0 4.99e+06 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 1 1 0 0 0 1 1 0 0 0 44
>>> MatPtAPSymbolic 4 1.0 6.4973e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
>>> MatPtAPNumeric 4 1.0 4.8521e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 1 0 0 0 0 1 0 0 0 103
>>> MatGetSymTrans 4 1.0 5.9780e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> KSPGMRESOrthog 182 1.0 2.0538e-02 1.0 5.11e+06 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 1 0 0 0 0 1 0 0 0 249
>>> KSPSetUp 90 1.0 2.1210e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> KSPSolve 1 1.0 9.5567e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 92 98 0 0 0 92 98 0 0 0 37
>>> PCSetUp 90 1.0 4.0597e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 4 12 0 0 0 4 12 0 0 0 104
>>> PCSetUpOnBlocks 91 1.0 2.9886e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 3 12 0 0 0 3 12 0 0 0 141
>>> PCApply 13 1.0 9.0558e+00 1.0 3.49e+08 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 87 98 0 0 0 87 98 0 0 0 39
>>> SNESSolve 1 1.0 9.5729e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 92 98 0 0 0 92 98 0 0 0 37
>>> SNESFunctionEval 2 1.0 1.3347e-02 1.0 4.68e+04 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 4
>>> SNESJacobianEval 1 1.0 2.4613e-03 1.0 4.26e+03 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 2
>>>
>>> ------------------------------------------------------------------------------------------------------------------------
>>>
>>> Memory usage is given in bytes:
>>>
>>> Object Type Creations Destructions Memory Descendants'
>>> Mem.
>>> Reports information only for process 0.
>>>
>>> --- Event Stage 0: Main Stage
>>>
>>> Vector 870 762 13314200 0.
>>> Vector Scatter 290 289 189584 0.
>>> Index Set 1171 823 951096 0.
>>> IS L to G Mapping 110 109 2156656 0.
>>> Application Order 6 6 99952 0.
>>> MatMFFD 1 1 776 0.
>>> Matrix 189 189 24202324 0.
>>> Matrix Null Space 4 4 2432 0.
>>> Krylov Solver 90 90 190080 0.
>>> DMKSP interface 1 1 648 0.
>>> Preconditioner 90 90 89128 0.
>>> SNES 1 1 1328 0.
>>> SNESLineSearch 1 1 856 0.
>>> DMSNES 1 1 664 0.
>>> Distributed Mesh 2 2 9024 0.
>>> Star Forest Bipartite Graph 4 4 3168 0.
>>> Discrete System 2 2 1696 0.
>>> Viewer 1 0 0 0.
>>>
>>> ========================================================================================================================
>>> Average time to get PetscTime(): 4.74e-08
>>> #PETSc Option Table entries:
>>> -ib_ksp_converged_reason
>>> -ib_ksp_monitor_true_residual
>>> -ib_snes_type ksponly
>>> -log_summary
>>> -stokes_ib_pc_level_ksp_richardson_self_scae
>>> -stokes_ib_pc_level_ksp_type gmres
>>> -stokes_ib_pc_level_pc_asm_local_type additive
>>> -stokes_ib_pc_level_pc_asm_type interpolate
>>> -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal
>>> -stokes_ib_pc_level_sub_pc_type lu
>>> #End of PETSc Option Table entries
>>> Compiled without FORTRAN kernels
>>> Compiled with full precision matrices (default)
>>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
>>> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
>>> Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90
>>> --PETSC_ARCH=darwin-dbg --with-debugging=1 --with-c++-support=1
>>> --with-hypre=1 --download-hypre=1 --with-hdf5=yes
>>> --with-hdf5-dir=/Users/Taylor/Documents/SOFTWARES/HDF5/
>>> -----------------------------------------
>>> Libraries compiled on Mon Nov 16 15:11:21 2015 on d209.math.ucdavis.edu
>>> Machine characteristics: Darwin-14.5.0-x86_64-i386-64bit
>>> Using PETSc directory:
>>> /Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc
>>> Using PETSc arch: darwin-dbg
>>> -----------------------------------------
>>>
>>> Using C compiler: mpicc -g ${COPTFLAGS} ${CFLAGS}
>>> Using Fortran compiler: mpif90 -g ${FOPTFLAGS} ${FFLAGS}
>>> -----------------------------------------
>>>
>>> Using include paths:
>>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include
>>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include
>>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include
>>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include
>>> -I/opt/X11/include -I/Users/Taylor/Documents/SOFTWARES/HDF5/include
>>> -I/opt/local/include -I/Users/Taylor/Documents/SOFTWARES/MPICH/include
>>> -----------------------------------------
>>>
>>> Using C linker: mpicc
>>> Using Fortran linker: mpif90
>>> Using libraries:
>>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib
>>> -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib
>>> -lpetsc
>>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib
>>> -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib
>>> -lHYPRE -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib
>>> -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib
>>> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin
>>> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin
>>> -lclang_rt.osx -lmpicxx -lc++
>>> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin
>>> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin
>>> -lclang_rt.osx -llapack -lblas -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib -lX11
>>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/HDF5/lib
>>> -L/Users/Taylor/Documents/SOFTWARES/HDF5/lib -lhdf5_hl -lhdf5 -lssl
>>> -lcrypto -lmpifort -lgfortran
>>> -Wl,-rpath,/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1
>>> -L/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1
>>> -Wl,-rpath,/opt/local/lib/gcc49 -L/opt/local/lib/gcc49 -lgfortran
>>> -lgcc_ext.10.5 -lquadmath -lm -lclang_rt.osx -lmpicxx -lc++ -lclang_rt.osx
>>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib
>>> -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib -ldl -lmpi -lpmpi -lSystem
>>> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin
>>> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin
>>> -lclang_rt.osx -ldl
>>> -----------------------------------------
>>>
>>>
>>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener