[petsc-users] HPCToolKit/HPCViewer on OS X
Dave May
dave.mayhem23 at gmail.com
Thu Jan 14 07:37:11 CST 2016
On 14 January 2016 at 14:24, Matthew Knepley <knepley at gmail.com> wrote:
> On Wed, Jan 13, 2016 at 11:12 PM, Bhalla, Amneet Pal S <amneetb at live.unc.edu> wrote:
>
>>
>>
>> On Jan 13, 2016, at 6:22 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>
>> Can you mail us a -log_summary for a rough cut? Sometimes it's hard
>> to interpret the data avalanche from one of those tools without a simple
>> map.
>>
>>
>> Does this indicate some hot spots?
>>
>
> 1) There is a misspelled option -stokes_ib_pc_level_ksp_richardson_self_scae
>
> You can catch mistakes like this by running with -options_left
>
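For reference, a minimal sketch of such a run (reusing the executable and options from the log below; the trailing "..." just stands for the rest of the options):

  ./main2d -stokes_ib_pc_level_ksp_type gmres -stokes_ib_pc_level_sub_pc_type lu ... -options_left

With -options_left, PETSc reports at the end of the run any options that were set but never queried, which is how a typo like the one above gets flagged.
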
> 2) Are you using any custom code during the solve? There is a gaping hole
> in the timing. It takes 9s to
> do PCApply(), but only something like a collective 1s to do everything we
> time under that.
>
You are looking at the timing from a debug build.
The results from the optimized build don't have such a gaping hole.
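For completeness, a configure sketch for an optimized build based on the configure line near the bottom of the log (the arch name "darwin-opt" and the -O3 flags here are only illustrative choices):

  ./configure --CC=mpicc --CXX=mpicxx --FC=mpif90 --PETSC_ARCH=darwin-opt \
      --with-debugging=0 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 \
      --with-hypre=1 --download-hypre=1 --with-hdf5=yes \
      --with-hdf5-dir=/Users/Taylor/Documents/SOFTWARES/HDF5/

Rebuild and relink against the new arch before collecting -log_summary numbers again.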
>
> Since this is serial, we can use something like kcachegrind to look at
> performance as well, which should
> at least tell us what is sucking up this time so we can put a PETSc event
> on it.
>
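In case it helps, here is a minimal C sketch of registering a user-defined event so that a custom piece of code gets its own line in -log_summary (the class and event names are made up; in practice the Begin/End pair would wrap whatever custom code runs inside PCApply()):

  #include <petscsys.h>

  int main(int argc, char **argv)
  {
    PetscClassId   classid;
    PetscLogEvent  USER_APPLY;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

    /* Register a class and an event; the event appears by name in -log_summary */
    ierr = PetscClassIdRegister("UserApply", &classid);CHKERRQ(ierr);
    ierr = PetscLogEventRegister("UserApplyPart", classid, &USER_APPLY);CHKERRQ(ierr);

    ierr = PetscLogEventBegin(USER_APPLY, 0, 0, 0, 0);CHKERRQ(ierr);
    /* ... the custom code to be timed goes here ... */
    ierr = PetscLogEventEnd(USER_APPLY, 0, 0, 0, 0);CHKERRQ(ierr);

    ierr = PetscFinalize();
    return ierr;
  }
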
> Thanks,
>
> Matt
>
>
>
>>
>> ************************************************************************************************************************
>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
>>
>> ************************************************************************************************************************
>>
>> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>>
>> ./main2d on a darwin-dbg named Amneets-MBP.attlocal.net with 1 processor, by Taylor Wed Jan 13 21:07:43 2016
>> Using Petsc Development GIT revision: v3.6.1-2556-g6721a46 GIT Date: 2015-11-16 13:07:08 -0600
>>
>> Max Max/Min Avg Total
>> Time (sec): 1.039e+01 1.00000 1.039e+01
>> Objects: 2.834e+03 1.00000 2.834e+03
>> Flops: 3.552e+08 1.00000 3.552e+08 3.552e+08
>> Flops/sec: 3.418e+07 1.00000 3.418e+07 3.418e+07
>> Memory: 3.949e+07 1.00000 3.949e+07
>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
>> MPI Reductions: 0.000e+00 0.00000
>>
>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>> e.g., VecAXPY() for real vectors of length N --> 2N flops
>> and VecAXPY() for complex vectors of length N --> 8N flops
>>
>> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
>> Avg %Total Avg %Total counts %Total Avg %Total counts %Total
>> 0: Main Stage: 1.0391e+01 100.0% 3.5520e+08 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
>>
>>
>> ------------------------------------------------------------------------------------------------------------------------
>> See the 'Profiling' chapter of the users' manual for details on interpreting output.
>> Phase summary info:
>> Count: number of times phase was executed
>> Time and Flops: Max - maximum over all processors
>> Ratio - ratio of maximum to minimum over all processors
>> Mess: number of messages sent
>> Avg. len: average message length (bytes)
>> Reduct: number of global reductions
>> Global: entire computation
>> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>> %T - percent time in this phase %F - percent flops in this phase
>> %M - percent messages in this phase %L - percent message lengths in this phase
>> %R - percent reductions in this phase
>> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
>>
>> ------------------------------------------------------------------------------------------------------------------------
>>
>>
>> ##########################################################
>> # #
>> # WARNING!!! #
>> # #
>> # This code was compiled with a debugging option, #
>> # To get timing results run ./configure #
>> # using --with-debugging=no, the performance will #
>> # be generally two or three times faster. #
>> # #
>> ##########################################################
>>
>>
>> Event Count Time (sec) Flops --- Global --- --- Stage --- Total
>> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
>>
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> --- Event Stage 0: Main Stage
>>
>> VecDot 4 1.0 9.0525e-04 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 37
>> VecMDot 533 1.0 1.5936e-02 1.0 5.97e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 375
>> VecNorm 412 1.0 9.2107e-03 1.0 3.57e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 388
>> VecScale 331 1.0 5.8195e-01 1.0 1.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00 6 0 0 0 0 6 0 0 0 0 2
>> VecCopy 116 1.0 1.9983e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecSet 18362 1.0 1.5249e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
>> VecAXPY 254 1.0 4.3961e-01 1.0 1.95e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 1 0 0 0 4 1 0 0 0 4
>> VecAYPX 92 1.0 2.5167e-03 1.0 2.66e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 106
>> VecAXPBYCZ 36 1.0 8.6242e-04 1.0 2.94e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 341
>> VecWAXPY 58 1.0 1.2539e-03 1.0 2.47e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 197
>> VecMAXPY 638 1.0 2.3439e-02 1.0 7.68e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 328
>> VecSwap 111 1.0 1.9721e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
>> VecAssemblyBegin 607 1.0 3.8150e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecAssemblyEnd 607 1.0 8.3705e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecScatterBegin 26434 1.0 3.0096e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0
>> VecNormalize 260 1.0 4.9754e-01 1.0 3.84e+06 1.0 0.0e+00 0.0e+00 0.0e+00 5 1 0 0 0 5 1 0 0 0 8
>> BuildTwoSidedF 600 1.0 1.8942e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatMult 365 1.0 6.0306e-01 1.0 6.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00 6 18 0 0 0 6 18 0 0 0 104
>> MatSolve 8775 1.0 6.8506e-01 1.0 2.25e+08 1.0 0.0e+00 0.0e+00 0.0e+00 7 63 0 0 0 7 63 0 0 0 328
>> MatLUFactorSym 85 1.0 1.0664e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
>> MatLUFactorNum 85 1.0 1.2066e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 12 0 0 0 1 12 0 0 0 350
>> MatScale 4 1.0 4.0145e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 625
>> MatAssemblyBegin 108 1.0 4.8849e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatAssemblyEnd 108 1.0 9.8455e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatGetRow 33120 1.0 1.4157e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
>> MatGetRowIJ 85 1.0 2.6060e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatGetSubMatrice 4 1.0 4.2922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatGetOrdering 85 1.0 3.1230e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatAXPY 4 1.0 4.0459e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0
>> MatPtAP 4 1.0 1.1362e-01 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 44
>> MatPtAPSymbolic 4 1.0 6.4973e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
>> MatPtAPNumeric 4 1.0 4.8521e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 103
>> MatGetSymTrans 4 1.0 5.9780e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> KSPGMRESOrthog 182 1.0 2.0538e-02 1.0 5.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 249
>> KSPSetUp 90 1.0 2.1210e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> KSPSolve 1 1.0 9.5567e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 0.0e+00 92 98 0 0 0 92 98 0 0 0 37
>> PCSetUp 90 1.0 4.0597e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 12 0 0 0 4 12 0 0 0 104
>> PCSetUpOnBlocks 91 1.0 2.9886e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 12 0 0 0 3 12 0 0 0 141
>> PCApply 13 1.0 9.0558e+00 1.0 3.49e+08 1.0 0.0e+00 0.0e+00 0.0e+00 87 98 0 0 0 87 98 0 0 0 39
>> SNESSolve 1 1.0 9.5729e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 0.0e+00 92 98 0 0 0 92 98 0 0 0 37
>> SNESFunctionEval 2 1.0 1.3347e-02 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 4
>> SNESJacobianEval 1 1.0 2.4613e-03 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2
>>
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> Memory usage is given in bytes:
>>
>> Object Type Creations Destructions Memory Descendants' Mem.
>> Reports information only for process 0.
>>
>> --- Event Stage 0: Main Stage
>>
>> Vector 870 762 13314200 0.
>> Vector Scatter 290 289 189584 0.
>> Index Set 1171 823 951096 0.
>> IS L to G Mapping 110 109 2156656 0.
>> Application Order 6 6 99952 0.
>> MatMFFD 1 1 776 0.
>> Matrix 189 189 24202324 0.
>> Matrix Null Space 4 4 2432 0.
>> Krylov Solver 90 90 190080 0.
>> DMKSP interface 1 1 648 0.
>> Preconditioner 90 90 89128 0.
>> SNES 1 1 1328 0.
>> SNESLineSearch 1 1 856 0.
>> DMSNES 1 1 664 0.
>> Distributed Mesh 2 2 9024 0.
>> Star Forest Bipartite Graph 4 4 3168 0.
>> Discrete System 2 2 1696 0.
>> Viewer 1 0 0 0.
>>
>> ========================================================================================================================
>> Average time to get PetscTime(): 4.74e-08
>> #PETSc Option Table entries:
>> -ib_ksp_converged_reason
>> -ib_ksp_monitor_true_residual
>> -ib_snes_type ksponly
>> -log_summary
>> -stokes_ib_pc_level_ksp_richardson_self_scae
>> -stokes_ib_pc_level_ksp_type gmres
>> -stokes_ib_pc_level_pc_asm_local_type additive
>> -stokes_ib_pc_level_pc_asm_type interpolate
>> -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal
>> -stokes_ib_pc_level_sub_pc_type lu
>> #End of PETSc Option Table entries
>> Compiled without FORTRAN kernels
>> Compiled with full precision matrices (default)
>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
>> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
>> Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90
>> --PETSC_ARCH=darwin-dbg --with-debugging=1 --with-c++-support=1
>> --with-hypre=1 --download-hypre=1 --with-hdf5=yes
>> --with-hdf5-dir=/Users/Taylor/Documents/SOFTWARES/HDF5/
>> -----------------------------------------
>> Libraries compiled on Mon Nov 16 15:11:21 2015 on d209.math.ucdavis.edu
>> Machine characteristics: Darwin-14.5.0-x86_64-i386-64bit
>> Using PETSc directory:
>> /Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc
>> Using PETSc arch: darwin-dbg
>> -----------------------------------------
>>
>> Using C compiler: mpicc -g ${COPTFLAGS} ${CFLAGS}
>> Using Fortran compiler: mpif90 -g ${FOPTFLAGS} ${FFLAGS}
>> -----------------------------------------
>>
>> Using include paths:
>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include
>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include
>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include
>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include
>> -I/opt/X11/include -I/Users/Taylor/Documents/SOFTWARES/HDF5/include
>> -I/opt/local/include -I/Users/Taylor/Documents/SOFTWARES/MPICH/include
>> -----------------------------------------
>>
>> Using C linker: mpicc
>> Using Fortran linker: mpif90
>> Using libraries:
>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib
>> -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib
>> -lpetsc
>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib
>> -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib
>> -lHYPRE -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib
>> -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib
>> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin
>> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin
>> -lclang_rt.osx -lmpicxx -lc++
>> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin
>> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin
>> -lclang_rt.osx -llapack -lblas -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib -lX11
>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/HDF5/lib
>> -L/Users/Taylor/Documents/SOFTWARES/HDF5/lib -lhdf5_hl -lhdf5 -lssl
>> -lcrypto -lmpifort -lgfortran
>> -Wl,-rpath,/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1
>> -L/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1
>> -Wl,-rpath,/opt/local/lib/gcc49 -L/opt/local/lib/gcc49 -lgfortran
>> -lgcc_ext.10.5 -lquadmath -lm -lclang_rt.osx -lmpicxx -lc++ -lclang_rt.osx
>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib
>> -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib -ldl -lmpi -lpmpi -lSystem
>> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin
>> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin
>> -lclang_rt.osx -ldl
>> -----------------------------------------
>>
>>
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>