[petsc-users] HPCToolKit/HPCViewer on OS X

Griffith, Boyce Eugene boyceg at email.unc.edu
Thu Jan 14 14:01:10 CST 2016


> On Jan 14, 2016, at 2:24 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
> 
>   Matt is right, there is a lot of "missing" time in the output. Please send the output from -ksp_view so we can see exactly which solver is being used.
> 
>   From the output we have:
> 
>    Nonlinear solver: 78% of the time (so your "setup code" outside of PETSc is taking about 22% of the time)
>    Linear solver: 77% of the time (this is reasonable; pretty much the entire cost of the nonlinear solve is the linear solve)
>    Time to set up the preconditioner: 19% (10 + 9)
>    Time of iteration in KSP: 35% (the sum of the vector operations, MatMult(), and MatSolve())
> 
>     So 77 - (19 + 35) = 23% of the time inside the linear solver is unexplained (a custom preconditioner???)
> 
>    Also, getting the results with Instruments or HPCToolkit would be useful (so long as we don't need to install HPCToolkit ourselves to see the results).
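
For reference, the solver description that -ksp_view prints can also be produced directly from code. A minimal sketch (the KSP object, called "ksp" here, is hypothetical and assumed to be already set up):

    #include <petscksp.h>

    /* Print the complete solver configuration (KSP type, PC type,
       sub-solvers, tolerances) to stdout; the -ksp_view option
       triggers the same view at the end of KSPSolve(). */
    PetscErrorCode ierr;
    ierr = KSPView(ksp, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);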

Thanks, Barry (& Matt & Dave) --- This solver mixes matrix-based components implemented using PETSc with matrix-free components built on top of SAMRAI. Amneet and I should take a look at the performance off-list first.

-- Boyce

> 
> 
>   Barry
> 
>> On Jan 14, 2016, at 1:26 AM, Bhalla, Amneet Pal S <amneetb at live.unc.edu> wrote:
>> 
>> 
>> 
>>> On Jan 13, 2016, at 9:17 PM, Griffith, Boyce Eugene <boyceg at email.unc.edu> wrote:
>>> 
>>> I see one hot spot:
>> 
>> 
>> Here is the output with the opt build:
>> 
>> ************************************************************************************************************************
>> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
>> ************************************************************************************************************************
>> 
>> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>> 
>> ./main2d on a linux-opt named aorta with 1 processor, by amneetb Thu Jan 14 02:24:43 2016
>> Using Petsc Development GIT revision: v3.6.3-3098-ga3ecda2  GIT Date: 2016-01-13 21:30:26 -0600
>> 
>>                         Max       Max/Min        Avg      Total 
>> Time (sec):           1.018e+00      1.00000   1.018e+00
>> Objects:              2.935e+03      1.00000   2.935e+03
>> Flops:                4.957e+08      1.00000   4.957e+08  4.957e+08
>> Flops/sec:            4.868e+08      1.00000   4.868e+08  4.868e+08
>> MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
>> MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
>> MPI Reductions:       0.000e+00      0.00000
>> 
>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>>                            e.g., VecAXPY() for real vectors of length N --> 2N flops
>>                            and VecAXPY() for complex vectors of length N --> 8N flops
>> 
>> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>>                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
>> 0:      Main Stage: 1.0183e+00 100.0%  4.9570e+08 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0% 
>> 
>> ------------------------------------------------------------------------------------------------------------------------
>> See the 'Profiling' chapter of the users' manual for details on interpreting output.
>> Phase summary info:
>>   Count: number of times phase was executed
>>   Time and Flops: Max - maximum over all processors
>>                   Ratio - ratio of maximum to minimum over all processors
>>   Mess: number of messages sent
>>   Avg. len: average message length (bytes)
>>   Reduct: number of global reductions
>>   Global: entire computation
>>   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>>      %T - percent time in this phase         %F - percent flops in this phase
>>      %M - percent messages in this phase     %L - percent message lengths in this phase
>>      %R - percent reductions in this phase
>>   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
>> ------------------------------------------------------------------------------------------------------------------------
>> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>>                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>> ------------------------------------------------------------------------------------------------------------------------
>> 
>> --- Event Stage 0: Main Stage
>> 
>> VecDot                 4 1.0 2.9564e-05 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1120
>> VecDotNorm2          272 1.0 1.4565e-03 1.0 4.25e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  2920
>> VecMDot              624 1.0 8.4300e-03 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0   627
>> VecNorm              565 1.0 3.8033e-03 1.0 4.38e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1151
>> VecScale              86 1.0 5.5480e-04 1.0 1.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   279
>> VecCopy               28 1.0 5.2261e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecSet             14567 1.0 1.2443e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>> VecAXPY              903 1.0 4.2996e-03 1.0 6.66e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1550
>> VecAYPX              225 1.0 1.2550e-03 1.0 8.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   681
>> VecAXPBYCZ            42 1.0 1.7118e-04 1.0 3.45e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2014
>> VecWAXPY              70 1.0 1.9503e-04 1.0 2.98e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1528
>> VecMAXPY             641 1.0 1.1136e-02 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0   475
>> VecSwap              135 1.0 4.5896e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecAssemblyBegin     745 1.0 4.9477e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecAssemblyEnd       745 1.0 9.2411e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecScatterBegin    40831 1.0 3.4502e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
>> BuildTwoSidedF       738 1.0 2.6712e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatMult              513 1.0 9.1235e-02 1.0 7.75e+07 1.0 0.0e+00 0.0e+00 0.0e+00  9 16  0  0  0   9 16  0  0  0   849
>> MatSolve           13568 1.0 2.3605e-01 1.0 3.45e+08 1.0 0.0e+00 0.0e+00 0.0e+00 23 70  0  0  0  23 70  0  0  0  1460
>> MatLUFactorSym        84 1.0 3.7430e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
>> MatLUFactorNum        85 1.0 3.9623e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00  4  8  0  0  0   4  8  0  0  0  1058
>> MatILUFactorSym        1 1.0 3.3617e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatScale               4 1.0 2.5511e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   984
>> MatAssemblyBegin     108 1.0 6.3658e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatAssemblyEnd       108 1.0 2.9490e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatGetRow          33120 1.0 2.0157e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
>> MatGetRowIJ           85 1.0 1.2145e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatGetSubMatrice       4 1.0 8.4379e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>> MatGetOrdering        85 1.0 7.7887e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>> MatAXPY                4 1.0 4.9596e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  5  0  0  0  0   5  0  0  0  0     0
>> MatPtAP                4 1.0 4.4426e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00  4  1  0  0  0   4  1  0  0  0   112
>> MatPtAPSymbolic        4 1.0 2.7664e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
>> MatPtAPNumeric         4 1.0 1.6732e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   298
>> MatGetSymTrans         4 1.0 3.6621e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> KSPGMRESOrthog        16 1.0 9.7778e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>> KSPSetUp              90 1.0 5.7650e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> KSPSolve               1 1.0 7.8831e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 77 99  0  0  0  77 99  0  0  0   622
>> PCSetUp               90 1.0 9.9725e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 10  8  0  0  0  10  8  0  0  0   420
>> PCSetUpOnBlocks      112 1.0 8.7547e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00  9  8  0  0  0   9  8  0  0  0   479
>> PCApply               16 1.0 7.1952e-01 1.0 4.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 71 99  0  0  0  71 99  0  0  0   680
>> SNESSolve              1 1.0 7.9225e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 78 99  0  0  0  78 99  0  0  0   619
>> SNESFunctionEval       2 1.0 3.2940e-03 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    14
>> SNESJacobianEval       1 1.0 4.7255e-04 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     9
>> ------------------------------------------------------------------------------------------------------------------------
>> 
>> Memory usage is given in bytes:
>> 
>> Object Type          Creations   Destructions     Memory  Descendants' Mem.
>> Reports information only for process 0.
>> 
>> --- Event Stage 0: Main Stage
>> 
>>              Vector   971            839     15573352     0.
>>      Vector Scatter   290            289       189584     0.
>>           Index Set  1171            823       951928     0.
>>   IS L to G Mapping   110            109      2156656     0.
>>   Application Order     6              6        99952     0.
>>             MatMFFD     1              1          776     0.
>>              Matrix   189            189     24083332     0.
>>   Matrix Null Space     4              4         2432     0.
>>       Krylov Solver    90             90       122720     0.
>>     DMKSP interface     1              1          648     0.
>>      Preconditioner    90             90        89872     0.
>>                SNES     1              1         1328     0.
>>      SNESLineSearch     1              1          984     0.
>>              DMSNES     1              1          664     0.
>>    Distributed Mesh     2              2         9168     0.
>> Star Forest Bipartite Graph     4              4         3168     0.
>>     Discrete System     2              2         1712     0.
>>              Viewer     1              0            0     0.
>> ========================================================================================================================
>> Average time to get PetscTime(): 9.53674e-07
>> #PETSc Option Table entries:
>> -ib_ksp_converged_reason
>> -ib_ksp_monitor_true_residual
>> -ib_snes_type ksponly
>> -log_summary
>> -stokes_ib_pc_level_0_sub_pc_factor_nonzeros_along_diagonal
>> -stokes_ib_pc_level_0_sub_pc_type ilu
>> -stokes_ib_pc_level_ksp_richardson_self_scale
>> -stokes_ib_pc_level_ksp_type richardson
>> -stokes_ib_pc_level_pc_asm_local_type additive
>> -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal
>> -stokes_ib_pc_level_sub_pc_type lu
>> #End of PETSc Option Table entries
>> Compiled without FORTRAN kernels
>> Compiled with full precision matrices (default)
>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
>> Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 --with-default-arch=0 --PETSC_ARCH=linux-opt --with-debugging=0 --with-c++-support=1 --with-hypre=1 --download-hypre=1 --with-hdf5=yes --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3
>> -----------------------------------------
>> Libraries compiled on Thu Jan 14 01:29:56 2016 on aorta 
>> Machine characteristics: Linux-3.13.0-63-generic-x86_64-with-Ubuntu-14.04-trusty
>> Using PETSc directory: /not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc
>> Using PETSc arch: linux-opt
>> -----------------------------------------
>> 
>> Using C compiler: mpicc  -fPIC  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -Qunused-arguments -O3  ${COPTFLAGS} ${CFLAGS}
>> Using Fortran compiler: mpif90  -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3   ${FOPTFLAGS} ${FFLAGS} 
>> -----------------------------------------
>> 
>> Using include paths: -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/softwares/MPICH/include
>> -----------------------------------------
>> 
>> Using C linker: mpicc
>> Using Fortran linker: mpif90
>> Using libraries: -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lpetsc -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lHYPRE -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpicxx -lstdc++ -llapack -lblas -lpthread -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lX11 -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++ -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -lmpi -lgcc_s -ldl 
>> -----------------------------------------
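
As the legend in the log above notes, additional logging stages can be set with PetscLogStagePush() and PetscLogStagePop(). A minimal sketch (the stage name "ApplyCustomPC" and its placement around the matrix-free preconditioner apply are purely illustrative) of how the currently unattributed part of the linear solve could be broken out into its own section of the -log_summary output:

    #include <petscsys.h>

    /* Register a named logging stage once, e.g. during solver setup. */
    PetscLogStage custom_pc_stage;
    PetscLogStageRegister("ApplyCustomPC", &custom_pc_stage);

    /* Wrap the matrix-free/SAMRAI preconditioner application so the
       time spent there is reported under its own stage. */
    PetscLogStagePush(custom_pc_stage);
    /* ... apply the custom preconditioner ... */
    PetscLogStagePop();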


