[petsc-users] Log_summary
Barry Smith
bsmith at mcs.anl.gov
Fri Oct 21 17:58:05 CDT 2011
Sorry for the confusing output but it is not a bug. The time given on the third column is the slowest time over all the processes; the next number is the ratio of fastest to slowest which is 2.4 hence the fastest process took 1.6966e+02/2.4 seconds. Meanwhile the percent is computed by ADDING the times over all the processes per event and dividing by the total time. Since some processes were much faster than others the percentage time for all processes is much lower than you would expect given the maximum time.
The fact that the ratio is 2.4 indicates that some processes have many more non-zeros in their part of the matrix than other processes, i.e. load inbalance
Barry
On Oct 21, 2011, at 5:09 PM, Rongliang Chen wrote:
> Hello All,
>
> I got a strange thing in my log_summary output (see below). In the output file, the time spend on the function "MatLUFactorNum" is "1.6966e+02" but the percent time in this phase (T%) for this function is 20%, which means that the total time should be 8.483e+02. But the exact total time is 5.506e+02. So what kind of bug can cause this problem? Thanks,
>
> Best,
> Rongliang
>
> Max Max/Min Avg Total
> Time (sec): 5.506e+02 1.00014 5.506e+02
>
> .........
>
> ------------------------------------------------------------------------------------------------------------------------
> Event Count Time (sec) Flops --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
> MatLUFactorNum 20 1.0 1.6966e+02 2.4 1.94e+11 3.2 0.0e+00 0.0e+00 0.0e+00 20 62 0 0 0 20 62 0 0 0 19505
>
> ===========================================================================
>
> ************************************************************************************************************************
> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./joab on a Janus-nod named node1755 with 32 processors, by ronglian Sun Oct 16 16:34:25 2011
> Using Petsc Release Version 3.2.0, Patch 1, Mon Sep 12 16:01:51 CDT 2011
>
> Max Max/Min Avg Total
> Time (sec): 5.506e+02 1.00014 5.506e+02
> Objects: 9.800e+02 1.00000 9.800e+02
> Flops: 2.639e+11 2.37776 1.673e+11 5.353e+12
> Flops/sec: 4.794e+08 2.37786 3.038e+08 9.722e+09
> MPI Messages: 1.057e+05 5.06348 3.760e+04 1.203e+06
> MPI Message Lengths: 1.191e+09 2.60674 2.113e+04 2.543e+10
> MPI Reductions: 6.213e+03 1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N --> 2N flops
> and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts %Total Avg %Total counts %Total
> 0: Main Stage: 5.5061e+02 100.0% 5.3531e+12 100.0% 1.203e+06 100.0% 2.113e+04 100.0% 6.212e+03 100.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
> %T - percent time in this phase %F - percent flops in this phase
> %M - percent messages in this phase %L - percent message lengths in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event Count Time (sec) Flops --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
> --- Event Stage 0: Main Stage
>
> MatMult 2589 1.0 5.0895e+01 7.9 4.45e+09 1.1 4.1e+05 2.2e+03 0.0e+00 5 3 34 4 0 5 3 34 4 0 2655
> MatMultTranspose 5 1.0 1.5848e-02 1.3 6.15e+06 1.1 6.6e+02 1.5e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 11882
> MatSolve 2562 1.0 1.1045e+02 1.6 6.03e+10 1.7 0.0e+00 0.0e+00 0.0e+00 17 29 0 0 0 17 29 0 0 0 14238
> MatLUFactorSym 4 1.0 2.0894e+00 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 0 0 0 0 0 0 0
> MatLUFactorNum 20 1.0 1.6966e+02 2.4 1.94e+11 3.2 0.0e+00 0.0e+00 0.0e+00 20 62 0 0 0 20 62 0 0 0 19505
> MatILUFactorSym 1 1.0 4.8680e-03 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyBegin 110 1.0 1.6247e+0119.7 0.00e+00 0.0 1.7e+04 6.7e+04 1.8e+02 2 0 1 5 3 2 0 1 5 3 0
> MatAssemblyEnd 110 1.0 1.5937e+02 1.0 0.00e+00 0.0 3.0e+03 5.3e+02 1.7e+02 29 0 0 0 3 29 0 0 0 3 0
> MatGetRowIJ 5 1.0 3.0132e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetSubMatrice 20 1.0 2.9429e+00 1.4 0.00e+00 0.0 1.8e+04 3.9e+05 8.0e+01 0 0 1 27 1 0 0 1 27 1 0
> MatGetOrdering 5 1.0 8.6869e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+01 0 0 0 0 0 0 0 0 0 0 0
> MatIncreaseOvrlp 1 1.0 3.3869e-02 1.0 0.00e+00 0.0 1.6e+03 1.3e+03 8.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatPartitioning 1 1.0 1.1711e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatZeroEntries 45 1.0 9.1191e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecDot 19 1.0 2.4045e-0213.2 6.91e+05 1.1 0.0e+00 0.0e+00 1.9e+01 0 0 0 0 0 0 0 0 0 0 879
> VecMDot 2518 1.0 1.7926e+01 2.5 5.33e+09 1.1 0.0e+00 0.0e+00 2.5e+03 2 3 0 0 41 2 3 0 0 41 9107
> VecNorm 2601 1.0 3.0213e+00 5.2 1.23e+08 1.1 0.0e+00 0.0e+00 2.6e+03 0 0 0 0 42 0 0 0 0 42 1243
> VecScale 2562 1.0 5.3008e-02 1.4 6.06e+07 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 35014
> VecCopy 166 1.0 1.3387e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 5223 1.0 4.8880e-01 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAXPY 142 1.0 1.1994e-02 1.7 6.89e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 17581
> VecWAXPY 20 1.0 2.2526e-03 1.6 3.50e+05 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 4755
> VecMAXPY 2562 1.0 7.9590e+00 1.4 5.45e+09 1.1 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 20969
> VecAssemblyBegin 70 1.0 1.4910e-01 4.0 0.00e+00 0.0 5.0e+03 5.6e+02 2.1e+02 0 0 0 0 3 0 0 0 0 3 0
> VecAssemblyEnd 70 1.0 2.8467e-04 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecScatterBegin 7779 1.0 1.1640e+00 1.6 0.00e+00 0.0 1.2e+06 1.5e+04 0.0e+00 0 0 96 68 0 0 0 96 68 0 0
> VecScatterEnd 7779 1.0 1.4521e+0235.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14 0 0 0 0 14 0 0 0 0 0
> VecReduceArith 8 1.0 8.0991e-04 1.1 3.05e+05 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 11525
> VecReduceComm 4 1.0 3.4404e-04 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecNormalize 2562 1.0 2.9302e+00 4.9 1.81e+08 1.1 0.0e+00 0.0e+00 2.5e+03 0 0 0 0 41 0 0 0 0 41 1893
> SNESSolve 4 1.0 5.3863e+02 1.0 2.61e+11 2.4 9.9e+05 2.5e+04 4.3e+03 98 98 82 96 69 98 98 82 96 69 9778
> SNESLineSearch 19 1.0 8.7346e+00 1.0 6.45e+07 1.1 1.9e+04 1.9e+04 3.5e+02 2 0 2 1 6 2 0 2 1 6 225
> SNESFunctionEval 24 1.0 6.6409e+01 1.0 3.06e+07 1.1 2.0e+04 2.1e+04 3.6e+02 12 0 2 2 6 12 0 2 2 6 14
> SNESJacobianEval 19 1.0 1.6340e+02 1.0 0.00e+00 0.0 1.1e+04 6.9e+04 1.5e+02 30 0 1 3 2 30 0 1 3 2 0
> KSPGMRESOrthog 2518 1.0 2.5557e+01 1.7 1.07e+10 1.1 0.0e+00 0.0e+00 2.5e+03 3 6 0 0 41 3 6 0 0 41 12776
> KSPSetup 40 1.0 2.5885e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 20 1.0 3.1525e+02 1.0 2.64e+11 2.4 1.2e+06 2.1e+04 5.2e+03 57100 96 95 84 57100 96 95 84 16973
> PCSetUp 40 1.0 1.7504e+02 2.4 1.94e+11 3.2 2.1e+04 3.3e+05 1.6e+02 21 62 2 27 3 21 62 2 27 3 18906
> PCSetUpOnBlocks 20 1.0 1.7249e+02 2.4 1.94e+11 3.2 0.0e+00 0.0e+00 2.7e+01 21 62 0 0 0 21 62 0 0 0 19186
> PCApply 2562 1.0 2.0489e+02 2.0 6.03e+10 1.7 7.4e+05 2.2e+04 0.0e+00 28 29 62 65 0 28 29 62 65 0 7675
> ------------------------------------------------------------------------------------------------------------------------
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Matrix 44 34 2570073620 0
> Matrix Partitioning 1 1 640 0
> Index Set 189 182 3025696 0
> IS L to G Mapping 2 1 283604 0
> Vector 688 452 114894176 0
> Vector Scatter 29 25 26300 0
> Application Order 2 2 9335968 0
> SNES 4 2 2544 0
> Krylov Solver 10 6 11645040 0
> Preconditioner 10 6 5456 0
> Viewer 1 0 0 0
> ========================================================================================================================
> Average time to get PetscTime(): 1.19209e-07
> Average time for MPI_Barrier(): 1.02043e-05
> Average time for zero size MPI_Send(): 2.90573e-06
> #PETSc Option Table entries:
> -coarse_ksp_rtol 1.0e-1
> -coarsegrid /scratch/stmp00/ronglian/input/Cannula/Cannula_Nest2_E2000_N8241_D70170.fsi
> -computeinitialguess
> -f /scratch/stmp00/ronglian/input/Cannula/Cannula_Nest2_E32000_N128961_D1096650.fsi
> -geometric_asm
> -geometric_asm_overlap 8
> -inletu 5.0
> -ksp_atol 1e-8
> -ksp_gmres_restart 600
> -ksp_max_it 3000
> -ksp_pc_side right
> -ksp_rtol 1.e-2
> -ksp_type gmres
> -log_summary
> -mat_partitioning_type parmetis
> -nest_geometric_asm_overlap 4
> -nest_ksp_atol 1e-8
> -nest_ksp_gmres_restart 800
> -nest_ksp_max_it 1000
> -nest_ksp_pc_side right
> -nest_ksp_rtol 1.e-2
> -nest_ksp_type gmres
> -nest_pc_asm_type basic
> -nest_pc_type asm
> -nest_snes_atol 1.e-10
> -nest_snes_max_it 20
> -nest_snes_rtol 1.e-4
> -nest_sub_pc_factor_mat_ordering_type qmd
> -nest_sub_pc_factor_shift_amount 1e-8
> -sub_pc_factor_shift_type nonzero
> -sub_pc_type lu
> -viscosity 0.01
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
> Configure run at: Tue Sep 13 13:28:48 2011
> Configure options: --known-level1-dcache-size=32768 --known-level1-dcache-linesize=32 --known-level1-dcache-assoc=0 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=8 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-batch=1 --with-mpi-shared-libraries=1 --known-mpi-shared-libraries=0 --download-f-blas-lapack=1 --download-hypre=1 --download-superlu=1 --download-parmetis=1 --download-superlu_dist=1 --download-blacs=1 --download-scalapack=1 --download-mumps=1 --with-debugging=0
> -----------------------------------------
> Libraries compiled on Tue Sep 13 13:28:48 2011 on node1367
> Machine characteristics: Linux-2.6.18-238.12.1.el5-x86_64-with-redhat-5.6-Tikanga
> Using PETSc directory: /home/ronglian/soft/petsc-3.2-p1
> Using PETSc arch: Janus-nodebug
> -----------------------------------------
>
> Using C compiler: mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS}
> Using Fortran compiler: mpif90 -Wall -Wno-unused-variable -O ${FOPTFLAGS} ${FFLAGS}
> -----------------------------------------
>
> Using include paths: -I/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/include -I/home/ronglian/soft/petsc-3.2-p1/include -I/home/ronglian/soft/petsc-3.2-p1/include -I/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/include -I/curc/tools/free/redhat_5_x86_64/openmpi-1.4.3_ib/include
> -----------------------------------------
>
> Using C linker: mpicc
> Using Fortran linker: mpif90
> Using libraries: -Wl,-rpath,/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/lib -L/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/lib -lpetsc -lX11 -Wl,-rpath,/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/lib -L/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/lib -lsuperlu_dist_2.5 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lHYPRE -lmpi_cxx -lstdc++ -lscalapack -lblacs -lsuperlu_4.2 -lflapack -lfblas -L/curc/tools/free/redhat_5_x86_64/openmpi-1.4.3_ib/lib -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl
> -----------------------------------------
>
More information about the petsc-users
mailing list