************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./v3d3 on a linux-gcc named hb-2r07-n30 with 64 processors, by wjiang Sun Apr 29 00:00:44 2012
Using Petsc Release Version 3.2.0, Patch 7, Thu Mar 15 09:30:51 CDT 2012

                         Max       Max/Min        Avg      Total
Time (sec):           2.852e+03      1.00059   2.851e+03
Objects:              3.540e+02      1.00000   3.540e+02
Flops:                2.299e+07      1.76894   2.148e+07  1.375e+09
Flops/sec:            8.064e+03      1.76894   7.535e+03  4.823e+05
MPI Messages:         3.981e+03      6.41063   8.549e+02  5.471e+04
MPI Message Lengths:  1.451e+09      1.49062   1.259e+06  6.887e+10
MPI Reductions:       5.900e+02      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 2.8507e+03 100.0%  1.3748e+09 100.0%  5.471e+04 100.0%  1.259e+06      100.0%  5.890e+02  99.8%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation.
      Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event            Count   Time (sec)      Flops                                --- Global ---  --- Stage ---   Total
                 Max Ratio Max     Ratio Max    Ratio Mess    Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecDot             6 1.0 1.5704e-01  2.6 4.05e+04 1.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  1   0  0  0  0  1     16
VecNorm            4 1.0 2.6608e-04  1.4 2.70e+04 1.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  1   0  0  0  0  1   6491
VecScale          22 1.0 6.4158e-04  4.4 7.42e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   7403
VecCopy           18 1.0 3.5477e-04  2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
VecSet            26 1.0 1.7715e-04  1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
VecAXPY           26 1.0 4.0007e-04  2.1 1.75e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  28060
VecWAXPY           2 1.0 7.7009e-05  3.3 6.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   5607
VecAssemblyBegin  25 1.0 8.5720e+00  4.6 0.00e+00 0.0 1.3e+04 1.5e+05 7.5e+01  0  0 25  3 13   0  0 25  3 13      0
VecAssemblyEnd    25 1.0 1.3301e-02 14.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
VecLoad            3 1.0 1.4726e-01  1.0 0.00e+00 0.0 1.9e+02 2.7e+04 9.0e+00  0  0  0  0  2   0  0  0  0  2      0
VecScatterBegin   82 1.0 2.5039e-01  1.3 0.00e+00 0.0 7.7e+03 4.4e+03 7.6e+01  0  0 14  0 13   0  0 14  0 13      0
VecScatterEnd      6 1.0 6.6810e-02 50.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
MatMult            2 1.0 7.0052e-02  8.6 4.53e+06 1.8 9.8e+02 2.4e+04 0.0e+00  0 20  2  0  0   0 20  2  0  0   3861
MatSolve           4 1.0 1.3398e+00  1.1 0.00e+00 0.0 2.0e+04 6.8e+02 2.0e+01  0  0 37  0  3   0  0 37  0  3      0
MatLUFactorSym     2 1.0 1.4763e+03  1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+01 52  0  0  0  2  52  0  0  0  2      0
MatLUFactorNum     4 1.0 1.2632e+03  1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 44  0  0  0  1  44  0  0  0  1      0
MatCopy            6 1.0 3.7833e-02  2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
MatConvert        10 1.0 1.0276e-01  2.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01  0  0  0  0  2   0  0  0  0  2      0
MatScale          16 1.0 8.3069e-02  3.5 1.81e+07 1.8 0.0e+00 0.0e+00 0.0e+00  0 79  0  0  0   0 79  0  0  0  13045
MatAssemblyBegin  15 1.0 6.7555e+00  2.1 0.00e+00 0.0 1.3e+04 5.1e+06 3.0e+01  0  0 23 96  5   0  0 23 96  5      0
MatAssemblyEnd    15 1.0 1.0360e+01  1.1 0.00e+00 0.0 9.8e+02 5.9e+03 2.3e+01  0  0  2  0  4   0  0  2  0  4      0
MatZeroEntries    14 1.0 5.3977e-02  2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
MatLoad            1 1.0 7.2302e+00  1.0 0.00e+00 0.0 1.2e+03 6.9e+05 1.2e+01  0  0  2  1  2   0  0  2  1  2      0
MatView            8 1.0 6.4199e-02  2.5 0.00e+00 0.0 6.0e+03 2.1e+03 4.0e+00  0  0 11  0  1   0  0 11  0  1      0
MatAXPY           12 1.0 9.5232e-02  2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
KSPSetup           4 1.0 4.2915e-06  0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
KSPSolve           4 1.0 2.7411e+03  1.0 0.00e+00 0.0 2.0e+04 6.8e+02 5.4e+01 96  0 37  0  9  96  0 37  0  9      0
PCSetUp            4 1.0 2.7397e+03  1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.4e+01 96  0  0  0  6  96  0  0  0  6      0
PCApply            4 1.0 1.3398e+00  1.0 0.00e+00 0.0 2.0e+04 6.8e+02 2.0e+01  0  0 37  0  3   0  0 37  0  3      0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector       120            120  129765760     0
      Vector Scatter        89             89      61420     0
              Matrix        45             45  127497324     0
         PetscRandom         1              1        608     0
       Krylov Solver         3              3       3376     0
      Preconditioner         3              3       2664     0
              Viewer         5              4       2880     0
           Index Set        88             88     335480     0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 2.96116e-05
Average time for zero size MPI_Send(): 1.63913e-06
#PETSc Option Table entries:
-ksp_view
-log_summary
-mat_mumps_icntl_14 50
-mat_mumps_icntl_4 1
-mat_mumps_icntl_6 2
-mat_mumps_icntl_7 5
-pc_factor_mat_solver_package mumps
-pc_type lu
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Wed Apr 18 11:20:27 2012
Configure options: PETSC_ARCH=linux-gcc-mvapich2-cxx --with-blas-lapack-lib="[/software/libraries/GotoBLAS_LAPACK/liblapack-gcc.a,libgoto2-gcc.a]" --with-mpi-lib="[/software/tools/mvapich2-1.6-gcc/lib/libmpich.a,libfmpich.a,libmpichcxx.a,libmpichf90.a]" --with-mpi-include=/software/tools/mvapich2-1.6-gcc/include --with-x=0 --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-clanguage=cxx --with-debugging=0 --download-superlu=1 --download-superlu_dist=1 --download-parmetis=1 --download-parmetis=1 --download-mumps=1 --download-f-blas-lapack=1 --download-blacs=1 -download-scalapack=1
-----------------------------------------
Libraries compiled on Wed Apr 18 11:20:27 2012 on lg-1r14-n03
Machine characteristics: Linux-2.6.18-238.19.1.el5-x86_64-with-redhat-5.6-Final
Using PETSc directory: /software/libraries/petsc-3.2
Using PETSc arch: linux-gcc-mvapich2-cxx
-----------------------------------------
Using C compiler: g++ -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: gfortran -Wall -Wno-unused-variable -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/software/libraries/petsc-3.2/linux-gcc-mvapich2-cxx/include -I/software/libraries/petsc-3.2/include -I/software/libraries/petsc-3.2/include -I/software/libraries/petsc-3.2/linux-gcc-mvapich2-cxx/include -I/software/tools/mvapich2-1.6-gcc/include
-----------------------------------------
Using C linker: g++
Using Fortran linker: gfortran
Using libraries: -Wl,-rpath,/software/libraries/petsc-3.2/linux-gcc-mvapich2-cxx/lib -L/software/libraries/petsc-3.2/linux-gcc-mvapich2-cxx/lib -lpetsc -lpthread -Wl,-rpath,/software/libraries/petsc-3.2/linux-gcc-mvapich2-cxx/lib -L/software/libraries/petsc-3.2/linux-gcc-mvapich2-cxx/lib -lsuperlu_dist_2.5 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lsuperlu_4.2 -lflapack -lfblas -Wl,-rpath,/software/tools/mvapich2-1.6-gcc/lib -L/software/tools/mvapich2-1.6-gcc/lib -lmpich -lfmpich -lmpichcxx -lmpichf90 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -ldl -lgcc_s -lgfortran -lm -lm -lstdc++ -lstdc++ -ldl -lgcc_s -ldl
-----------------------------------------

****** ANALYSIS STEP ********

** Max-trans not allowed because matrix is distributed
... Structural symmetry (in percent)= 100
Density: NBdense, Average, Median = 0 313 342
Ordering based on METIS
A root of estimated size 14076 has been selected for Scalapack.

Leaving analysis phase with ...
INFOG(1) = 0
INFOG(2) = 0
-- (20) Number of entries in factors (estim.) = 1767798047
-- (3) Storage of factors (REAL, estimated) = 1788547958
-- (4) Storage of factors (INT , estimated) = 7720270
-- (5) Maximum frontal size (estimated) = 14076
-- (6) Number of nodes in the tree = 2088
-- (32) Type of analysis effectively used = 1
-- (7) Ordering option effectively used = 5
ICNTL(6) Maximum transversal option = 0
ICNTL(7) Pivot order option = 5
Percentage of memory relaxation (effective) = 55
Number of level 2 nodes = 121
Number of split nodes = 16
RINFOG(1) Operations during elimination (estim)= 1.281D+13
Distributed matrix entry format (ICNTL(18)) = 3
** Rank of proc needing largest memory in IC facto : 50
** Estimated corresponding MBYTES for IC facto : 894
** Estimated avg. MBYTES per work. proc at facto (IC) : 710
** TOTAL space in MBYTES for IC factorization : 45480
** Rank of proc needing largest memory for OOC facto : 44
** Estimated corresponding MBYTES for OOC facto : 709
** Estimated avg. MBYTES per work. proc at facto (OOC) : 557
** TOTAL space in MBYTES for OOC factorization : 35667

Entering DMUMPS driver with JOB, N, NZ = 2 215883 67725225

****** FACTORIZATION STEP ********

GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
NUMBER OF WORKING PROCESSES = 64
OUT-OF-CORE OPTION (ICNTL(22)) = 0
REAL SPACE FOR FACTORS = 1788547958
INTEGER SPACE FOR FACTORS = 7720270
MAXIMUM FRONTAL SIZE (ESTIMATED) = 14076
NUMBER OF NODES IN THE TREE = 2088
Convergence error after scaling for ONE-NORM (option 7/8) = 0.23D+00
Maximum effective relaxed size of S = 85672330
Average effective relaxed size of S = 62508033
REDISTRIB: TOTAL DATA LOCAL/SENT = 10021430 617681311
GLOBAL TIME FOR MATRIX DISTRIBUTION = 0.9780
** Memory relaxation parameter ( ICNTL(14) ) : 55
** Rank of processor needing largest memory in facto : 50
** Space in MBYTES used by this processor for facto : 894
** Avg. Space in MBYTES per working proc during facto : 710
ELAPSED TIME FOR FACTORIZATION = 319.1792
Maximum effective space used in S (KEEP8(67) = 53098421
Average effective space used in S (KEEP8(67) = 40980992
** EFF Min: Rank of processor needing largest memory : 44
** EFF Min: Space in MBYTES used by this processor : 634
** EFF Min: Avg. Space in MBYTES per working proc : 538

GLOBAL STATISTICS
RINFOG(2) OPERATIONS IN NODE ASSEMBLY = 6.839D+09
------(3) OPERATIONS IN NODE ELIMINATION= 1.281D+13
INFOG (9) REAL SPACE FOR FACTORS = 1767798047
INFOG(10) INTEGER SPACE FOR FACTORS = 7489300
INFOG(11) MAXIMUM FRONT SIZE = 14076
INFOG(29) NUMBER OF ENTRIES IN FACTORS = 1569664271
INFOG(12) NB OF OFF DIAGONAL PIVOTS = 0
INFOG(13) NUMBER OF DELAYED PIVOTS = 0
INFOG(14) NUMBER OF MEMORY COMPRESS = 133
KEEP8(108) Extra copies IP stacking = 0

Entering DMUMPS driver with JOB, N, NZ = 3 215883 67725225

****** SOLVE & CHECK STEP ********

STATISTICS PRIOR SOLVE PHASE ...........
NUMBER OF RIGHT-HAND-SIDES = 1
BLOCKING FACTOR FOR MULTIPLE RHS = 1
ICNTL (9) = 1
--- (10) = 0
--- (11) = 0
--- (20) = 0
--- (21) = 1
--- (30) = 0
** Rank of processor needing largest memory in solve : 50
** Space in MBYTES used by this processor for solve : 708
** Avg. Space in MBYTES per working proc during solve : 522

Entering DMUMPS driver with JOB, N, NZ = 2 215883 67725225

****** FACTORIZATION STEP ********

GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
NUMBER OF WORKING PROCESSES = 64
OUT-OF-CORE OPTION (ICNTL(22)) = 0
REAL SPACE FOR FACTORS = 1788547958
INTEGER SPACE FOR FACTORS = 7720270
MAXIMUM FRONTAL SIZE (ESTIMATED) = 14076
NUMBER OF NODES IN THE TREE = 2088
Convergence error after scaling for ONE-NORM (option 7/8) = 0.23D+00
Maximum effective relaxed size of S = 85672330
Average effective relaxed size of S = 62508033
REDISTRIB: TOTAL DATA LOCAL/SENT = 10021430 617681311
GLOBAL TIME FOR MATRIX DISTRIBUTION = 0.8740
** Memory relaxation parameter ( ICNTL(14) ) : 55
** Rank of processor needing largest memory in facto : 50
** Space in MBYTES used by this processor for facto : 894
** Avg. Space in MBYTES per working proc during facto : 710
ELAPSED TIME FOR FACTORIZATION = 311.6785
Maximum effective space used in S (KEEP8(67) = 52134149
Average effective space used in S (KEEP8(67) = 41613054
** EFF Min: Rank of processor needing largest memory : 44
** EFF Min: Space in MBYTES used by this processor : 626
** EFF Min: Avg. Space in MBYTES per working proc : 543

GLOBAL STATISTICS
RINFOG(2) OPERATIONS IN NODE ASSEMBLY = 6.839D+09
------(3) OPERATIONS IN NODE ELIMINATION= 1.281D+13
INFOG (9) REAL SPACE FOR FACTORS = 1767798047
INFOG(10) INTEGER SPACE FOR FACTORS = 7511831
INFOG(11) MAXIMUM FRONT SIZE = 14076
INFOG(29) NUMBER OF ENTRIES IN FACTORS = 1569664271
INFOG(12) NB OF OFF DIAGONAL PIVOTS = 0
INFOG(13) NUMBER OF DELAYED PIVOTS = 0
INFOG(14) NUMBER OF MEMORY COMPRESS = 144
KEEP8(108) Extra copies IP stacking = 0

Entering DMUMPS driver with JOB, N, NZ = 3 215883 67725225

****** SOLVE & CHECK STEP ********

STATISTICS PRIOR SOLVE PHASE ...........
NUMBER OF RIGHT-HAND-SIDES = 1
BLOCKING FACTOR FOR MULTIPLE RHS = 1
ICNTL (9) = 1
--- (10) = 0
--- (11) = 0
--- (20) = 0
--- (21) = 1
--- (30) = 0
** Rank of processor needing largest memory in solve : 50
** Space in MBYTES used by this processor for solve : 708
** Avg. Space in MBYTES per working proc during solve : 522

Entering DMUMPS driver with JOB, N, NZ = -2 215883 67725225