************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./ex5 on a arch-cuda-double named hohhot with 1 processor, by hongwang Mon Aug 6 12:55:31 2012
Using Petsc Development HG revision: d01946145980533f72b6500bd243b1dd3666686c  HG Date: Mon Jul 30 17:03:27 2012 -0500

                         Max       Max/Min        Avg      Total
Time (sec):           4.842e+00      1.00000   4.842e+00
Objects:              5.700e+01      1.00000   5.700e+01
Flops:                6.339e+03      1.00000   6.339e+03  6.339e+03
Flops/sec:            1.309e+03      1.00000   1.309e+03  1.309e+03
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       5.600e+01      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 4.8423e+00 100.0%  6.3390e+03 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  5.500e+01  98.2%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
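
The phase-summary notes above refer to PetscLogStagePush() and PetscLogStagePop(). As a minimal sketch only (this is not
code from ex5; the stage name, vector length, and the VecAXPY() work inside the stage are illustrative assumptions), a
user-defined stage that would show up next to "Main Stage" in the tables below could be set up like this:

    /* sketch: register a logging stage and charge some vector work to it */
    #include <petscvec.h>

    int main(int argc, char **argv)
    {
      Vec            x, y;
      PetscLogStage  stage;                     /* handle for the user-defined stage */
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);

      ierr = VecCreateSeq(PETSC_COMM_SELF, 100, &x);CHKERRQ(ierr);    /* length N = 100 (assumed) */
      ierr = VecDuplicate(x, &y);CHKERRQ(ierr);
      ierr = VecSet(x, 1.0);CHKERRQ(ierr);
      ierr = VecSet(y, 2.0);CHKERRQ(ierr);

      ierr = PetscLogStageRegister("User Stage", &stage);CHKERRQ(ierr);  /* stage name is arbitrary */
      ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
      /* everything until the matching pop is charged to "User Stage";
         VecAXPY() on real vectors of length N is counted as 2N flops,
         per the flop counting convention stated above */
      ierr = VecAXPY(y, 3.0, x);CHKERRQ(ierr);
      ierr = PetscLogStagePop();CHKERRQ(ierr);

      ierr = VecDestroy(&x);CHKERRQ(ierr);
      ierr = VecDestroy(&y);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return 0;
    }

Run with -log_summary, such a program would report the pushed stage as an additional "Event Stage" section in a summary
like the one that follows.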
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecDot                 4 1.0 1.3185e-04 1.0 1.24e+02 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0     1
VecMDot                8 1.0 4.7112e-04 1.0 3.72e+02 1.0 0.0e+00 0.0e+00 0.0e+00  0  6  0  0  0   0  6  0  0  0     1
VecNorm               20 1.0 6.7258e-04 1.0 6.20e+02 1.0 0.0e+00 0.0e+00 0.0e+00  0 10  0  0  0   0 10  0  0  0     1
VecScale              12 1.0 2.0361e-04 1.0 1.92e+02 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0     1
VecCopy               16 1.0 2.1482e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet                23 1.0 4.2319e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY                4 1.0 5.5075e-05 1.0 1.28e+02 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0     2
VecWAXPY               4 1.0 7.8201e-05 1.0 6.40e+01 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0     1
VecMAXPY              12 1.0 2.4462e-04 1.0 7.68e+02 1.0 0.0e+00 0.0e+00 0.0e+00  0 12  0  0  0   0 12  0  0  0     3
VecScatterBegin        9 1.0 1.1802e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecReduceArith         9 1.0 3.5095e-04 1.0 2.79e+02 1.0 0.0e+00 0.0e+00 0.0e+00  0  4  0  0  0   0  4  0  0  0     1
VecReduceComm          5 1.0 1.9073e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize          12 1.0 6.1584e-04 1.0 5.64e+02 1.0 0.0e+00 0.0e+00 0.0e+00  0  9  0  0  0   0  9  0  0  0     1
VecCUSPCopyTo         18 1.0 2.4247e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecCUSPCopyFrom       36 1.0 4.8995e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMult               12 1.0 2.4247e-04 1.0 1.34e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0 21  0  0  0   0 21  0  0  0     6
MatSolve              12 1.0 3.0303e-04 1.0 1.34e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0 21  0  0  0   0 21  0  0  0     4
MatLUFactorNum         4 1.0 1.4544e-05 1.0 2.24e+02 1.0 0.0e+00 0.0e+00 0.0e+00  0  4  0  0  0   0  4  0  0  0    15
MatILUFactorSym        1 1.0 3.9101e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  2   0  0  0  0  2     0
MatAssemblyBegin       5 1.0 1.9073e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         5 1.0 7.2312e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            1 1.0 1.1921e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 2.9087e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  4   0  0  0  0  4     0
MatCUSPCopyTo          5 1.0 6.9809e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCSetUp                4 1.0 1.6069e-04 1.0 2.24e+02 1.0 0.0e+00 0.0e+00 3.0e+00  0  4  0  0  5   0  4  0  0  5     1
PCApply               12 1.0 3.1185e-04 1.0 1.34e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0 21  0  0  0   0 21  0  0  0     4
KSPGMRESOrthog         8 1.0 6.6924e-04 1.0 8.84e+02 1.0 0.0e+00 0.0e+00 0.0e+00  0 14  0  0  0   0 14  0  0  0     1
KSPSetUp               4 1.0 4.5154e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01  0  0  0  0 18   0  0  0  0 18     0
KSPSolve               4 1.0 6.8514e-03 1.0 4.30e+03 1.0 0.0e+00 0.0e+00 1.3e+01  0 68  0  0 23   0 68  0  0 24     1
SNESSolve              1 1.0 1.2513e-02 1.0 6.34e+03 1.0 0.0e+00 0.0e+00 2.1e+01  0100  0  0 38   0100  0  0 38     1
SNESFunctionEval       5 1.0 1.0629e-03 1.0 8.80e+02 1.0 0.0e+00 0.0e+00 2.0e+00  0 14  0  0  4   0 14  0  0  4     1
SNESJacobianEval       4 1.0 1.6258e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  4   0  0  0  0  4     0
SNESLineSearch         4 1.0 1.0707e-03 1.0 1.84e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0 29  0  0  0   0 29  0  0  0     2
------------------------------------------------------------------------------------------------------------------------
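
As a rough check on the Mflop/s column using the Total Mflop/s formula above (flops divided by time, scaled by 1e-6):
the MatMult row reports a maximum of 1.34e+03 flops in 2.4247e-04 s, and 1e-6 * 1.34e+03 / 2.4247e-04 is about 5.5
Mflop/s, which the table rounds to 6. The very small flop totals in this run are why every reported rate is only a few
Mflop/s.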

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

           Container     2              2         1200     0
              Viewer     1              0            0     0
     Bipartite Graph     6              6         4440     0
           Index Set    13             13        10992     0
   IS L to G Mapping     3              3         2244     0
              Vector    19             19        29952     0
      Vector Scatter     4              4         2704     0
              Matrix     2              2         8744     0
    Distributed Mesh     3              3        15728     0
      Preconditioner     1              1         1008     0
       Krylov Solver     1              1        18584     0
                SNES     1              1         1416     0
      SNESLineSearch     1              1          888     0
========================================================================================================================
Average time to get PetscTime(): 0
#PETSc Option Table entries:
-dm_mat_type aijcusp
-dm_vec_type cusp
-log_summary ex5_log
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure run at: Sat Aug 4 15:10:44 2012
Configure options: --doCleanup=1 --with-gnu-compilers=1 --with-vendor-compilers=0 --CFLAGS=-march=x86-64 --CXXFLAGS=-march=x86-64 --with-dynamic-loading --with-python=1 --with-debugging=0 --with-log=1 --download-mpich=1 --with-hypre=0 --with-64-bit-indices=yes --with-x11=1 --with-x11-include=/usr/include/X11 --download-f-blas-lapack=1 --with-cuda=1 --with-cusp=1 --with-thrust=1 --download-txpetscgpu=1 --with-precision=double --with-cudac="nvcc -m64" --download-txpetscgpu=1 --with-clanguage=c --with-cuda-arch=sm_20
-----------------------------------------
Libraries compiled on Sat Aug 4 15:10:44 2012 on hohhot
Machine characteristics: Linux-2.6.32-5-amd64-x86_64-with-debian-6.0.5
Using PETSc directory: /usr/src/petsc/petsc-dev
Using PETSc arch: arch-cuda-double
-----------------------------------------

Using C compiler: /usr/src/petsc/petsc-dev/arch-cuda-double/bin/mpicc -march=x86-64 -fPIC -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /usr/src/petsc/petsc-dev/arch-cuda-double/bin/mpif90 -fPIC -Wall -Wno-unused-variable -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------

Using include paths: -I/usr/src/petsc/petsc-dev/arch-cuda-double/include -I/usr/src/petsc/petsc-dev/include -I/usr/src/petsc/petsc-dev/include -I/usr/src/petsc/petsc-dev/arch-cuda-double/include -I/usr/local/cuda/include -I/usr/src/petsc/petsc-dev/arch-cuda-double/include/txpetscgpu/include
-----------------------------------------

Using C linker: /usr/src/petsc/petsc-dev/arch-cuda-double/bin/mpicc
Using Fortran linker: /usr/src/petsc/petsc-dev/arch-cuda-double/bin/mpif90
Using libraries: -Wl,-rpath,/usr/src/petsc/petsc-dev/arch-cuda-double/lib -L/usr/src/petsc/petsc-dev/arch-cuda-double/lib -lpetsc -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -lcufft -lcublas -lcudart -lcusparse -lpthread -Wl,-rpath,/usr/src/petsc/petsc-dev/arch-cuda-double/lib -L/usr/src/petsc/petsc-dev/arch-cuda-double/lib -lflapack -lfblas -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.4.5 -L/usr/lib/gcc/x86_64-linux-gnu/4.4.5 -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lmpl -lrt -lgcc_s -ldl
-----------------------------------------
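
For reference, the option table and the header of this summary (./ex5, one process) suggest the run was launched with
something close to ./ex5 -dm_mat_type aijcusp -dm_vec_type cusp -log_summary ex5_log, where the two -dm_* options
select the CUSP GPU matrix and vector types for the DM and -log_summary ex5_log writes this summary to the file
ex5_log.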