  0 KSP Residual norm 6.206301471225e+01
  1 KSP Residual norm 3.079683879705e+00
  2 KSP Residual norm 4.369504976117e+00
  3 KSP Residual norm 3.354601185869e+00
  4 KSP Residual norm 1.571233567871e+00
  5 KSP Residual norm 1.385702273523e+00
  6 KSP Residual norm 2.159430652736e+00
  7 KSP Residual norm 1.318561214028e+00
  8 KSP Residual norm 8.967209421327e-01
  9 KSP Residual norm 1.096012931726e+00
 10 KSP Residual norm 9.319890019804e-01
 11 KSP Residual norm 5.348918167150e-01
 12 KSP Residual norm 4.698020551509e-01
 13 KSP Residual norm 4.595139985429e-01
 14 KSP Residual norm 1.891631663345e-01
 15 KSP Residual norm 1.329191007178e-01
 16 KSP Residual norm 1.185276686175e-01
 17 KSP Residual norm 4.405279230755e-02
 18 KSP Residual norm 2.893364784031e-02
 19 KSP Residual norm 2.512725512743e-02
 20 KSP Residual norm 1.259710848003e-02
 21 KSP Residual norm 9.791548456122e-03
 22 KSP Residual norm 9.376281816946e-03
 23 KSP Residual norm 4.968862265906e-03
 24 KSP Residual norm 3.089995769959e-03
 25 KSP Residual norm 2.533951313414e-03
 26 KSP Residual norm 1.073223302580e-03
 27 KSP Residual norm 7.461462796262e-04
 28 KSP Residual norm 6.594813834045e-04
 29 KSP Residual norm 3.857955952756e-04
KSP Object:
  type: cg
  maximum iterations=10000, initial guess is zero
  tolerances: relative=1e-05, absolute=1e-10, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:
  type: bjacobi
    block Jacobi: number of blocks = 4
    Local solve is same for all blocks, in the following KSP and PC objects:
    KSP Object: (sub_)
      type: preonly
      maximum iterations=10000, initial guess is zero
      tolerances: relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using PRECONDITIONED norm type for convergence test
    PC Object: (sub_)
      type: lu
        LU: out-of-place factorization
          tolerance for zero pivot 1e-12
          matrix ordering: nd
          factor fill ratio given 5, needed 6.03304
            Factored matrix follows:
              Matrix Object:
                type: seqaij
                rows=4096, cols=4096
                package used to perform factorization: petsc
                total: nonzeros=121626, allocated nonzeros=121626
                total number of mallocs used during MatSetValues calls =0
                  not using I-node routines
      linear system matrix = precond matrix:
      Matrix Object:
        type: seqaijcuda
        rows=4096, cols=4096
        total: nonzeros=20160, allocated nonzeros=20480
        total number of mallocs used during MatSetValues calls =0
          not using I-node routines
  linear system matrix = precond matrix:
  Matrix Object:
    type: mpiaijcuda
    rows=16384, cols=16384
    total: nonzeros=81408, allocated nonzeros=163840
    total number of mallocs used during MatSetValues calls =0
      not using I-node (on process 0) routines
Norm of error 0.00407995 iterations 29
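For reference, the following is a minimal sketch (not the actual ex2 source) of how a solver matching the -ksp_view output above could be configured through the PETSc C API. The matrix A and vectors b, x are assumed to exist already; the calls follow the current PETSc interface, which differs slightly from the 2011 petsc-dev build used for this run.

    #include <petscksp.h>

    /* Hypothetical helper: CG with block Jacobi / per-block LU, as shown above. */
    static PetscErrorCode SolveCGBJacobi(Mat A, Vec b, Vec x)
    {
      KSP      ksp;
      PC       pc;
      PetscInt its;

      PetscFunctionBeginUser;
      PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
      PetscCall(KSPSetOperators(ksp, A, A));
      PetscCall(KSPSetType(ksp, KSPCG));                               /* -ksp_type cg     */
      PetscCall(KSPSetTolerances(ksp, 1.e-5, 1.e-10, PETSC_DEFAULT, 10000));
      PetscCall(KSPGetPC(ksp, &pc));
      PetscCall(PCSetType(pc, PCBJACOBI));                             /* -pc_type bjacobi */
      /* The per-block solver (-sub_ksp_type preonly -sub_pc_type lu) is picked up
         from the options database when the blocks are created. */
      PetscCall(KSPSetFromOptions(ksp));
      PetscCall(KSPSolve(ksp, b, x));
      PetscCall(KSPGetIterationNumber(ksp, &its));
      PetscCall(PetscPrintf(PETSC_COMM_WORLD, "iterations %d\n", (int)its));
      PetscCall(KSPDestroy(&ksp));
      PetscFunctionReturn(PETSC_SUCCESS);
    }

The run itself would then be launched on 4 processes with the options listed in the option table at the end of the log (e.g. something like mpiexec -n 4 ./ex2 -ksp_type cg -pc_type bjacobi -sub_ksp_type preonly -sub_pc_type lu -mat_type mpiaijcuda -vec_type mpicuda -m 128 -n 128 ...); the exact launcher is not shown in the log.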
************************************************************************************************************************
***           WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document             ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./ex2 on a linux-gnu named c0306 with 4 processors, by liluo Tue Jan 18 12:39:22 2011
Using Petsc Development HG revision: 179fe3d1768f57c49fa44a3a47095b573a99716c  HG Date: Wed Dec 08 11:34:50 2010 -0600

                         Max       Max/Min        Avg      Total
Time (sec):           7.410e+00      1.00001   7.410e+00
Objects:              2.600e+01      1.00000   2.600e+01
Flops:                1.358e+07      1.00057   1.357e+07  5.430e+07
Flops/sec:            1.832e+06      1.00057   1.832e+06  7.327e+06
MPI Messages:         6.800e+01      1.88889   5.400e+01  2.160e+02
MPI Message Lengths:  6.249e+04      1.99949   8.681e+02  1.875e+05
MPI Reductions:       1.160e+02      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg        %Total   counts   %Total
 0:      Main Stage: 7.4079e+00 100.0%  5.4296e+07 100.0%  2.040e+02  94.4%  8.538e+02     98.3%  9.300e+01  80.2%
 1:        Assembly: 2.3695e-03   0.0%  0.0000e+00   0.0%  1.200e+01   5.6%  1.433e+01      1.7%  9.000e+00   7.8%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                              --- Global ---   --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len  Reduct  %T %F %M %L %R   %T %F %M %L %R  Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult               30 1.0 3.1516e-01  1.0 1.22e+06 1.0 1.8e+02 1.0e+03 0.0e+00  4   9 83 98  0   4   9 88 100  0     15
MatSolve              30 1.0 6.6759e-02  1.0 7.17e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  53  0  0  0   1  53  0   0  0    430
MatLUFactorSym         1 1.0 3.6950e-03  1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0   0  0  0  1   0   0  0   0  1      0
MatLUFactorNum         1 1.0 6.4511e-03  1.0 3.48e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  26  0  0  0   0  26  0   0  0   2159
MatGetRowIJ            1 1.0 7.9155e-05  1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0  0  0   0   0  0   0  0      0
MatGetOrdering         1 1.0 2.1210e-03  1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0   0  0  0  2   0   0  0   0  2      0
MatView                3 3.0 9.1314e-04 41.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0   0  0  0  1   0   0  0   0  1      0
VecDot                58 1.0 3.2988e-01  1.2 4.75e+05 1.0 0.0e+00 0.0e+00 5.8e+01  4   3  0  0 50   4   3  0   0 62      6
VecNorm               31 1.0 2.2695e-01  1.2 5.08e+05 1.0 0.0e+00 0.0e+00 3.1e+01  3   4  0  0 27   3   4  0   0 33      9
VecCopy              122 1.0 2.2866e-01  1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3   0  0  0  0   3   0  0   0  0      0
VecSet                32 1.0 6.4911e-02  1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1   0  0  0  0   1   0  0   0  0      0
VecAXPY               59 1.0 9.8484e-02  1.7 4.83e+05 1.0 0.0e+00 0.0e+00 0.0e+00  1   4  0  0  0   1   4  0   0  0     20
VecAYPX               28 1.0 4.1339e-02 62.3 2.29e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0   2  0  0  0   0   2  0   0  0     22
VecScatterBegin       30 1.0 5.7032e-02  1.0 0.00e+00 0.0 1.8e+02 1.0e+03 0.0e+00  1   0 83 98  0   1   0 88 100  0      0
VecScatterEnd         30 1.0 1.0705e-04  1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0  0  0   0   0  0   0  0      0
VecCUDACopyTo        119 1.0 2.3531e-01  1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3   0  0  0  0   3   0  0   0  0      0
VecCUDACopyFrom      147 1.0 2.7879e-01  1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4   0  0  0  0   4   0  0   0  0      0
KSPSetup               2 1.0 1.4067e-05  1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0  0  0   0   0  0   0  0      0
KSPSolve               1 1.0 1.2590e+00  1.0 1.35e+07 1.0 1.7e+02 1.0e+03 9.1e+01 17 100 81 95 78  17 100 85  97 98     43
PCSetUp                2 1.0 1.2377e-02  1.0 3.48e+06 1.0 0.0e+00 0.0e+00 3.0e+00  0  26  0  0  3   0  26  0   0  3   1125
PCSetUpOnBlocks        1 1.0 1.2298e-02  1.0 3.48e+06 1.0 0.0e+00 0.0e+00 3.0e+00  0  26  0  0  3   0  26  0   0  3   1133
PCApply               30 1.0 3.4887e-01  1.0 7.17e+06 1.0 0.0e+00 0.0e+00 0.0e+00  5  53  0  0  0   5  53  0   0  0     82

--- Event Stage 1: Assembly

MatAssemblyBegin       1 1.0 8.4162e-05  3.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0   0  0  0  2   2   0  0   0 22      0
MatAssemblyEnd         1 1.0 5.0497e-04  1.0 0.00e+00 0.0 1.2e+01 2.6e+02 7.0e+00  0   0  6  2  6  21   0 100 100 78     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage

              Matrix     4              4      2092356     0
                 Vec     8              9       210904     0
         Vec Scatter     0              1         1012     0
           Index Set     3              3        34928     0
       Krylov Solver     2              2         2032     0
      Preconditioner     2              2         1696     0
              Viewer     2              2         1360     0

--- Event Stage 1: Assembly

                 Vec     2              1         1496     0
         Vec Scatter     1              0            0     0
           Index Set     2              2         1432     0
========================================================================================================================
Average time to get PetscTime(): 2.14577e-07
Average time for MPI_Barrier(): 5.19753e-06
Average time for zero size MPI_Send(): 7.15256e-07
#PETSc Option Table entries:
-ksp_atol 1.e-10
-ksp_monitor
-ksp_rtol 1.e-5
-ksp_type cg
-ksp_view
-log_summary
-m 128
-mat_type mpiaijcuda
-n 128
-options_left
-pc_type bjacobi
-sub_ksp_type preonly
-sub_pc_type lu
-vec_type mpicuda
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Mon Jan 17 21:18:44 2011
Configure options: --download-f-blas-lapack=1 --with-mpi-dir=/bwfs/software/mpich2-1.2.1p1 --with-shared-libraries=0 --with-debugging=no --with-cuda-dir=/bwfs/home/liluo/cuda3.2_64 --with-thrust-dir=/bwfs/home/liluo/cuda3.2_64/include/thrust --with-cusp-dir=/bwfs/home/liluo/cuda3.2_64/include/cusp-library
-----------------------------------------
Libraries compiled on Mon Jan 17 21:18:44 2011 on console
Machine characteristics: Linux-2.6.18-128.el5-x86_64-with-redhat-5.3-Tikanga
Using PETSc directory: /bwfs/home/liluo/petsc-dev
Using PETSc arch: linux-gnu-c-debug
-----------------------------------------
Using C compiler: /bwfs/software/mpich2-1.2.1p1/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /bwfs/software/mpich2-1.2.1p1/bin/mpif77 -Wall -Wno-unused-variable -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/bwfs/home/liluo/petsc-dev/linux-gnu-c-debug/include -I/bwfs/home/liluo/petsc-dev/include -I/bwfs/home/liluo/petsc-dev/include -I/bwfs/home/liluo/petsc-dev/linux-gnu-c-debug/include -I/bwfs/home/liluo/cuda3.2_64/include -I/bwfs/home/liluo/cuda3.2_64/include/cusp-library/ -I/bwfs/home/liluo/cuda3.2_64/include/thrust/ -I/bwfs/software/mpich2-1.2.1p1/include
-----------------------------------------
Using C linker: /bwfs/software/mpich2-1.2.1p1/bin/mpicc
Using Fortran linker: /bwfs/software/mpich2-1.2.1p1/bin/mpif77
Using libraries: -Wl,-rpath,/bwfs/home/liluo/petsc-dev/linux-gnu-c-debug/lib -L/bwfs/home/liluo/petsc-dev/linux-gnu-c-debug/lib -lpetscts -lpetscsnes -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetscsys -lX11 -Wl,-rpath,/bwfs/home/liluo/cuda3.2_64/lib64 -L/bwfs/home/liluo/cuda3.2_64/lib64 -lcufft -lcublas -lcudart -Wl,-rpath,/bwfs/home/liluo/petsc-dev/linux-gnu-c-debug/lib -L/bwfs/home/liluo/petsc-dev/linux-gnu-c-debug/lib -lflapack -lfblas -L/bwfs/software/mpich2-1.2.1p1/lib -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -ldl -lmpich -lopa -lpthread -lrt -lgcc_s -lg2c -lm -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6 -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lpthread -lrt -lgcc_s -ldl
-----------------------------------------
#PETSc Option Table entries:
-ksp_atol 1.e-10
-ksp_monitor
-ksp_rtol 1.e-5
-ksp_type cg
-ksp_view
-log_summary
-m 128
-mat_type mpiaijcuda
-n 128
-options_left
-pc_type bjacobi
-sub_ksp_type preonly
-sub_pc_type lu
-vec_type mpicuda
#End of PETSc Option Table entries
There are no unused options.
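The performance summary above records a user-defined "Assembly" stage alongside the main stage, and the profiling notes point to PetscLogStagePush()/PetscLogStagePop() for this. As a hypothetical sketch (not taken from the ex2 source, and again using the current PETSc API), the assembly phase could be logged under its own stage like this:

    #include <petscksp.h>

    /* Hypothetical fragment: group matrix assembly under its own profiling
       stage named "Assembly", as in the event table above. */
    static PetscErrorCode AssembleLogged(Mat A)
    {
      PetscLogStage assembly;

      PetscFunctionBeginUser;
      PetscCall(PetscLogStageRegister("Assembly", &assembly));
      PetscCall(PetscLogStagePush(assembly));
      /* ... MatSetValues() calls would go here ... */
      PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
      PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
      PetscCall(PetscLogStagePop());
      PetscFunctionReturn(PETSC_SUCCESS);
    }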