[petsc-users] Very poor speed up performance
Matthew Knepley
knepley at gmail.com
Mon Dec 20 13:21:17 CST 2010
On Mon, Dec 20, 2010 at 10:38 AM, Yongjun Chen <yjxd.chen at gmail.com> wrote:
> Hi Matt,
>
> Thanks for your reply. I have just carried out a series of tests with
> k=2, 4, 8, 12 and 16 cores on the first server again, this time with the
> -log_summary option. From 8 cores to 12 cores there is a small speed-up,
> but from 12 cores to 16 cores the computation time actually increases!
> Attached please find these 5 log files. Thank you very much!
>
It's very clear from these that Barry was right in his reply: these are
memory-bandwidth-limited computations, so if you don't get any more memory
bandwidth you will not speed up. This is rarely mentioned in sales pitches
for multicore computers. LAMMPS is not limited by bandwidth for most
computations.
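
[Editor's note -- the back-of-envelope estimate and the sketch below are
additions for illustration and are not part of the original exchange. Each
MatMult has to stream every stored nonzero at least once (roughly an 8-byte
value plus a 4-byte column index for the 49,908,476 entries reported below),
and the logs give about 0.23 s per MatMult at k=2 and about 0.096 s at k=8,
so the sustained matrix bandwidth is at least

\[
B \;\ge\; \frac{\mathrm{nnz}\cdot(8+4)\ \mathrm{bytes}}{t_{\mathrm{MatMult}}}
  \;\approx\; \frac{4.99\times 10^{7}\cdot 12\ \mathrm{B}}{0.23\ \mathrm{s}}
  \approx 2.6\ \mathrm{GB/s}\ (k=2),
  \qquad
  \frac{599\ \mathrm{MB}}{0.096\ \mathrm{s}} \approx 6.2\ \mathrm{GB/s}\ (k=8).
\]

In other words, by 8 cores the MatMult kernel is already drawing several GB/s
from memory; once the memory system saturates, extra cores add floating-point
capability but no bandwidth, which matches the plateau in the timings. A
minimal STREAM-style triad probe (plain C; the array size, repetition count,
and compile line below are assumptions, not anything from the thread) can show
where a node's aggregate bandwidth levels off -- run one copy per core, e.g.
under mpiexec or a shell loop, and sum the reported rates:

/* stream_triad.c -- editor's sketch of a STREAM-style bandwidth probe.
 * N and REPS are assumptions chosen so the working set (three arrays)
 * far exceeds cache but still fits comfortably in RAM. */
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
  const size_t N    = 20 * 1000 * 1000;   /* 20M doubles = 160 MB per array */
  const int    REPS = 10;
  double *a = malloc(N * sizeof(double));
  double *b = malloc(N * sizeof(double));
  double *c = malloc(N * sizeof(double));
  if (!a || !b || !c) { fprintf(stderr, "allocation failed\n"); return 1; }
  for (size_t i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

  struct timespec t0, t1;
  clock_gettime(CLOCK_MONOTONIC, &t0);
  for (int r = 0; r < REPS; r++)
    for (size_t i = 0; i < N; i++)
      a[i] = b[i] + 3.0 * c[i];            /* triad: two loads, one store */
  clock_gettime(CLOCK_MONOTONIC, &t1);

  double sec   = (t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
  double bytes = 3.0 * (double)N * sizeof(double) * REPS;
  printf("triad bandwidth: %.2f GB/s (check value %g)\n",
         bytes / sec / 1e9, a[N / 2]);     /* print a[] so the loop is kept */

  free(a); free(b); free(c);
  return 0;
}

A plausible compile line on a 2010-era Linux box is
"cc -O2 -std=gnu99 stream_triad.c -o stream_triad -lrt"; if the summed rate
stops growing after a few cores, so will this MatMult-dominated solver.]
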
Matt
> mpiexec -n *k* ./AMG_Solver_MPI -pc_type jacobi -ksp_type bicg
> -log_summary
> Here I use the bicg KSP instead of gmres, because the two give almost the
> same speed-up performance, as I have confirmed over many runs.
> ----------------------
> (1) k=2
> ----------------------
> Process 1 of total 2 on wmss04
> Process 0 of total 2 on wmss04
> The dimension of Matrix A is n = 1177754
> Begin Assembly:
> Begin Assembly:
> End Assembly.
> End Assembly.
> =========================================================
> Begin the solving:
> =========================================================
> The current time is: Mon Dec 20 17:42:23 2010
>
> KSP Object:
> type: bicg
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-07, absolute=1e-50, divergence=10000
> left preconditioning
> using PRECONDITIONED norm type for convergence test
> PC Object:
> type: jacobi
> linear system matrix = precond matrix:
> Matrix Object:
> type=mpisbaij, rows=1177754, cols=1177754
> total: nonzeros=49908476, allocated nonzeros=49908476
> block size is 1
>
> norm(b-Ax)=1.25862e-06
> Norm of error 1.25862e-06, Iterations 1475
> =========================================================
> The solver has finished successfully!
> =========================================================
> The solving time is 762.874 seconds.
> The time accuracy is 1e-06 second.
> The current time is Mon Dec 20 17:55:06 2010
>
>
> ************************************************************************************************************************
> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r
> -fCourier9' to print this document ***
>
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary:
> ----------------------------------------------
>
> ./AMG_Solver_MPI on a linux-gnu named wmss04 with 2 processors, by cheny
> Mon Dec 20 18:55:06 2010
> Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010
>
> Max Max/Min Avg Total
> Time (sec): 8.160e+02 1.00000 8.160e+02
> Objects: 3.000e+01 1.00000 3.000e+01
> Flops: 3.120e+11 1.04720 3.050e+11 6.100e+11
> Flops/sec: 3.824e+08 1.04720 3.737e+08 7.475e+08
> MPI Messages: 2.958e+03 1.00068 2.958e+03 5.915e+03
> MPI Message Lengths: 9.598e+08 1.00034 3.245e+05 1.919e+09
> MPI Reductions: 4.483e+03 1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N
> --> 2N flops
> and VecAXPY() for complex vectors of length N
> --> 8N flops
>
> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages
> --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts
> %Total Avg %Total counts %Total
> 0: Main Stage: 8.1603e+02 100.0% 6.0997e+11 100.0% 5.915e+03
> 100.0% 3.245e+05 100.0% 4.467e+03 99.6%
>
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on
> interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and
> PetscLogStagePop().
> %T - percent time in this phase %F - percent flops in this
> phase
> %M - percent messages in this phase %L - percent message lengths
> in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over
> all processors)
>
> ------------------------------------------------------------------------------------------------------------------------
> Event Count Time (sec)
> Flops --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len
> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
>
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> MatMult 1476 1.0 3.4220e+02 1.0 1.48e+11 1.0 3.0e+03 3.2e+05
> 0.0e+00 41 47 50 50 0 41 47 50 50 0 846
> MatMultTranspose 1475 1.0 3.4208e+02 1.0 1.48e+11 1.0 3.0e+03 3.2e+05
> 0.0e+00 42 47 50 50 0 42 47 50 50 0 846
> MatAssemblyBegin 1 1.0 1.5492e-0281.5 0.00e+00 0.0 0.0e+00 0.0e+00
> 3.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 1 1.0 8.1615e-02 1.0 0.00e+00 0.0 1.0e+01 1.1e+05
> 1.2e+01 0 0 0 0 0 0 0 0 0 0 0
> MatView 1 1.0 1.5807e-04 3.2 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecView 1 1.0 1.0809e+01 2.1 0.00e+00 0.0 2.0e+00 2.4e+06
> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
> VecDot 2950 1.0 2.0457e+01 1.9 3.47e+09 1.0 0.0e+00 0.0e+00
> 3.0e+03 2 1 0 0 66 2 1 0 0 66 340
> VecNorm 1477 1.0 1.2103e+01 1.7 1.74e+09 1.0 0.0e+00 0.0e+00
> 1.5e+03 1 1 0 0 33 1 1 0 0 33 287
> VecCopy 4 1.0 1.0110e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 8855 1.0 6.0069e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
> VecAXPY 4426 1.0 1.8430e+01 1.2 5.21e+09 1.0 0.0e+00 0.0e+00
> 0.0e+00 2 2 0 0 0 2 2 0 0 0 566
> VecAYPX 2948 1.0 1.3610e+01 1.2 3.47e+09 1.0 0.0e+00 0.0e+00
> 0.0e+00 2 1 0 0 0 2 1 0 0 0 510
> VecAssemblyBegin 6 1.0 9.1116e-0317.7 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.8e+01 0 0 0 0 0 0 0 0 0 0 0
> VecAssemblyEnd 6 1.0 1.7405e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecPointwiseMult 2952 1.0 1.7966e+01 1.1 1.74e+09 1.0 0.0e+00 0.0e+00
> 0.0e+00 2 1 0 0 0 2 1 0 0 0 194
> VecScatterBegin 2951 1.0 8.6552e-01 1.1 0.00e+00 0.0 5.9e+03 3.2e+05
> 0.0e+00 0 0100100 0 0 0100100 0 0
> VecScatterEnd 2951 1.0 2.7126e+01 8.3 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
> KSPSetup 1 1.0 3.9254e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 1 1.0 7.5170e+02 1.0 3.12e+11 1.0 5.9e+03 3.2e+05
> 4.4e+03 92100100100 99 92100100100 99 811
> PCSetUp 1 1.0 1.9073e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> PCApply 2952 1.0 1.8043e+01 1.1 1.74e+09 1.0 0.0e+00 0.0e+00
> 0.0e+00 2 1 0 0 0 2 1 0 0 0 193
>
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Matrix 3 3 339744648 0
> Vec 18 18 62239872 0
> Vec Scatter 2 2 1736 0
> Index Set 4 4 974736 0
> Krylov Solver 1 1 832 0
> Preconditioner 1 1 872 0
> Viewer 1 1 544 0
>
> ========================================================================================================================
> Average time to get PetscTime(): 1.21593e-06
> Average time for MPI_Barrier(): 1.44005e-05
> Average time for zero size MPI_Send(): 1.94311e-05
> #PETSc Option Table entries:
> -ksp_type bicg
> -log_summary
> -pc_type jacobi
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8
> Configure run at: Tue Nov 23 15:54:45 2010
> Configure options: --known-level1-dcache-size=65536
> --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2
> --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8
> --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8
> --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8
> --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4
> --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc
> --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1
> --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1
> --download-parmetis=1 --download-mumps=1 --download-scalapack=1
> --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch
> --known-mpi-shared=1
> -----------------------------------------
> Libraries compiled on Tue Nov 23 15:57:11 CET 2010 on wmss04
> Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6
> 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux
> Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized
> Using PETSc arch: linux-gnu-c-opt
> -----------------------------------------
> Using C compiler:
> /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall
> -Wwrite-strings -Wno-strict-aliasing -O
> Using Fortran compiler:
> /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall
> -Wno-unused-variable -O
> -----------------------------------------
> Using include paths:
> -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include
> -I/sun42/cheny/petsc-3.1-p5-optimized/include
> -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include
> ------------------------------------------
> Using C linker:
> /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall
> -Wwrite-strings -Wno-strict-aliasing -O
> Using Fortran linker:
> /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall
> -Wno-unused-variable -O
> Using libraries:
> -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib
> -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lpetsc
> -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib
> -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lHYPRE -lmpichcxx
> -lstdc++ -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord
> -lparmetis -lmetis -lscalapack -lblacs -lflapack -lfblas -lnsl -laio -lrt
> -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib
> -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2
> -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib
> -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t
> -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib
> -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90
> -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich
> -lpthread -lrt -lgcc_s -ldl
> ------------------------------------------
>
>
> ----------------------
> (2) k=4
> ----------------------
> Process 0 of total 4 on wmss04
> Process 2 of total 4 on wmss04
> Process 3 of total 4 on wmss04
> Process 1 of total 4 on wmss04
> The dimension of Matrix A is n = 1177754
> Begin Assembly:
> Begin Assembly:
> Begin Assembly:
> Begin Assembly:
> End Assembly.
> End Assembly.
> End Assembly.
> End Assembly.
> =========================================================
> Begin the solving:
> =========================================================
> The current time is: Mon Dec 20 17:33:24 2010
>
> KSP Object:
> type: bicg
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-07, absolute=1e-50, divergence=10000
> left preconditioning
> using PRECONDITIONED norm type for convergence test
> PC Object:
> type: jacobi
> linear system matrix = precond matrix:
> Matrix Object:
> type=mpisbaij, rows=1177754, cols=1177754
> total: nonzeros=49908476, allocated nonzeros=49908476
> block size is 1
>
> norm(b-Ax)=1.28342e-06
> Norm of error 1.28342e-06, Iterations 1473
> =========================================================
> The solver has finished successfully!
> =========================================================
> The solving time is 450.583 seconds.
> The time accuracy is 1e-06 second.
> The current time is Mon Dec 20 17:40:55 2010
>
>
> ************************************************************************************************************************
> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r
> -fCourier9' to print this document ***
>
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary:
> ----------------------------------------------
>
> ./AMG_Solver_MPI on a linux-gnu named wmss04 with 4 processors, by cheny
> Mon Dec 20 18:40:55 2010
> Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010
>
> Max Max/Min Avg Total
> Time (sec): 4.807e+02 1.00000 4.807e+02
> Objects: 3.000e+01 1.00000 3.000e+01
> Flops: 1.558e+11 1.06872 1.523e+11 6.091e+11
> Flops/sec: 3.241e+08 1.06872 3.168e+08 1.267e+09
> MPI Messages: 5.906e+03 2.00017 4.430e+03 1.772e+04
> MPI Message Lengths: 1.727e+09 2.74432 2.658e+05 4.710e+09
> MPI Reductions: 4.477e+03 1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N
> --> 2N flops
> and VecAXPY() for complex vectors of length N
> --> 8N flops
>
> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages
> --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts
> %Total Avg %Total counts %Total
> 0: Main Stage: 4.8066e+02 100.0% 6.0914e+11 100.0% 1.772e+04
> 100.0% 2.658e+05 100.0% 4.461e+03 99.6%
>
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on
> interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and
> PetscLogStagePop().
> %T - percent time in this phase %F - percent flops in this
> phase
> %M - percent messages in this phase %L - percent message lengths
> in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over
> all processors)
>
> ------------------------------------------------------------------------------------------------------------------------
> Event Count Time (sec)
> Flops --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len
> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
>
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> MatMult 1474 1.0 1.9344e+02 1.1 7.40e+10 1.1 8.8e+03 2.7e+05
> 0.0e+00 39 47 50 50 0 39 47 50 50 0 1494
> MatMultTranspose 1473 1.0 1.9283e+02 1.0 7.40e+10 1.1 8.8e+03 2.7e+05
> 0.0e+00 40 47 50 50 0 40 47 50 50 0 1498
> MatAssemblyBegin 1 1.0 1.5624e-0263.8 0.00e+00 0.0 0.0e+00 0.0e+00
> 3.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 1 1.0 6.3599e-02 1.0 0.00e+00 0.0 3.0e+01 9.3e+04
> 1.2e+01 0 0 0 0 0 0 0 0 0 0 0
> MatView 1 1.0 1.8096e-04 2.7 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecView 1 1.0 1.1063e+01 4.7 0.00e+00 0.0 6.0e+00 1.2e+06
> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
> VecDot 2946 1.0 2.5350e+01 2.7 1.73e+09 1.0 0.0e+00 0.0e+00
> 2.9e+03 3 1 0 0 66 3 1 0 0 66 274
> VecNorm 1475 1.0 1.1197e+01 3.0 8.69e+08 1.0 0.0e+00 0.0e+00
> 1.5e+03 1 1 0 0 33 1 1 0 0 33 310
> VecCopy 4 1.0 6.0010e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 8843 1.0 3.6737e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
> VecAXPY 4420 1.0 1.4221e+01 1.4 2.60e+09 1.0 0.0e+00 0.0e+00
> 0.0e+00 3 2 0 0 0 3 2 0 0 0 732
> VecAYPX 2944 1.0 1.1377e+01 1.1 1.73e+09 1.0 0.0e+00 0.0e+00
> 0.0e+00 2 1 0 0 0 2 1 0 0 0 610
> VecAssemblyBegin 6 1.0 2.8596e-0223.6 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.8e+01 0 0 0 0 0 0 0 0 0 0 0
> VecAssemblyEnd 6 1.0 2.4796e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecPointwiseMult 2948 1.0 1.7210e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 3 1 0 0 0 3 1 0 0 0 202
> VecScatterBegin 2947 1.0 1.9806e+00 2.4 0.00e+00 0.0 1.8e+04 2.7e+05
> 0.0e+00 0 0100100 0 0 0100100 0 0
> VecScatterEnd 2947 1.0 4.3833e+01 7.4 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 6 0 0 0 0 6 0 0 0 0 0
> KSPSetup 1 1.0 2.1496e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 1 1.0 4.3931e+02 1.0 1.56e+11 1.1 1.8e+04 2.7e+05
> 4.4e+03 91100100100 99 91100100100 99 1386
> PCSetUp 1 1.0 3.0994e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> PCApply 2948 1.0 1.7256e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 3 1 0 0 0 3 1 0 0 0 201
>
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Matrix 3 3 169902696 0
> Vec 18 18 31282096 0
> Vec Scatter 2 2 1736 0
> Index Set 4 4 638616 0
> Krylov Solver 1 1 832 0
> Preconditioner 1 1 872 0
> Viewer 1 1 544 0
>
> ========================================================================================================================
> Average time to get PetscTime(): 1.5974e-06
> Average time for MPI_Barrier(): 3.48091e-05
> Average time for zero size MPI_Send(): 1.8537e-05
> #PETSc Option Table entries:
> -ksp_type bicg
> -log_summary
> -pc_type jacobi
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8
> Configure run at: Tue Nov 23 15:54:45 2010
> Configure options: --known-level1-dcache-size=65536
> --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2
> --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8
> --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8
> --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8
> --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4
> --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc
> --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1
> --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1
> --download-parmetis=1 --download-mumps=1 --download-scalapack=1
> --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch
> --known-mpi-shared=1
> -----------------------------------------
>
>
>
> ----------------------
> (3) k=8
> ----------------------
> Process 0 of total 8 on wmss04
> Process 4 of total 8 on wmss04
> Process 2 of total 8 on wmss04
> Process 6 of total 8 on wmss04
> Process 3 of total 8 on wmss04
> Process 7 of total 8 on wmss04
> Process 1 of total 8 on wmss04
> Process 5 of total 8 on wmss04
> The dimension of Matrix A is n = 1177754
> Begin Assembly:
> Begin Assembly:
> Begin Assembly:
> Begin Assembly:
> Begin Assembly:
> Begin Assembly:
> Begin Assembly:
> Begin Assembly:
> End Assembly.
> End Assembly.
> End Assembly.
> End Assembly.
> End Assembly.
> End Assembly.
> End Assembly.
> End Assembly.
> =========================================================
> Begin the solving:
> =========================================================
> The current time is: Mon Dec 20 18:14:59 2010
>
> KSP Object:
> type: bicg
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-07, absolute=1e-50, divergence=10000
> left preconditioning
> using PRECONDITIONED norm type for convergence test
> PC Object:
> type: jacobi
> linear system matrix = precond matrix:
> Matrix Object:
> type=mpisbaij, rows=1177754, cols=1177754
> total: nonzeros=49908476, allocated nonzeros=49908476
> block size is 1
>
> norm(b-Ax)=1.32502e-06
> Norm of error 1.32502e-06, Iterations 1473
> =========================================================
> The solver has finished successfully!
> =========================================================
> The solving time is 311.937 seconds.
> The time accuracy is 1e-06 second.
> The current time is Mon Dec 20 18:20:11 2010
>
>
> ************************************************************************************************************************
> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r
> -fCourier9' to print this document ***
>
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary:
> ----------------------------------------------
>
> ./AMG_Solver_MPI on a linux-gnu named wmss04 with 8 processors, by cheny
> Mon Dec 20 19:20:11 2010
> Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010
>
> Max Max/Min Avg Total
> Time (sec): 3.330e+02 1.00000 3.330e+02
> Objects: 3.000e+01 1.00000 3.000e+01
> Flops: 7.792e+10 1.09702 7.614e+10 6.091e+11
> Flops/sec: 2.340e+08 1.09702 2.286e+08 1.829e+09
> MPI Messages: 5.906e+03 2.00017 5.169e+03 4.135e+04
> MPI Message Lengths: 1.866e+09 4.61816 2.430e+05 1.005e+10
> MPI Reductions: 4.477e+03 1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N
> --> 2N flops
> and VecAXPY() for complex vectors of length N
> --> 8N flops
>
> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages
> --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts
> %Total Avg %Total counts %Total
> 0: Main Stage: 3.3302e+02 100.0% 6.0914e+11 100.0% 4.135e+04
> 100.0% 2.430e+05 100.0% 4.461e+03 99.6%
>
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on
> interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and
> PetscLogStagePop().
> %T - percent time in this phase %F - percent flops in this
> phase
> %M - percent messages in this phase %L - percent message lengths
> in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over
> all processors)
>
> ------------------------------------------------------------------------------------------------------------------------
> Event Count Time (sec)
> Flops --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len
> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
>
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> MatMult 1474 1.0 1.4230e+02 1.4 3.70e+10 1.1 2.1e+04 2.4e+05
> 0.0e+00 38 47 50 50 0 38 47 50 50 0 2031
> MatMultTranspose 1473 1.0 1.3627e+02 1.1 3.70e+10 1.1 2.1e+04 2.4e+05
> 0.0e+00 38 47 50 50 0 38 47 50 50 0 2120
> MatAssemblyBegin 1 1.0 8.0800e-0324.5 0.00e+00 0.0 0.0e+00 0.0e+00
> 3.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 1 1.0 5.3647e-02 1.0 0.00e+00 0.0 7.0e+01 8.5e+04
> 1.2e+01 0 0 0 0 0 0 0 0 0 0 0
> MatView 1 1.0 2.1791e-04 2.5 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecView 1 1.0 1.0902e+0112.1 0.00e+00 0.0 1.4e+01 5.9e+05
> 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
> VecDot 2946 1.0 3.5689e+01 7.6 8.67e+08 1.0 0.0e+00 0.0e+00
> 2.9e+03 6 1 0 0 66 6 1 0 0 66 194
> VecNorm 1475 1.0 8.1093e+00 4.0 4.34e+08 1.0 0.0e+00 0.0e+00
> 1.5e+03 1 1 0 0 33 1 1 0 0 33 428
> VecCopy 4 1.0 5.2011e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 8843 1.0 3.0491e+00 2.2 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
> VecAXPY 4420 1.0 9.2421e+00 1.6 1.30e+09 1.0 0.0e+00 0.0e+00
> 0.0e+00 2 2 0 0 0 2 2 0 0 0 1127
> VecAYPX 2944 1.0 6.8297e+00 1.5 8.67e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 2 1 0 0 0 2 1 0 0 0 1015
> VecAssemblyBegin 6 1.0 2.6218e-0210.7 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.8e+01 0 0 0 0 0 0 0 0 0 0 0
> VecAssemblyEnd 6 1.0 3.6240e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecPointwiseMult 2948 1.0 9.6646e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 3 1 0 0 0 3 1 0 0 0 359
> VecScatterBegin 2947 1.0 2.2599e+00 2.3 0.00e+00 0.0 4.1e+04 2.4e+05
> 0.0e+00 1 0100100 0 1 0100100 0 0
> VecScatterEnd 2947 1.0 7.7004e+0120.2 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 9 0 0 0 0 9 0 0 0 0 0
> KSPSetup 1 1.0 1.4287e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 1 1.0 3.0090e+02 1.0 7.79e+10 1.1 4.1e+04 2.4e+05
> 4.4e+03 90100100100 99 90100100100 99 2024
> PCSetUp 1 1.0 4.0531e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> PCApply 2948 1.0 9.7001e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 3 1 0 0 0 3 1 0 0 0 358
>
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Matrix 3 3 84944064 0
> Vec 18 18 15741712 0
> Vec Scatter 2 2 1736 0
> Index Set 4 4 409008 0
> Krylov Solver 1 1 832 0
> Preconditioner 1 1 872 0
> Viewer 1 1 544 0
>
> ========================================================================================================================
> Average time to get PetscTime(): 3.38554e-06
> Average time for MPI_Barrier(): 7.40051e-05
> Average time for zero size MPI_Send(): 1.88947e-05
> #PETSc Option Table entries:
> -ksp_type bicg
> -log_summary
> -pc_type jacobi
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8
> Configure run at: Tue Nov 23 15:54:45 2010
> Configure options: --known-level1-dcache-size=65536
> --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2
> --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8
> --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8
> --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8
> --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4
> --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc
> --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1
> --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1
> --download-parmetis=1 --download-mumps=1 --download-scalapack=1
> --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch
> --known-mpi-shared=1
> -----------------------------------------
>
>
>
> ----------------------
> (4) k=12
> ----------------------
> Process 1 of total 12 on wmss04
> Process 5 of total 12 on wmss04
> Process 2 of total 12 on wmss04
> Process 9 of total 12 on wmss04
> Process 6 of total 12 on wmss04
> Process 7 of total 12 on wmss04
> Process 10 of total 12 on wmss04
> Process 3 of total 12 on wmss04
> Process 11 of total 12 on wmss04
> Process 4 of total 12 on wmss04
> Process 8 of total 12 on wmss04
> Process 0 of total 12 on wmss04
> The dimension of Matrix A is n = 1177754
> Begin Assembly:
> Begin Assembly:
> Begin Assembly:
> Begin Assembly:
> Begin Assembly:
> Begin Assembly:
> Begin Assembly:
> Begin Assembly:
> Begin Assembly:
> Begin Assembly:
> Begin Assembly:
> Begin Assembly:
> End Assembly.
> End Assembly.
> End Assembly.
> End Assembly.
> End Assembly.
> End Assembly.
> End Assembly.End Assembly.
> End Assembly.
> End Assembly.
>
> End Assembly.
> End Assembly.
> =========================================================
> Begin the solving:
> =========================================================
> The current time is: Mon Dec 20 17:56:36 2010
>
> KSP Object:
> type: bicg
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-07, absolute=1e-50, divergence=10000
> left preconditioning
> using PRECONDITIONED norm type for convergence test
> PC Object:
> type: jacobi
> linear system matrix = precond matrix:
> Matrix Object:
> type=mpisbaij, rows=1177754, cols=1177754
> total: nonzeros=49908476, allocated nonzeros=49908476
> block size is 1
>
> norm(b-Ax)=1.28414e-06
> Norm of error 1.28414e-06, Iterations 1473
> =========================================================
> The solver has finished successfully!
> =========================================================
> The solving time is 291.503 seconds.
> The time accuracy is 1e-06 second.
> The current time is Mon Dec 20 18:01:28 2010
>
>
> ************************************************************************************************************************
> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r
> -fCourier9' to print this document ***
>
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary:
> ----------------------------------------------
>
> ./AMG_Solver_MPI on a linux-gnu named wmss04 with 12 processors, by cheny
> Mon Dec 20 19:01:28 2010
> Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010
>
> Max Max/Min Avg Total
> Time (sec): 3.089e+02 1.00012 3.089e+02
> Objects: 3.000e+01 1.00000 3.000e+01
> Flops: 5.197e+10 1.11689 5.074e+10 6.089e+11
> Flops/sec: 1.683e+08 1.11689 1.643e+08 1.971e+09
> MPI Messages: 5.906e+03 2.00017 5.415e+03 6.498e+04
> MPI Message Lengths: 1.887e+09 6.23794 2.345e+05 1.524e+10
> MPI Reductions: 4.477e+03 1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N
> --> 2N flops
> and VecAXPY() for complex vectors of length N
> --> 8N flops
>
> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages
> --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts
> %Total Avg %Total counts %Total
> 0: Main Stage: 3.0887e+02 100.0% 6.0890e+11 100.0% 6.498e+04
> 100.0% 2.345e+05 100.0% 4.461e+03 99.6%
>
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on
> interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and
> PetscLogStagePop().
> %T - percent time in this phase %F - percent flops in this
> phase
> %M - percent messages in this phase %L - percent message lengths
> in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over
> all processors)
>
> ------------------------------------------------------------------------------------------------------------------------
> Event Count Time (sec)
> Flops --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len
> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
>
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> MatMult 1474 1.0 1.4069e+02 2.1 2.47e+10 1.1 3.2e+04 2.3e+05
> 0.0e+00 35 47 50 50 0 35 47 50 50 0 2054
> MatMultTranspose 1473 1.0 1.3272e+02 1.8 2.47e+10 1.1 3.2e+04 2.3e+05
> 0.0e+00 34 47 50 50 0 34 47 50 50 0 2175
> MatAssemblyBegin 1 1.0 6.4070e-0314.6 0.00e+00 0.0 0.0e+00 0.0e+00
> 3.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 1 1.0 6.2698e-02 1.0 0.00e+00 0.0 1.1e+02 8.2e+04
> 1.2e+01 0 0 0 0 0 0 0 0 0 0 0
> MatView 1 1.0 2.4605e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecView 1 1.0 1.1164e+0182.6 0.00e+00 0.0 2.2e+01 3.9e+05
> 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
> VecDot 2946 1.0 1.1499e+0234.8 5.78e+08 1.0 0.0e+00 0.0e+00
> 2.9e+03 13 1 0 0 66 13 1 0 0 66 60
> VecNorm 1475 1.0 1.0804e+01 7.7 2.90e+08 1.0 0.0e+00 0.0e+00
> 1.5e+03 2 1 0 0 33 2 1 0 0 33 322
> VecCopy 4 1.0 6.9451e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 8843 1.0 2.9336e+00 2.6 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
> VecAXPY 4420 1.0 1.0803e+01 2.3 8.68e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 2 2 0 0 0 2 2 0 0 0 964
> VecAYPX 2944 1.0 6.6637e+00 2.1 5.78e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 2 1 0 0 0 2 1 0 0 0 1041
> VecAssemblyBegin 6 1.0 3.7719e-0214.7 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.8e+01 0 0 0 0 0 0 0 0 0 0 0
> VecAssemblyEnd 6 1.0 5.3883e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecPointwiseMult 2948 1.0 8.7972e+00 2.3 2.89e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 2 1 0 0 0 2 1 0 0 0 395
> VecScatterBegin 2947 1.0 3.3624e+00 4.3 0.00e+00 0.0 6.5e+04 2.3e+05
> 0.0e+00 1 0100100 0 1 0100100 0 0
> VecScatterEnd 2947 1.0 8.0508e+0119.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 12 0 0 0 0 12 0 0 0 0 0
> KSPSetup 1 1.0 1.1752e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 1 1.0 2.8016e+02 1.0 5.20e+10 1.1 6.5e+04 2.3e+05
> 4.4e+03 91100100100 99 91100100100 99 2173
> PCSetUp 1 1.0 5.9605e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> PCApply 2948 1.0 8.8313e+00 2.3 2.89e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 2 1 0 0 0 2 1 0 0 0 393
>
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Matrix 3 3 56593044 0
> Vec 18 18 10534536 0
> Vec Scatter 2 2 1736 0
> Index Set 4 4 305424 0
> Krylov Solver 1 1 832 0
> Preconditioner 1 1 872 0
> Viewer 1 1 544 0
>
> ========================================================================================================================
> Average time to get PetscTime(): 6.48499e-06
> Average time for MPI_Barrier(): 0.000102377
> Average time for zero size MPI_Send(): 2.15967e-05
> #PETSc Option Table entries:
> -ksp_type bicg
> -log_summary
> -pc_type jacobi
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8
> Configure run at: Tue Nov 23 15:54:45 2010
> Configure options: --known-level1-dcache-size=65536
> --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2
> --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8
> --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8
> --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8
> --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4
> --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc
> --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1
> --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1
> --download-parmetis=1 --download-mumps=1 --download-scalapack=1
> --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch
> --known-mpi-shared=1
> -----------------------------------------
>
>
> ----------------------
> (5) k=16
> ----------------------
> Process 0 of total 16 on wmss04
> Process 8 of total 16 on wmss04
> Process 4 of total 16 on wmss04
> Process 12 of total 16 on wmss04
> Process 2 of total 16 on wmss04
> Process 6 of total 16 on wmss04
> Process 5 of total 16 on wmss04
> Process 11 of total 16 on wmss04
> Process 14 of total 16 on wmss04
> Process 7 of total 16 on wmss04
> Process Process 15 of total 16 on wmss04
> 3Process 13 of total 16 on wmss04
> Process 10 of total 16 on wmss04
> Process 9 of total 16 on wmss04
> Process 1 of total 16 on wmss04
> The dimension of Matrix A is n = 1177754
> of total 16 on wmss04
>
> Begin Assembly:
> Begin Assembly:
> Begin Assembly:
> Begin Assembly:
> Begin Assembly:
> Begin Assembly:
> Begin Assembly:
>
> Begin Assembly:
> Begin Assembly:
> Begin Assembly:
> Begin Assembly:
>
> Begin Assembly:
> Begin Assembly:
> Begin Assembly:
>
> Begin Assembly:
> Begin Assembly:
> End Assembly.
> End Assembly.End Assembly.
> End Assembly.End Assembly.End Assembly.End Assembly.
> End Assembly.
> End Assembly.
> End Assembly.End Assembly.
>
> End Assembly.
> End Assembly.
> End Assembly.
> End Assembly.End Assembly.
>
>
>
> =========================================================
> Begin the solving:
> =========================================================
> The current time is: Mon Dec 20 18:02:28 2010
>
> KSP Object:
> type: bicg
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-07, absolute=1e-50, divergence=10000
> left preconditioning
> using PRECONDITIONED norm type for convergence test
> PC Object:
> type: jacobi
> linear system matrix = precond matrix:
> Matrix Object:
> type=mpisbaij, rows=1177754, cols=1177754
> total: nonzeros=49908476, allocated nonzeros=49908476
> block size is 1
>
> norm(b-Ax)=1.15892e-06
> Norm of error 1.15892e-06, Iterations 1497
> =========================================================
> The solver has finished successfully!
> =========================================================
> The solving time is 337.91 seconds.
> The time accuracy is 1e-06 second.
> The current time is Mon Dec 20 18:08:06 2010
>
>
> ************************************************************************************************************************
> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r
> -fCourier9' to print this document ***
>
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary:
> ----------------------------------------------
>
> ./AMG_Solver_MPI on a linux-gnu named wmss04 with 16 processors, by cheny
> Mon Dec 20 19:08:06 2010
> Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010
>
> Max Max/Min Avg Total
> Time (sec): 3.534e+02 1.00001 3.534e+02
> Objects: 3.000e+01 1.00000 3.000e+01
> Flops: 3.964e+10 1.13060 3.864e+10 6.182e+11
> Flops/sec: 1.122e+08 1.13060 1.093e+08 1.749e+09
> MPI Messages: 1.200e+04 3.99917 7.127e+03 1.140e+05
> MPI Message Lengths: 1.950e+09 7.80999 1.819e+05 2.074e+10
> MPI Reductions: 4.549e+03 1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N
> --> 2N flops
> and VecAXPY() for complex vectors of length N
> --> 8N flops
>
> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages
> --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts
> %Total Avg %Total counts %Total
> 0: Main Stage: 3.5342e+02 100.0% 6.1820e+11 100.0% 1.140e+05
> 100.0% 1.819e+05 100.0% 4.533e+03 99.6%
>
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on
> interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and
> PetscLogStagePop().
> %T - percent time in this phase %F - percent flops in this
> phase
> %M - percent messages in this phase %L - percent message lengths
> in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over
> all processors)
>
> ------------------------------------------------------------------------------------------------------------------------
> Event Count Time (sec)
> Flops --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len
> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
>
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> MatMult 1498 1.0 1.8860e+02 1.7 1.88e+10 1.1 5.7e+04 1.8e+05
> 0.0e+00 40 47 50 50 0 40 47 50 50 0 1555
> MatMultTranspose 1497 1.0 1.4165e+02 1.3 1.88e+10 1.1 5.7e+04 1.8e+05
> 0.0e+00 35 47 50 50 0 35 47 50 50 0 2069
> MatAssemblyBegin 1 1.0 1.0044e-0217.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 3.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 1 1.0 7.3835e-02 1.0 0.00e+00 0.0 1.8e+02 6.7e+04
> 1.2e+01 0 0 0 0 0 0 0 0 0 0 0
> MatView 1 1.0 2.6107e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecView 1 1.0 1.1282e+01109.0 0.00e+00 0.0 3.0e+01 2.9e+05
> 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
> VecDot 2994 1.0 6.7490e+0119.6 4.41e+08 1.0 0.0e+00 0.0e+00
> 3.0e+03 10 1 0 0 66 10 1 0 0 66 104
> VecNorm 1499 1.0 1.3431e+0110.8 2.21e+08 1.0 0.0e+00 0.0e+00
> 1.5e+03 2 1 0 0 33 2 1 0 0 33 263
> VecCopy 4 1.0 7.3178e-03 2.6 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 8987 1.0 3.1772e+00 3.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
> VecAXPY 4492 1.0 1.1361e+01 3.1 6.61e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 2 2 0 0 0 2 2 0 0 0 931
> VecAYPX 2992 1.0 7.3248e+00 2.5 4.40e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 1 1 0 0 0 1 1 0 0 0 962
> VecAssemblyBegin 6 1.0 3.6338e-0212.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.8e+01 0 0 0 0 0 0 0 0 0 0 0
> VecAssemblyEnd 6 1.0 7.2002e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecPointwiseMult 2996 1.0 9.7892e+00 2.4 2.21e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 2 1 0 0 0 2 1 0 0 0 360
> VecScatterBegin 2995 1.0 4.0570e+00 5.5 0.00e+00 0.0 1.1e+05 1.8e+05
> 0.0e+00 1 0100100 0 1 0100100 0 0
> VecScatterEnd 2995 1.0 1.7309e+0251.3 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 22 0 0 0 0 22 0 0 0 0 0
> KSPSetup 1 1.0 1.3058e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 1 1.0 3.2641e+02 1.0 3.96e+10 1.1 1.1e+05 1.8e+05
> 4.5e+03 92100100100 99 92100100100 99 1893
> PCSetUp 1 1.0 8.1062e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> PCApply 2996 1.0 9.8336e+00 2.4 2.21e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 2 1 0 0 0 2 1 0 0 0 359
>
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Matrix 3 3 42424600 0
> Vec 18 18 7924896 0
> Vec Scatter 2 2 1736 0
> Index Set 4 4 247632 0
> Krylov Solver 1 1 832 0
> Preconditioner 1 1 872 0
> Viewer 1 1 544 0
>
> ========================================================================================================================
> Average time to get PetscTime(): 6.10352e-06
> Average time for MPI_Barrier(): 0.000129986
> Average time for zero size MPI_Send(): 2.08169e-05
> #PETSc Option Table entries:
> -ksp_type bicg
> -log_summary
> -pc_type jacobi
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8
> Configure run at: Tue Nov 23 15:54:45 2010
> Configure options: --known-level1-dcache-size=65536
> --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2
> --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8
> --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8
> --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8
> --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4
> --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc
> --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1
> --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1
> --download-parmetis=1 --download-mumps=1 --download-scalapack=1
> --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch
> --known-mpi-shared=1
> -----------------------------------------
>
>
>
>
> On Mon, Dec 20, 2010 at 6:06 PM, Matthew Knepley <knepley at gmail.com> wrote:
>
>> On Mon, Dec 20, 2010 at 8:46 AM, Yongjun Chen <yjxd.chen at gmail.com> wrote:
>>
>>>
>>> Hi everyone,
>>>
>>>
>>> I use PETSc (version 3.1-p5) to solve a linear problem Ax=b. The matrix A
>>> and the right-hand-side vector b are read from files. The dimension of A is
>>> 1.2 million x 1.2 million. I am pretty sure the matrix A and vector b have
>>> been read correctly.
>>>
>>> I compiled the program in its optimized configuration (--with-debugging=0)
>>> and tested the speed-up performance on two servers, and I have found that
>>> the performance is very poor.
>>>
>>> Of the two servers, one has 4 CPUs with 4 cores per CPU, i.e., 16 cores in
>>> total, and the other has 4 CPUs with 12 cores per CPU, i.e., 48 cores in
>>> total.
>>>
>>> On each of them, as the number of computing cores k increases from 1 to 8
>>> (mpiexec -n k ./Solver_MPI -pc_type jacobi -ksp_type gmres), the speed-up
>>> increases from 1 to 6, but as k increases further, from 9 up to 16 (on the
>>> first server) or 48 (on the second server), the speed-up first decreases
>>> and then levels off at a constant value of 5.0 (first server) or 4.5
>>> (second server).
>>>
>>
>> We cannot say anything at all without -log_summary data for your runs.
>>
>> Matt
>>
>>
>>> Actually, the program LAMMPS speeds up excellently on these two servers.
>>>
>>> Any comments are very appreciated! Thanks!
>>>
>>>
>>>
>>>
>>> --------------------------------------------------------------------------------------------------------------------------
>>>
>>> PS: the related codes are as following,
>>>
>>>
>>> //firstly read A and b from files
>>>
>>> ...
>>>
>>> //then
>>>
>>>
>>>
>>> ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
>>> ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
>>> ierr = VecAssemblyBegin(b); CHKERRQ(ierr);
>>> ierr = VecAssemblyEnd(b); CHKERRQ(ierr);
>>>
>>> ierr = MatSetOption(A,MAT_SYMMETRIC,PETSC_TRUE); CHKERRQ(ierr);
>>> ierr = MatGetRowUpperTriangular(A); CHKERRQ(ierr);
>>> ierr = KSPCreate(PETSC_COMM_WORLD,&ksp); CHKERRQ(ierr);
>>>
>>> ierr = KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN); CHKERRQ(ierr);
>>> ierr = KSPGetPC(ksp,&pc); CHKERRQ(ierr);
>>> ierr = KSPSetTolerances(ksp,1.e-7,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT); CHKERRQ(ierr);
>>> ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr);
>>>
>>> ierr = KSPSolve(ksp,b,x); CHKERRQ(ierr);
>>>
>>> ierr = KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD); CHKERRQ(ierr);
>>>
>>> ierr = KSPGetSolution(ksp, &x); CHKERRQ(ierr);
>>>
>>> ierr = VecAssemblyBegin(x); CHKERRQ(ierr);
>>> ierr = VecAssemblyEnd(x); CHKERRQ(ierr);
>>>
>>> ...
>>>
>>>
>>>
>>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>
>
>
> --
> Dr.Yongjun Chen
> Room 2507, Building M
> Institute of Materials Science and Technology
> Technical University of Hamburg-Harburg
> Eißendorfer Straße 42, 21073 Hamburg, Germany.
> Tel: +49 (0)40-42878-4386
> Fax: +49 (0)40-42878-4070
> E-mail: yjxd.chen at gmail.com
>
>
--
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener
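
[Editor's note, not part of the original thread: for reference, the solve
times reported in the five logs above (762.874, 450.583, 311.937, 291.503 and
337.91 seconds for k = 2, 4, 8, 12 and 16) can be turned into speed-up and
parallel-efficiency figures with a small sketch like the following, taking
the k=2 run as the baseline since no single-core log was attached:

/* speedup.c -- editor's sketch: scaling figures from the attached logs. */
#include <stdio.h>

int main(void)
{
  const int    cores[]  = { 2, 4, 8, 12, 16 };
  const double time_s[] = { 762.874, 450.583, 311.937, 291.503, 337.91 };
  const int    n = sizeof(cores) / sizeof(cores[0]);

  printf("%6s %10s %9s %11s\n", "cores", "time (s)", "speed-up", "efficiency");
  for (int i = 0; i < n; i++) {
    double s = time_s[0] / time_s[i];    /* speed-up relative to k = 2        */
    double e = s * cores[0] / cores[i];  /* efficiency relative to k = 2 base */
    printf("%6d %10.1f %9.2f %10.0f%%\n", cores[i], time_s[i], s, 100.0 * e);
  }
  return 0;
}

The efficiency falls from roughly 85% at 4 cores to under 30% at 16 cores,
the plateau-and-regress pattern expected once the memory bus, rather than the
core count, is the binding resource.]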