[petsc-users] Very poor speed up performance
Yongjun Chen
yjxd.chen at gmail.com
Wed Dec 22 12:11:12 CST 2010
On Wed, Dec 22, 2010 at 6:53 PM, Satish Balay <balay at mcs.anl.gov> wrote:
> On Wed, 22 Dec 2010, Yongjun Chen wrote:
>
> > On Wed, Dec 22, 2010 at 6:32 PM, Satish Balay <balay at mcs.anl.gov> wrote:
> >
> > Thanks a lot, Satish. It is much clearer now. But for the choice of the
> > two, the program dmidecode does not show this information. Do you know
> > any way to get it?
>
> why do you expect dmidecode to show that?
>
> You'll have to look for the CPU/chipset hardware documentation - and
> look at the details - sometimes they mention these things.
>
> Satish
>
Thanks, Satish. Yes, I will check that.
I have just re-configured PETSc with the option --with-device=ch3:nemsis. The
results are almost the same as with --with-device=ch3:sock, as can be seen in
the attached logs.
I hope a matrix partitioning/reordering algorithm will have some positive
effect.
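For illustration only (this sketch is not from the original post), a minimal
repartitioning step could use PETSc's MatPartitioning interface. The calls below
assume the PETSc 3.1 C API; for a MATMPISBAIJ matrix the adjacency may first need
to be converted to a MATMPIADJ matrix, and error handling is trimmed:

    #include "petscmat.h"

    /* Sketch: compute a new row partition of an assembled matrix A
       (e.g. with ParMETIS) so that A and b could be redistributed
       before the solve.  PETSc 3.1-era calls; checks trimmed. */
    PetscErrorCode ComputeRowPartition(Mat A, IS *rowdest)
    {
      MatPartitioning part;
      PetscErrorCode  ierr;

      ierr = MatPartitioningCreate(PETSC_COMM_WORLD, &part);CHKERRQ(ierr);
      /* For MATMPISBAIJ the adjacency may need to come from a MATMPIADJ
         copy of the sparsity pattern rather than from A directly. */
      ierr = MatPartitioningSetAdjacency(part, A);CHKERRQ(ierr);
      /* pick the partitioner at run time, e.g. -mat_partitioning_type parmetis */
      ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr);
      /* rowdest[i] = rank that should own local row i after repartitioning */
      ierr = MatPartitioningApply(part, rowdest);CHKERRQ(ierr);
      ierr = MatPartitioningDestroy(part);CHKERRQ(ierr);  /* PETSc 3.1 signature */
      return 0;
    }

The resulting index set could then be turned into a new global numbering with
ISPartitioningToNumbering() and used to redistribute the matrix and right-hand
side, which is where any benefit to MatMult/MatMultTranspose would come from.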
-------------- next part --------------
Process 0 of total 8 on wmss04
Process 4 of total 8 on wmss04
Process 1 of total 8 on wmss04
Process 5 of total 8 on wmss04
Process 6 of total 8 on wmss04
Process 2 of total 8 on wmss04
Process 3 of total 8 on wmss04
Process 7 of total 8 on wmss04
The dimension of Matrix A is n = 1177754
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
=========================================================
Begin the solving:
=========================================================
The current time is: Wed Dec 22 17:41:47 2010
KSP Object:
type: bicg
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-07, absolute=1e-50, divergence=10000
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object:
type: jacobi
linear system matrix = precond matrix:
Matrix Object:
type=mpisbaij, rows=1177754, cols=1177754
total: nonzeros=49908476, allocated nonzeros=49908476
block size is 1
norm(b-Ax)=1.32502e-06
Norm of error 1.32502e-06, Iterations 1473
=========================================================
The solver has finished successfully!
=========================================================
The solving time is 333.681 seconds.
The time accuracy is 1e-06 second.
The current time is Wed Dec 22 17:47:21 2010
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./AMG_Solver_MPI on a linux-gnu named wmss04 with 8 processors, by cheny Wed Dec 22 18:47:21 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010
Max Max/Min Avg Total
Time (sec): 3.558e+02 1.00000 3.558e+02
Objects: 3.000e+01 1.00000 3.000e+01
Flops: 7.792e+10 1.09702 7.614e+10 6.091e+11
Flops/sec: 2.190e+08 1.09702 2.140e+08 1.712e+09
MPI Messages: 5.906e+03 2.00017 5.169e+03 4.135e+04
MPI Message Lengths: 1.866e+09 4.61816 2.430e+05 1.005e+10
MPI Reductions: 4.477e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 3.5581e+02 100.0% 6.0914e+11 100.0% 4.135e+04 100.0% 2.430e+05 100.0% 4.461e+03 99.6%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 1474 1.0 1.5404e+02 1.6 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 35 47 50 50 0 35 47 50 50 0 1876
MatMultTranspose 1473 1.0 1.4721e+02 1.4 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 37 47 50 50 0 37 47 50 50 0 1962
MatAssemblyBegin 1 1.0 6.0289e-0316.6 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 1 1.0 5.2618e-02 1.0 0.00e+00 0.0 7.0e+01 8.5e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0
MatView 1 1.0 2.0790e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecView 1 1.0 1.0855e+0112.8 0.00e+00 0.0 1.4e+01 5.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
VecDot 2946 1.0 9.9344e+0120.5 8.67e+08 1.0 0.0e+00 0.0e+00 2.9e+03 12 1 0 0 66 12 1 0 0 66 70
VecNorm 1475 1.0 5.6723e+00 2.9 4.34e+08 1.0 0.0e+00 0.0e+00 1.5e+03 1 1 0 0 33 1 1 0 0 33 613
VecCopy 4 1.0 5.5063e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 8843 1.0 2.1978e+00 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 4420 1.0 8.6108e+00 1.3 1.30e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1209
VecAYPX 2944 1.0 6.0635e+00 1.4 8.67e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 1144
VecAssemblyBegin 6 1.0 4.8455e-0217.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 6 1.0 3.5286e-05 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 2948 1.0 8.7080e+00 1.3 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 399
VecScatterBegin 2947 1.0 1.8601e+00 2.6 0.00e+00 0.0 4.1e+04 2.4e+05 0.0e+00 0 0100100 0 0 0100100 0 0
VecScatterEnd 2947 1.0 9.0296e+0116.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12 0 0 0 0 12 0 0 0 0 0
KSPSetup 1 1.0 9.8538e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 3.2263e+02 1.0 7.79e+10 1.1 4.1e+04 2.4e+05 4.4e+03 91100100100 99 91100100100 99 1887
PCSetUp 1 1.0 3.0994e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 2948 1.0 8.7381e+00 1.3 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 397
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 3 3 84944064 0
Vec 18 18 15741712 0
Vec Scatter 2 2 1736 0
Index Set 4 4 409008 0
Krylov Solver 1 1 832 0
Preconditioner 1 1 872 0
Viewer 1 1 544 0
========================================================================================================================
Average time to get PetscTime(): 4.98295e-06
Average time for MPI_Barrier(): 9.76086e-05
Average time for zero size MPI_Send(): 2.81334e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Wed Dec 22 18:24:43 2010
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:nemsis --with-debugging=0 --with-batch --known-mpi-shared=1
-----------------------------------------
Libraries compiled on Wed Dec 22 18:26:55 CET 2010 on wmss04
Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux
Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized
Using PETSc arch: linux-gnu-c-opt-ch3nemsis
-----------------------------------------
Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O
Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O
-----------------------------------------
Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include
------------------------------------------
Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O
Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O
Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl
------------------------------------------
-------------- next part --------------
Process 0 of total 12 on wmss04
Process 4 of total 12 on wmss04
Process 6 of total 12 on wmss04
Process 5 of total 12 on wmss04
Process 11 of total 12 on wmss04
Process 2 of total 12 on wmss04
Process 7 of total 12 on wmss04
Process 3 of total 12 on wmss04
Process 8 of total 12 on wmss04
Process 1 of total 12 on wmss04
Process 9 of total 12 on wmss04
Process 10 of total 12 on wmss04
The dimension of Matrix A is n = 1177754
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
=========================================================
Begin the solving:
=========================================================
The current time is: Wed Dec 22 17:55:12 2010
KSP Object:
type: bicg
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-07, absolute=1e-50, divergence=10000
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object:
type: jacobi
linear system matrix = precond matrix:
Matrix Object:
type=mpisbaij, rows=1177754, cols=1177754
total: nonzeros=49908476, allocated nonzeros=49908476
block size is 1
norm(b-Ax)=1.28414e-06
Norm of error 1.28414e-06, Iterations 1473
=========================================================
The solver has finished successfully!
=========================================================
The solving time is 241.392 seconds.
The time accuracy is 1e-06 second.
The current time is Wed Dec 22 17:59:13 2010
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./AMG_Solver_MPI on a linux-gnu named wmss04 with 12 processors, by cheny Wed Dec 22 18:59:13 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010
Max Max/Min Avg Total
Time (sec): 2.594e+02 1.00000 2.594e+02
Objects: 3.000e+01 1.00000 3.000e+01
Flops: 5.197e+10 1.11689 5.074e+10 6.089e+11
Flops/sec: 2.004e+08 1.11689 1.956e+08 2.348e+09
MPI Messages: 5.906e+03 2.00017 5.415e+03 6.498e+04
MPI Message Lengths: 1.887e+09 6.23794 2.345e+05 1.524e+10
MPI Reductions: 4.477e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.5935e+02 100.0% 6.0890e+11 100.0% 6.498e+04 100.0% 2.345e+05 100.0% 4.461e+03 99.6%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 1474 1.0 1.1203e+02 1.5 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 39 47 50 50 0 39 47 50 50 0 2579
MatMultTranspose 1473 1.0 9.9342e+01 1.3 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 36 47 50 50 0 36 47 50 50 0 2906
MatAssemblyBegin 1 1.0 3.7930e-03 8.9 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 1 1.0 5.1536e-02 1.0 0.00e+00 0.0 1.1e+02 8.2e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0
MatView 1 1.0 2.2507e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecView 1 1.0 1.2744e+0166.4 0.00e+00 0.0 2.2e+01 3.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
VecDot 2946 1.0 5.4256e+0115.3 5.78e+08 1.0 0.0e+00 0.0e+00 2.9e+03 6 1 0 0 66 6 1 0 0 66 128
VecNorm 1475 1.0 7.3386e+00 5.2 2.90e+08 1.0 0.0e+00 0.0e+00 1.5e+03 1 1 0 0 33 1 1 0 0 33 473
VecCopy 4 1.0 6.2873e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 8843 1.0 2.5036e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 4420 1.0 7.4288e+00 1.8 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1401
VecAYPX 2944 1.0 5.0487e+00 2.5 5.78e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 1374
VecAssemblyBegin 6 1.0 3.4969e-0211.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 6 1.0 5.5075e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 2948 1.0 7.2035e+00 1.7 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 482
VecScatterBegin 2947 1.0 2.5759e+00 2.7 0.00e+00 0.0 6.5e+04 2.3e+05 0.0e+00 1 0100100 0 1 0100100 0 0
VecScatterEnd 2947 1.0 5.1555e+0111.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 7 0 0 0 0 7 0 0 0 0 0
KSPSetup 1 1.0 8.2631e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 2.2851e+02 1.0 5.20e+10 1.1 6.5e+04 2.3e+05 4.4e+03 88100100100 99 88100100100 99 2664
PCSetUp 1 1.0 7.1526e-06 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 2948 1.0 7.2339e+00 1.7 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 480
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 3 3 56593044 0
Vec 18 18 10534536 0
Vec Scatter 2 2 1736 0
Index Set 4 4 305424 0
Krylov Solver 1 1 832 0
Preconditioner 1 1 872 0
Viewer 1 1 544 0
========================================================================================================================
Average time to get PetscTime(): 7.82013e-06
Average time for MPI_Barrier(): 9.52244e-05
Average time for zero size MPI_Send(): 2.15769e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Wed Dec 22 18:24:43 2010
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:nemsis --with-debugging=0 --with-batch --known-mpi-shared=1
-----------------------------------------
Libraries compiled on Wed Dec 22 18:26:55 CET 2010 on wmss04
Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux
Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized
Using PETSc arch: linux-gnu-c-opt-ch3nemsis
-----------------------------------------
Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O
Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O
-----------------------------------------
Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include
------------------------------------------
Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O
Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O
Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl
------------------------------------------
-------------- next part --------------
Process 0 of total 16 on wmss04
Process 8 of total 16 on wmss04
Process 4 of total 16 on wmss04
Process 6 of total 16 on wmss04
Process 14 of total 16 on wmss04
Process 12 of total 16 on wmss04
Process 2 of total 16 on wmss04
Process 10 of total 16 on wmss04
Process 3 of total 16 on wmss04
Process 15 of total 16 on wmss04
Process 7 of total 16 on wmss04
Process 1 of total 16 on wmss04
Process 9 of total 16 on wmss04
Process 5 of total 16 on wmss04
Process 13 of total 16 on wmss04
The dimension of Matrix A is n = 1177754
Process 11 of total 16 on wmss04
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
=========================================================
Begin the solving:
=========================================================
The current time is: Wed Dec 22 17:50:47 2010
KSP Object:
type: bicg
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-07, absolute=1e-50, divergence=10000
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object:
type: jacobi
linear system matrix = precond matrix:
Matrix Object:
type=mpisbaij, rows=1177754, cols=1177754
total: nonzeros=49908476, allocated nonzeros=49908476
block size is 1
norm(b-Ax)=1.23596e-06
Norm of error 1.23596e-06, Iterations 1481
=========================================================
The solver has finished successfully!
=========================================================
The solving time is 227.888 seconds.
The time accuracy is 1e-06 second.
The current time is Wed Dec 22 17:54:35 2010
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./AMG_Solver_MPI on a linux-gnu named wmss04 with 16 processors, by cheny Wed Dec 22 18:54:35 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010
Max Max/Min Avg Total
Time (sec): 2.442e+02 1.00001 2.442e+02
Objects: 3.000e+01 1.00000 3.000e+01
Flops: 3.922e+10 1.13060 3.822e+10 6.116e+11
Flops/sec: 1.606e+08 1.13060 1.565e+08 2.504e+09
MPI Messages: 1.187e+04 3.99916 7.051e+03 1.128e+05
MPI Message Lengths: 1.929e+09 7.80850 1.819e+05 2.052e+10
MPI Reductions: 4.501e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.4422e+02 100.0% 6.1159e+11 100.0% 1.128e+05 100.0% 1.819e+05 100.0% 4.485e+03 99.6%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 1482 1.0 1.1549e+02 2.0 1.86e+10 1.1 5.6e+04 1.8e+05 0.0e+00 36 47 50 50 0 36 47 50 50 0 2513
MatMultTranspose 1481 1.0 9.3652e+01 1.4 1.86e+10 1.1 5.6e+04 1.8e+05 0.0e+00 32 47 50 50 0 32 47 50 50 0 3097
MatAssemblyBegin 1 1.0 4.6110e-03 7.4 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 1 1.0 5.1871e-02 1.0 0.00e+00 0.0 1.8e+02 6.7e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0
MatView 1 1.0 5.1212e-04 4.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecView 1 1.0 1.2031e+01123.8 0.00e+00 0.0 3.0e+01 2.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
VecDot 2962 1.0 7.2313e+0122.5 4.36e+08 1.0 0.0e+00 0.0e+00 3.0e+03 13 1 0 0 66 13 1 0 0 66 96
VecNorm 1483 1.0 5.2508e+00 4.6 2.18e+08 1.0 0.0e+00 0.0e+00 1.5e+03 1 1 0 0 33 1 1 0 0 33 665
VecCopy 4 1.0 3.2623e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 8891 1.0 2.5386e+00 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 4444 1.0 6.6341e+00 1.6 6.54e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1578
VecAYPX 2960 1.0 4.2830e+00 1.7 4.36e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 1628
VecAssemblyBegin 6 1.0 4.0186e-0213.5 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 6 1.0 6.0081e-05 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 2964 1.0 6.2569e+00 1.6 2.18e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 558
VecScatterBegin 2963 1.0 2.9219e+00 4.0 0.00e+00 0.0 1.1e+05 1.8e+05 0.0e+00 1 0100100 0 1 0100100 0 0
VecScatterEnd 2963 1.0 5.0568e+01 7.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0
KSPSetup 1 1.0 5.8019e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 2.1573e+02 1.0 3.92e+10 1.1 1.1e+05 1.8e+05 4.4e+03 88100100100 99 88100100100 99 2834
PCSetUp 1 1.0 5.9605e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 2964 1.0 6.2830e+00 1.6 2.18e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 556
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 3 3 42424600 0
Vec 18 18 7924896 0
Vec Scatter 2 2 1736 0
Index Set 4 4 247632 0
Krylov Solver 1 1 832 0
Preconditioner 1 1 872 0
Viewer 1 1 544 0
========================================================================================================================
Average time to get PetscTime(): 1.38998e-05
Average time for MPI_Barrier(): 0.00011363
Average time for zero size MPI_Send(): 2.03103e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Wed Dec 22 18:24:43 2010
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:nemsis --with-debugging=0 --with-batch --known-mpi-shared=1
-----------------------------------------
Libraries compiled on Wed Dec 22 18:26:55 CET 2010 on wmss04
Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux
Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized
Using PETSc arch: linux-gnu-c-opt-ch3nemsis
-----------------------------------------
Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O
Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O
-----------------------------------------
Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include
------------------------------------------
Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O
Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O
Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl
------------------------------------------