[petsc-users] Very poor speed up performance
Yongjun Chen
yjxd.chen at gmail.com
Wed Dec 22 09:55:23 CST 2010
Satish,
I have reconfigured PETSc with --download-mpich=1 and
--with-device=ch3:sock. The results show that the speedup now keeps
increasing as the number of cores goes from 1 to 16. However, the maximum
speedup is still only around 6.0 on 16 cores. The new log files are
attached.
(1)
I checked the configuration of the first server again. It is a
shared-memory machine with
Processors: 4 CPUs * 4 cores/CPU, each core running at 2500 MHz
Memory: 16 * 2 GB DDR2-333, dual channel, 64-bit data width, so the memory
bandwidth of one dual-channel pair of modules is 64/8*166*2*2 = 5.4 GB/s.
It seems that each core can get 2.7 GB/s of memory bandwidth, which should
fulfill the basic bandwidth requirement of sparse iterative solvers.
Is this correct? Does a shared-memory computer offer no benefit for PETSc
when the memory bandwidth is limited?
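(For reference, here is the arithmetic behind the 5.4 GB/s figure above,
written out as a rough peak estimate of my own; it assumes DDR2-333 means
about 333 MT/s on a 64-bit bus, doubled for the two channels:

  B_{peak} = \tfrac{64}{8}\,\text{B/transfer} \times 333\times10^{6}\,\text{transfers/s} \times 2\,\text{channels} \approx 5.3\,\text{GB/s},

i.e. essentially the 5.4 GB/s quoted above. The bandwidth actually sustained
by a benchmark such as STREAM is usually noticeably lower than this peak.)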
(2)
Besides, we would like to continue our work by employing a matrix
partitioning / reordering algorithm, such as Metis or ParMetis, to improve
the speedup of the program. (The current program works without any matrix
decomposition.)
Matt, as you said in
http://lists.mcs.anl.gov/pipermail/petsc-users/2007-January/001017.html,
"Reordering a matrix can result in fewer iterations for an iterative solver".
Do you think matrix partitioning/reordering would help this program?
Or do you have any further suggestions?
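In case a concrete starting point helps the discussion, below is a minimal
sketch (not our current code, and only lightly checked) of how one might ask
ParMetis for a new row distribution through PETSc's MatPartitioning
interface. The function name and variables are illustrative only; it assumes
an assembled parallel AIJ matrix and uses the PETSc 3.1 calling conventions
(ISDestroy/MatPartitioningDestroy take the object itself in this release).

#include "petscmat.h"

/* Sketch: compute a ParMetis repartitioning of an assembled MPIAIJ matrix A.
   The resulting index sets describe the new row owners / new numbering;
   actually redistributing A and the vectors (e.g. via an AO) is left out. */
PetscErrorCode RepartitionSketch(Mat A)
{
  MatPartitioning part;
  IS              owners, newnumbering;
  PetscErrorCode  ierr;

  ierr = MatPartitioningCreate(PETSC_COMM_WORLD, &part); CHKERRQ(ierr);
  ierr = MatPartitioningSetAdjacency(part, A); CHKERRQ(ierr);
  ierr = MatPartitioningSetType(part, "parmetis"); CHKERRQ(ierr);
  ierr = MatPartitioningSetFromOptions(part); CHKERRQ(ierr);
  ierr = MatPartitioningApply(part, &owners); CHKERRQ(ierr);   /* owners[i] = new rank of local row i */
  ierr = ISPartitioningToNumbering(owners, &newnumbering); CHKERRQ(ierr);

  /* ... use newnumbering to renumber and redistribute A, b and x here ... */

  ierr = ISDestroy(owners); CHKERRQ(ierr);
  ierr = ISDestroy(newnumbering); CHKERRQ(ierr);
  ierr = MatPartitioningDestroy(part); CHKERRQ(ierr);
  return 0;
}

A bandwidth-reducing ordering such as RCM (MatGetOrdering() with type "rcm",
applied with MatPermute()) is the kind of reordering Matt's remark refers
to; whether it helps the iteration count for this matrix and preconditioner
is something we would have to try.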
Any comments are very welcome! Thank you!
On Mon, Dec 20, 2010 at 11:04 PM, Satish Balay <balay at mcs.anl.gov> wrote:
> On Mon, 20 Dec 2010, Yongjun Chen wrote:
>
> > Matt, Barry, thanks a lot for your reply! I will try mpich hydra firstly
> and
> > see what I can get.
>
> hydra is just the process manager.
>
> Also --download-mpich uses a slightly older version - with
> device=ch3:sock for portability and valgrind reasons [development]
>
> You might want to install the latest mpich manually with the default
> device=ch3:nemesis and recheck..
>
> satish
>
-------------- next part --------------
Process 0 of total 4 on wmss04
Process 2 of total 4 on wmss04
Process 1 of total 4 on wmss04
Process 3 of total 4 on wmss04
The dimension of Matrix A is n = 1177754
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
End Assembly.
End Assembly.
=========================================================
Begin the solving:
=========================================================
The current time is: Wed Dec 22 11:41:09 2010
KSP Object:
type: bicg
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-07, absolute=1e-50, divergence=10000
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object:
type: jacobi
linear system matrix = precond matrix:
Matrix Object:
type=mpisbaij, rows=1177754, cols=1177754
total: nonzeros=49908476, allocated nonzeros=49908476
block size is 1
norm(b-Ax)=1.28342e-06
Norm of error 1.28342e-06, Iterations 1473
=========================================================
The solver has finished successfully!
=========================================================
The solving time is 420.527 seconds.
The time accuracy is 1e-06 second.
The current time is Wed Dec 22 11:48:09 2010
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./AMG_Solver_MPI on a linux-gnu named wmss04 with 4 processors, by cheny Wed Dec 22 12:48:09 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010
Max Max/Min Avg Total
Time (sec): 4.531e+02 1.00000 4.531e+02
Objects: 3.000e+01 1.00000 3.000e+01
Flops: 1.558e+11 1.06872 1.523e+11 6.091e+11
Flops/sec: 3.438e+08 1.06872 3.361e+08 1.344e+09
MPI Messages: 5.906e+03 2.00017 4.430e+03 1.772e+04
MPI Message Lengths: 1.727e+09 2.74432 2.658e+05 4.710e+09
MPI Reductions: 4.477e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 4.5314e+02 100.0% 6.0914e+11 100.0% 1.772e+04 100.0% 2.658e+05 100.0% 4.461e+03 99.6%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 1474 1.0 1.7876e+02 1.0 7.40e+10 1.1 8.8e+03 2.7e+05 0.0e+00 39 47 50 50 0 39 47 50 50 0 1617
MatMultTranspose 1473 1.0 1.7886e+02 1.0 7.40e+10 1.1 8.8e+03 2.7e+05 0.0e+00 39 47 50 50 0 39 47 50 50 0 1615
MatAssemblyBegin 1 1.0 3.2670e-0312.4 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 1 1.0 6.1171e-02 1.0 0.00e+00 0.0 3.0e+01 9.3e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0
MatView 1 1.0 1.6379e-04 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecView 1 1.0 1.0934e+01 4.7 0.00e+00 0.0 6.0e+00 1.2e+06 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecDot 2946 1.0 1.9010e+01 2.2 1.73e+09 1.0 0.0e+00 0.0e+00 2.9e+03 3 1 0 0 66 3 1 0 0 66 365
VecNorm 1475 1.0 1.0313e+01 2.8 8.69e+08 1.0 0.0e+00 0.0e+00 1.5e+03 1 1 0 0 33 1 1 0 0 33 337
VecCopy 4 1.0 5.2447e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 8843 1.0 2.8803e+00 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 4420 1.0 1.3866e+01 1.5 2.60e+09 1.0 0.0e+00 0.0e+00 0.0e+00 3 2 0 0 0 3 2 0 0 0 751
VecAYPX 2944 1.0 1.0440e+01 1.0 1.73e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 664
VecAssemblyBegin 6 1.0 1.0071e-0161.5 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 6 1.0 2.4080e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 2948 1.0 1.6040e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 216
VecScatterBegin 2947 1.0 1.7367e+00 2.2 0.00e+00 0.0 1.8e+04 2.7e+05 0.0e+00 0 0100100 0 0 0100100 0 0
VecScatterEnd 2947 1.0 3.0331e+01 5.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0
KSPSetup 1 1.0 1.3974e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 4.0934e+02 1.0 1.56e+11 1.1 1.8e+04 2.7e+05 4.4e+03 90100100100 99 90100100100 99 1488
PCSetUp 1 1.0 3.0994e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 2948 1.0 1.6080e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 216
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 3 3 169902696 0
Vec 18 18 31282096 0
Vec Scatter 2 2 1736 0
Index Set 4 4 638616 0
Krylov Solver 1 1 832 0
Preconditioner 1 1 872 0
Viewer 1 1 544 0
========================================================================================================================
Average time to get PetscTime(): 1.19209e-06
Average time for MPI_Barrier(): 5.97954e-05
Average time for zero size MPI_Send(): 2.07424e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Wed Dec 22 11:56:02 2010
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:sock --with-debugging=0 --with-batch --known-mpi-shared=1
-----------------------------------------
Libraries compiled on Wed Dec 22 11:56:30 CET 2010 on wmss04
Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux
Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized
Using PETSc arch: linux-gnu-c-opt-ch3sock
-----------------------------------------
Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O
Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O
-----------------------------------------
Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include
------------------------------------------
Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O
Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O
Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl
------------------------------------------
-------------- next part --------------
Process 0 of total 8 on wmss04
Process 4 of total 8 on wmss04
Process 6 of total 8 on wmss04
Process 2 of total 8 on wmss04
Process 1 of total 8 on wmss04
Process 5 of total 8 on wmss04
Process 3 of total 8 on wmss04
Process 7 of total 8 on wmss04
The dimension of Matrix A is n = 1177754
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
=========================================================
Begin the solving:
=========================================================
The current time is: Wed Dec 22 11:12:03 2010
KSP Object:
type: bicg
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-07, absolute=1e-50, divergence=10000
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object:
type: jacobi
linear system matrix = precond matrix:
Matrix Object:
type=mpisbaij, rows=1177754, cols=1177754
total: nonzeros=49908476, allocated nonzeros=49908476
block size is 1
norm(b-Ax)=1.32502e-06
Norm of error 1.32502e-06, Iterations 1473
=========================================================
The solver has finished successfully!
=========================================================
The solving time is 291.989 seconds.
The time accuracy is 1e-06 second.
The current time is Wed Dec 22 11:16:55 2010
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./AMG_Solver_MPI on a linux-gnu named wmss04 with 8 processors, by cheny Wed Dec 22 12:16:55 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010
Max Max/Min Avg Total
Time (sec): 3.113e+02 1.00000 3.113e+02
Objects: 3.000e+01 1.00000 3.000e+01
Flops: 7.792e+10 1.09702 7.614e+10 6.091e+11
Flops/sec: 2.503e+08 1.09702 2.446e+08 1.957e+09
MPI Messages: 5.906e+03 2.00017 5.169e+03 4.135e+04
MPI Message Lengths: 1.866e+09 4.61816 2.430e+05 1.005e+10
MPI Reductions: 4.477e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 3.1128e+02 100.0% 6.0914e+11 100.0% 4.135e+04 100.0% 2.430e+05 100.0% 4.461e+03 99.6%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 1474 1.0 1.2879e+02 1.4 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 36 47 50 50 0 36 47 50 50 0 2244
MatMultTranspose 1473 1.0 1.2240e+02 1.3 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 37 47 50 50 0 37 47 50 50 0 2360
MatAssemblyBegin 1 1.0 3.1061e-03 9.8 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 1 1.0 5.0727e-02 1.0 0.00e+00 0.0 7.0e+01 8.5e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0
MatView 1 1.0 2.2912e-04 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecView 1 1.0 1.1926e+0113.1 0.00e+00 0.0 1.4e+01 5.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
VecDot 2946 1.0 6.5343e+0113.5 8.67e+08 1.0 0.0e+00 0.0e+00 2.9e+03 9 1 0 0 66 9 1 0 0 66 106
VecNorm 1475 1.0 6.9889e+00 3.6 4.34e+08 1.0 0.0e+00 0.0e+00 1.5e+03 1 1 0 0 33 1 1 0 0 33 497
VecCopy 4 1.0 5.1496e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 8843 1.0 2.2587e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 4420 1.0 8.7103e+00 1.5 1.30e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1195
VecAYPX 2944 1.0 5.7803e+00 1.4 8.67e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 1200
VecAssemblyBegin 6 1.0 3.9916e-0214.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 6 1.0 3.6001e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 2948 1.0 8.6749e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 400
VecScatterBegin 2947 1.0 1.9621e+00 2.7 0.00e+00 0.0 4.1e+04 2.4e+05 0.0e+00 0 0100100 0 0 0100100 0 0
VecScatterEnd 2947 1.0 5.9072e+0110.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0
KSPSetup 1 1.0 8.9231e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 2.7991e+02 1.0 7.79e+10 1.1 4.1e+04 2.4e+05 4.4e+03 90100100100 99 90100100100 99 2175
PCSetUp 1 1.0 3.0994e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 2948 1.0 8.7041e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 399
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 3 3 84944064 0
Vec 18 18 15741712 0
Vec Scatter 2 2 1736 0
Index Set 4 4 409008 0
Krylov Solver 1 1 832 0
Preconditioner 1 1 872 0
Viewer 1 1 544 0
========================================================================================================================
Average time to get PetscTime(): 4.3869e-06
Average time for MPI_Barrier(): 7.25746e-05
Average time for zero size MPI_Send(): 2.06232e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Wed Dec 22 11:56:02 2010
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:sock --with-debugging=0 --with-batch --known-mpi-shared=1
-----------------------------------------
Libraries compiled on Wed Dec 22 11:56:30 CET 2010 on wmss04
Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux
Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized
Using PETSc arch: linux-gnu-c-opt-ch3sock
-----------------------------------------
Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O
Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O
-----------------------------------------
Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include
------------------------------------------
Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O
Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O
Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl
------------------------------------------
-------------- next part --------------
Process 0 of total 12 on wmss04
Process 2 of total 12 on wmss04
Process 6 of total 12 on wmss04
Process 4 of total 12 on wmss04
Process 8 of total 12 on wmss04
Process 11 of total 12 on wmss04
Process 1 of total 12 on wmss04
Process 3 of total 12 on wmss04
Process 5 of total 12 on wmss04
The dimension of Matrix A is n = 1177754
Process 9 of total 12 on wmss04
Process 10 of total 12 on wmss04
Process 7 of total 12 on wmss04
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
=========================================================
Begin the solving:
=========================================================
The current time is: Wed Dec 22 12:13:43 2010
KSP Object:
type: bicg
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-07, absolute=1e-50, divergence=10000
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object:
type: jacobi
linear system matrix = precond matrix:
Matrix Object:
type=mpisbaij, rows=1177754, cols=1177754
total: nonzeros=49908476, allocated nonzeros=49908476
block size is 1
norm(b-Ax)=1.28414e-06
Norm of error 1.28414e-06, Iterations 1473
=========================================================
The solver has finished successfully!
=========================================================
The solving time is 253.909 seconds.
The time accuracy is 1e-06 second.
The current time is Wed Dec 22 12:17:57 2010
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./AMG_Solver_MPI on a linux-gnu named wmss04 with 12 processors, by cheny Wed Dec 22 13:17:57 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010
Max Max/Min Avg Total
Time (sec): 2.721e+02 1.00000 2.721e+02
Objects: 3.000e+01 1.00000 3.000e+01
Flops: 5.197e+10 1.11689 5.074e+10 6.089e+11
Flops/sec: 1.910e+08 1.11689 1.865e+08 2.238e+09
MPI Messages: 5.906e+03 2.00017 5.415e+03 6.498e+04
MPI Message Lengths: 1.887e+09 6.23794 2.345e+05 1.524e+10
MPI Reductions: 4.477e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.7212e+02 100.0% 6.0890e+11 100.0% 6.498e+04 100.0% 2.345e+05 100.0% 4.461e+03 99.6%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 1474 1.0 1.2467e+02 1.6 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 37 47 50 50 0 37 47 50 50 0 2318
MatMultTranspose 1473 1.0 1.0645e+02 1.3 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 35 47 50 50 0 35 47 50 50 0 2712
MatAssemblyBegin 1 1.0 4.0723e-0274.7 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 1 1.0 5.3137e-02 1.0 0.00e+00 0.0 1.1e+02 8.2e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0
MatView 1 1.0 2.8801e-04 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecView 1 1.0 1.2262e+0190.2 0.00e+00 0.0 2.2e+01 3.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
VecDot 2946 1.0 6.1395e+0111.5 5.78e+08 1.0 0.0e+00 0.0e+00 2.9e+03 9 1 0 0 66 9 1 0 0 66 113
VecNorm 1475 1.0 5.8101e+00 3.3 2.90e+08 1.0 0.0e+00 0.0e+00 1.5e+03 1 1 0 0 33 1 1 0 0 33 598
VecCopy 4 1.0 5.6744e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 8843 1.0 2.1137e+00 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 4420 1.0 6.6266e+00 1.4 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1571
VecAYPX 2944 1.0 5.2210e+00 2.3 5.78e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 1328
VecAssemblyBegin 6 1.0 5.0129e-0218.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 6 1.0 4.7922e-05 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 2948 1.0 7.0911e+00 1.6 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 490
VecScatterBegin 2947 1.0 2.5096e+00 3.1 0.00e+00 0.0 6.5e+04 2.3e+05 0.0e+00 1 0100100 0 1 0100100 0 0
VecScatterEnd 2947 1.0 4.4540e+01 6.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0
KSPSetup 1 1.0 7.9119e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 2.4149e+02 1.0 5.20e+10 1.1 6.5e+04 2.3e+05 4.4e+03 89100100100 99 89100100100 99 2521
PCSetUp 1 1.0 6.1989e-06 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 2948 1.0 7.1207e+00 1.6 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 488
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 3 3 56593044 0
Vec 18 18 10534536 0
Vec Scatter 2 2 1736 0
Index Set 4 4 305424 0
Krylov Solver 1 1 832 0
Preconditioner 1 1 872 0
Viewer 1 1 544 0
========================================================================================================================
Average time to get PetscTime(): 6.00815e-06
Average time for MPI_Barrier(): 0.000122833
Average time for zero size MPI_Send(): 2.81533e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Wed Dec 22 11:56:02 2010
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:sock --with-debugging=0 --with-batch --known-mpi-shared=1
-----------------------------------------
Libraries compiled on Wed Dec 22 11:56:30 CET 2010 on wmss04
Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux
Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized
Using PETSc arch: linux-gnu-c-opt-ch3sock
-----------------------------------------
Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O
Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O
-----------------------------------------
Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include
------------------------------------------
Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O
Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O
Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl
------------------------------------------
-------------- next part --------------
Process 3 of total 16 on wmss04
Process 7 of total 16 on wmss04
Process 1 of total 16 on wmss04
Process 15 of total 16 on wmss04
Process 5 of total 16 on wmss04
Process 13 of total 16 on wmss04
Process 11 of total 16 on wmss04
Process 9 of total 16 on wmss04
Process 0 of total 16 on wmss04
Process 10 of total 16 on wmss04
Process 4 of total 16 on wmss04
Process 12 of total 16 on wmss04
Process 2 of total 16 on wmss04
Process 6 of total 16 on wmss04
Process 14 of total 16 on wmss04
Process 8 of total 16 on wmss04
The dimension of Matrix A is n = 1177754
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
=========================================================
Begin the solving:
=========================================================
The current time is: Wed Dec 22 11:23:54 2010
KSP Object:
type: bicg
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-07, absolute=1e-50, divergence=10000
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object:
type: jacobi
linear system matrix = precond matrix:
Matrix Object:
type=mpisbaij, rows=1177754, cols=1177754
total: nonzeros=49908476, allocated nonzeros=49908476
block size is 1
norm(b-Ax)=1.194e-06
Norm of error 1.194e-06, Iterations 1495
=========================================================
The solver has finished successfully!
=========================================================
The solving time is 240.208 seconds.
The time accuracy is 1e-06 second.
The current time is Wed Dec 22 11:27:54 2010
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./AMG_Solver_MPI on a linux-gnu named wmss04 with 16 processors, by cheny Wed Dec 22 12:27:54 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010
Max Max/Min Avg Total
Time (sec): 2.565e+02 1.00001 2.565e+02
Objects: 3.000e+01 1.00000 3.000e+01
Flops: 3.959e+10 1.13060 3.859e+10 6.174e+11
Flops/sec: 1.543e+08 1.13060 1.504e+08 2.407e+09
MPI Messages: 1.198e+04 3.99917 7.118e+03 1.139e+05
MPI Message Lengths: 1.948e+09 7.80981 1.819e+05 2.071e+10
MPI Reductions: 4.543e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.5651e+02 100.0% 6.1737e+11 100.0% 1.139e+05 100.0% 1.819e+05 100.0% 4.527e+03 99.6%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 1496 1.0 1.1625e+02 1.7 1.88e+10 1.1 5.7e+04 1.8e+05 0.0e+00 38 47 50 50 0 38 47 50 50 0 2520
MatMultTranspose 1495 1.0 9.7790e+01 1.2 1.88e+10 1.1 5.7e+04 1.8e+05 0.0e+00 35 47 50 50 0 35 47 50 50 0 2994
MatAssemblyBegin 1 1.0 6.3910e-0314.3 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 1 1.0 5.2797e-02 1.0 0.00e+00 0.0 1.8e+02 6.7e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0
MatView 1 1.0 3.0708e-04 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecView 1 1.0 1.1235e+01111.3 0.00e+00 0.0 3.0e+01 2.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
VecDot 2990 1.0 5.7054e+0114.6 4.40e+08 1.0 0.0e+00 0.0e+00 3.0e+03 9 1 0 0 66 9 1 0 0 66 123
VecNorm 1497 1.0 5.8130e+00 3.5 2.20e+08 1.0 0.0e+00 0.0e+00 1.5e+03 2 1 0 0 33 2 1 0 0 33 607
VecCopy 4 1.0 3.3658e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 8975 1.0 2.5879e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 4486 1.0 7.5991e+00 1.6 6.60e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1391
VecAYPX 2988 1.0 4.6226e+00 1.6 4.40e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 1523
VecAssemblyBegin 6 1.0 3.9858e-0213.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 6 1.0 6.6996e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 2992 1.0 7.0992e+00 1.5 2.20e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 496
VecScatterBegin 2991 1.0 3.3736e+00 3.7 0.00e+00 0.0 1.1e+05 1.8e+05 0.0e+00 1 0100100 0 1 0100100 0 0
VecScatterEnd 2991 1.0 3.3633e+01 5.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0
KSPSetup 1 1.0 5.6469e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 2.2884e+02 1.0 3.96e+10 1.1 1.1e+05 1.8e+05 4.5e+03 89100100100 99 89100100100 99 2697
PCSetUp 1 1.0 5.0068e-06 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 2992 1.0 7.1263e+00 1.5 2.20e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 494
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 3 3 42424600 0
Vec 18 18 7924896 0
Vec Scatter 2 2 1736 0
Index Set 4 4 247632 0
Krylov Solver 1 1 832 0
Preconditioner 1 1 872 0
Viewer 1 1 544 0
========================================================================================================================
Average time to get PetscTime(): 8.91685e-06
Average time for MPI_Barrier(): 0.000128984
Average time for zero size MPI_Send(): 1.8239e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Wed Dec 22 11:56:02 2010
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:sock --with-debugging=0 --with-batch --known-mpi-shared=1
-----------------------------------------
Libraries compiled on Wed Dec 22 11:56:30 CET 2010 on wmss04
Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux
Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized
Using PETSc arch: linux-gnu-c-opt-ch3sock
-----------------------------------------
Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O
Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O
-----------------------------------------
Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include
------------------------------------------
Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O
Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O
Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl
------------------------------------------