[petsc-users] Why the convergence is much slower when I use two nodes
Ji Zhang
gotofd at gmail.com
Tue Jun 13 07:38:17 CDT 2017
mpirun -n 1 -hostfile hostfile python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_1.txt
mpirun -n 2 -hostfile hostfile python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_2.txt
mpirun -n 4 -hostfile hostfile python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_3.txt
mpirun -n 6 -hostfile hostfile python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_4.txt
mpirun -n 2 python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_5.txt
mpirun -n 4 python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_6.txt
mpirun -n 6 python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_7.txt
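The hostfile itself is not shown above. Purely as a hypothetical illustration (the host names below are placeholders, not the contents of the real file), an MPICH-style hostfile listing one host per line looks like:

    cn5
    cn10
    cn11

With such a file the runs written to mpi_1.txt through mpi_4.txt can spread their processes over several nodes, while the runs for mpi_5.txt through mpi_7.txt omit -hostfile and therefore keep all processes on the launch node.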
Dear Barry,
The following tests were run on our cluster using one, two, or three nodes. Each node has 64 GB of memory and 24 CPU cores (Intel(R) Xeon(R) CPU E5-2680
v3 @ 2.50GHz). Basic information about each node is listed below.
$ lstopo
Machine (64GB)
NUMANode L#0 (P#0 32GB)
Socket L#0 + L3 L#0 (30MB)
L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#4)
L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#5)
L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#6)
L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#7)
L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#8)
L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#9)
L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#10)
L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#11)
HostBridge L#0
PCIBridge
PCI 1000:0097
Block L#0 "sda"
PCIBridge
PCI 8086:1523
Net L#1 "eth0"
PCI 8086:1523
Net L#2 "eth1"
PCIBridge
PCIBridge
PCI 1a03:2000
PCI 8086:8d02
NUMANode L#1 (P#1 32GB)
Socket L#1 + L3 L#1 (30MB)
L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#12)
L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#13)
L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#14)
L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#15)
L2 L#16 (256KB) + L1d L#16 (32KB) + L1i L#16 (32KB) + Core L#16 + PU L#16 (P#16)
L2 L#17 (256KB) + L1d L#17 (32KB) + L1i L#17 (32KB) + Core L#17 + PU L#17 (P#17)
L2 L#18 (256KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18 + PU L#18 (P#18)
L2 L#19 (256KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19 + PU L#19 (P#19)
L2 L#20 (256KB) + L1d L#20 (32KB) + L1i L#20 (32KB) + Core L#20 + PU L#20 (P#20)
L2 L#21 (256KB) + L1d L#21 (32KB) + L1i L#21 (32KB) + Core L#21 + PU L#21 (P#21)
L2 L#22 (256KB) + L1d L#22 (32KB) + L1i L#22 (32KB) + Core L#22 + PU L#22 (P#22)
L2 L#23 (256KB) + L1d L#23 (32KB) + L1i L#23 (32KB) + Core L#23 + PU L#23 (P#23)
HostBridge L#5
PCIBridge
PCI 15b3:1003
Net L#3 "ib0"
OpenFabrics L#4 "mlx4_0"
I have tested seven different cases. Each case solves three different linear systems, A*x1=b1, A*x2=b2, and A*x3=b3, where A is an mpidense matrix and b1, b2, b3 are different right-hand-side vectors. I am using the GMRES method without a preconditioner, and I have set -ksp_max_it 1000.
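For reference, a minimal petsc4py sketch of this kind of solver setup is shown below. It is not the actual force_pipe.py code; the small dense test matrix and its entries are only stand-ins for the real 22128 x 22128 mpidense system:

    # Minimal petsc4py sketch (assumed, not the actual force_pipe.py):
    # build a small dense MPI matrix and solve it with un-preconditioned GMRES.
    import sys
    import petsc4py
    petsc4py.init(sys.argv)                        # let PETSc see options such as -ksp_max_it
    from petsc4py import PETSc

    n = 200                                        # stand-in size; the real system is 22128 x 22128
    A = PETSc.Mat().createDense([n, n], comm=PETSc.COMM_WORLD)
    A.setUp()
    rstart, rend = A.getOwnershipRange()
    for i in range(rstart, rend):                  # fill the locally owned rows of the dense matrix
        for j in range(n):
            A.setValue(i, j, 1.0 / (1.0 + abs(i - j)) + (n if i == j else 0.0))
    A.assemble()

    b = A.createVecRight(); b.set(1.0)             # right-hand side
    x = A.createVecRight(); x.set(0.0)             # initial guess

    ksp = PETSc.KSP().create(comm=PETSc.COMM_WORLD)
    ksp.setOperators(A)
    ksp.setType('gmres')                           # same Krylov method as in the runs above
    ksp.getPC().setType('none')                    # no preconditioner
    ksp.setInitialGuessNonzero(True)
    ksp.setTolerances(rtol=1e-5, max_it=1000)      # matches -ksp_max_it 1000
    ksp.setFromOptions()                           # honor -ksp_view, -log_view, etc.
    ksp.solve(b, x)

Under the same command-line options this sketch should report the same configuration that -ksp_view shows in the attached files (GMRES with restart 30, PC type none, at most 1000 iterations).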
The results are summarized below:

file        processes  nodes  eq1 residual norm  eq1 time       eq2 residual norm  eq2 time       eq3 residual norm  eq3 time
mpi_1.txt   1          1      9.884635e-04       88.631310s     4.144572e-04       88.855811s     4.864481e-03       88.673738s
mpi_2.txt   2          2      6.719300e-01       84.212435s     6.782443e-01       85.153371s     7.223828e-01       85.246724s
mpi_3.txt   4          4      5.813354e-01       52.616490s     5.397962e-01       52.413213s     9.503432e-01       52.495871s
mpi_4.txt   6          6      4.621066e-01       42.929705s     4.661823e-01       43.367914s     1.047436e+00       43.108877s
mpi_5.txt   2          1      6.719300e-01       141.490945s    6.782443e-01       142.746243s    7.223828e-01       142.042608s
mpi_6.txt   4          1      5.813354e-01       165.061162s    5.397962e-01       196.539286s    9.503432e-01       180.240947s
mpi_7.txt   6          1      4.621066e-01       213.683270s    4.661823e-01       208.180939s    1.047436e+00       194.251886s
I found that all residual norms are on the order of 1 except in the first case, where I use only one process on one node.
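To check whether GMRES actually converged or simply hit the 1000-iteration cap, and to compare the solver's residual norm with the true residual ||b - A*x||, a sketch along these lines could be used (again assuming the ksp, A, b, and x objects from the setup sketch above, not the actual force_pipe.py code):

    # Hedged diagnostic sketch; `ksp`, `A`, `b`, `x` are assumed from the setup above.
    from petsc4py import PETSc

    ksp.solve(b, x)
    reason = ksp.getConvergedReason()      # < 0 means divergence; -3 is KSP_DIVERGED_ITS (hit max_it)
    its = ksp.getIterationNumber()
    rnorm = ksp.getResidualNorm()          # residual norm used by the convergence test

    r = b.duplicate()                      # true residual r = b - A*x
    A.mult(x, r)
    r.aypx(-1.0, b)

    PETSc.Sys.Print("reason=%d  iterations=%d  ksp residual=%.3e  true residual=%.3e"
                    % (reason, its, rnorm, r.norm()))

The same information is also available from the command line via -ksp_converged_reason and -ksp_monitor_true_residual.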
Please see the attached files for more details.
Best regards,
Zhang Ji, PhD student
Beijing Computational Science Research Center
Building 9, East Zone, Zhongguancun Software Park II, No. 10 Dongbeiwang West Road, Haidian District, Beijing 100193, China
On Tue, Jun 13, 2017 at 9:34 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
> You need to provide more information. What is the output of -ksp_view
> and -log_view for both cases?
>
> > On Jun 12, 2017, at 7:11 PM, Ji Zhang <gotofd at gmail.com> wrote:
> >
> > Dear all,
> >
> > I'm a PETSc user. I'm using the GMRES method to solve some linear
> > equations. I'm using the boundary element method, so the matrix type is
> > dense (or mpidense). I'm using MPICH2. I found that the convergence is
> > fast if I use only one compute node, and much slower if I use two or
> > more nodes. I'm interested in why this happens and how I can improve the
> > convergence performance when I use multiple nodes.
> >
> > Thanks a lot.
> >
> > Best regards,
> > Zhang Ji, PhD student
> > Beijing Computational Science Research Center
> > Building 9, East Zone, Zhongguancun Software Park II, No. 10 Dongbeiwang
> > West Road, Haidian District, Beijing 100193, China
>
>
-------------- next part --------------
Case information:
pipe length: 2.000000, pipe radius: 1.000000
delta length of pipe is 0.050000, epsilon of pipe is 2.000000
threshold of series is 30
b: 1 numbers are evenly distributed within the range [0.000100, 0.900000]
create matrix method: pf_stokesletsInPipe
solve method: gmres, precondition method: none
output file handle: force_pipe
MPI size: 6
Stokeslets in pipe prepare, contain 7376 nodes
create matrix use 3.737578s:
_00001/00001_b=0.000100: calculate boundary condition use: 1.798243s
KSP Object: 6 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=1000
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using PRECONDITIONED norm type for convergence test
PC Object: 6 MPI processes
type: none
linear system matrix = precond matrix:
Mat Object: 6 MPI processes
type: mpidense
rows=22128, cols=22128
total: nonzeros=489648384, allocated nonzeros=489648384
total number of mallocs used during MatSetValues calls =0
_00001/00001_u1: solve matrix equation use: 213.683270s, with residual norm 4.621066e-01
KSP Object: 6 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=1000
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using PRECONDITIONED norm type for convergence test
PC Object: 6 MPI processes
type: none
linear system matrix = precond matrix:
Mat Object: 6 MPI processes
type: mpidense
rows=22128, cols=22128
total: nonzeros=489648384, allocated nonzeros=489648384
total number of mallocs used during MatSetValues calls =0
_00001/00001_u2: solve matrix equation use: 208.180939s, with residual norm 4.661823e-01
KSP Object: 6 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=1000
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using PRECONDITIONED norm type for convergence test
PC Object: 6 MPI processes
type: none
linear system matrix = precond matrix:
Mat Object: 6 MPI processes
type: mpidense
rows=22128, cols=22128
total: nonzeros=489648384, allocated nonzeros=489648384
total number of mallocs used during MatSetValues calls =0
_00001/00001_u3: solve matrix equation use: 194.251886s, with residual norm 1.047436e+00
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
force_pipe.py on a linux-mpich-opblas named cn10 with 6 processors, by zhangji Tue Jun 13 18:11:46 2017
Using Petsc Release Version 3.7.6, Apr, 24, 2017
Max Max/Min Avg Total
Time (sec): 6.236e+02 1.00013 6.235e+02
Objects: 4.130e+02 1.00000 4.130e+02
Flops: 5.073e+11 1.00081 5.070e+11 3.042e+12
Flops/sec: 8.136e+08 1.00092 8.131e+08 4.879e+09
MPI Messages: 4.200e+01 2.33333 3.000e+01 1.800e+02
MPI Message Lengths: 2.520e+02 2.33333 6.000e+00 1.080e+03
MPI Reductions: 9.541e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 6.2355e+02 100.0% 3.0421e+12 100.0% 1.800e+02 100.0% 6.000e+00 100.0% 9.540e+03 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecMDot 3000 1.0 1.7038e+02 1.7 3.41e+08 1.0 0.0e+00 0.0e+00 3.0e+03 20 0 0 0 31 20 0 0 0 31 12
VecNorm 3102 1.0 7.8933e+01 1.2 2.29e+07 1.0 0.0e+00 0.0e+00 3.1e+03 11 0 0 0 33 11 0 0 0 33 2
VecScale 3102 1.0 3.2920e-02 3.1 1.14e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2085
VecCopy 3204 1.0 1.1629e-01 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 123 1.0 1.1544e-0212.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 201 1.0 1.8733e-03 1.4 1.48e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 4749
VecMAXPY 3102 1.0 3.3990e-01 2.0 3.63e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6406
VecAssemblyBegin 9 1.0 1.8613e-03 2.0 0.00e+00 0.0 1.8e+02 6.0e+00 2.7e+01 0 0100100 0 0 0100100 0 0
VecAssemblyEnd 9 1.0 3.7193e-05 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 3114 1.0 8.3257e+01 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 3.1e+03 10 0 0 0 33 10 0 0 0 33 0
VecNormalize 3102 1.0 7.8971e+01 1.2 3.43e+07 1.0 0.0e+00 0.0e+00 3.1e+03 11 0 0 0 33 11 0 0 0 33 3
MatMult 3105 1.0 4.4362e+02 1.2 5.07e+11 1.0 0.0e+00 0.0e+00 3.1e+03 67100 0 0 33 67100 0 0 33 6848
MatAssemblyBegin 2 1.0 7.7588e-0211.5 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 2 1.0 1.7595e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 3 1.0 6.1056e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCSetUp 3 1.0 9.5367e-07 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 3102 1.0 1.1835e-01 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 3000 1.0 1.7062e+02 1.7 6.82e+08 1.0 0.0e+00 0.0e+00 3.0e+03 20 0 0 0 31 20 0 0 0 31 24
KSPSetUp 3 1.0 3.8290e-04 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 3 1.0 6.1546e+02 1.0 5.07e+11 1.0 0.0e+00 0.0e+00 9.2e+03 99100 0 0 96 99100 0 0 96 4938
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Viewer 1 0 0 0.
Index Set 51 51 39576 0.
IS L to G Mapping 15 15 78244 0.
Vector 205 205 6265288 0.
Vector Scatter 26 26 161064 0.
Matrix 9 9 653332008 0.
Preconditioner 3 3 2448 0.
Krylov Solver 3 3 55200 0.
Distributed Mesh 25 25 191044 0.
Star Forest Bipartite Graph 50 50 42176 0.
Discrete System 25 25 21600 0.
========================================================================================================================
Average time to get PetscTime(): 0.
Average time for MPI_Barrier(): 6.19888e-06
Average time for zero size MPI_Send(): 2.30471e-06
#PETSc Option Table entries:
-b0 1e-4
-b1 0.9
-dp 0.05
-ep 2
-ksp_max_it 1000
-ksp_view
-log_view
-lp 2
-nb 1
-th 30
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-blas-lapack-lib=/public/software/OpenBLAS/lib/libopenblas.a --with-mpi-dir=/home/zhangji/python/mpich-3.2/build --with-hdf5-dir=/public/software/mathlib/hdf5/1.8.12/gnu/ PETSC_DIR=/home/zhangji/python/petsc-3.7.6 PETSC_ARCH=linux-mpich-opblas --download-metis=/public/sourcecode/petsc_gnu/metis-5.1.0.tar.gz --download-ptscotch=/home/zhangji/python/scotch_6.0.4.tar.gz --download-pastix=/home/zhangji/python/pastix_5.2.3.tar.bz2 --with-debugging=no --CFLAGS=-O3 --CXXFLAGS=-O3 --FFLAGS=-O3
-----------------------------------------
Libraries compiled on Sat Jun 10 00:26:59 2017 on cn11
Machine characteristics: Linux-2.6.32-504.el6.x86_64-x86_64-with-centos-6.6-Final
Using PETSc directory: /home/zhangji/python/petsc-3.7.6
Using PETSc arch: linux-mpich-opblas
-----------------------------------------
Using C compiler: /home/zhangji/python/mpich-3.2/build/bin/mpicc -O3 -fPIC ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/zhangji/python/mpich-3.2/build/bin/mpif90 -O3 -fPIC ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/public/software/mathlib/hdf5/1.8.12/gnu/include -I/home/zhangji/python/mpich-3.2/build/include
-----------------------------------------
Using C linker: /home/zhangji/python/mpich-3.2/build/bin/mpicc
Using Fortran linker: /home/zhangji/python/mpich-3.2/build/bin/mpif90
Using libraries: -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -lpetsc -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -Wl,-rpath,/public/software/OpenBLAS/lib -L/public/software/OpenBLAS/lib -Wl,-rpath,/public/software/mathlib/hdf5/1.8.12/gnu/lib -L/public/software/mathlib/hdf5/1.8.12/gnu/lib -Wl,-rpath,/home/zhangji/python/mpich-3.2/build/lib -L/home/zhangji/python/mpich-3.2/build/lib -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -Wl,-rpath,/home/zhangji/python/petsc-3.7.6 -L/home/zhangji/python/petsc-3.7.6 -lmetis -lpastix -lopenblas -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lz -lX11 -lssl -lcrypto -lmpifort -lifport -lifcore -lpthread -lmpicxx -lrt -lm -lrt -lm -lpthread -lz -ldl -lmpi -limf -lsvml -lirng -lm -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lirc_s -ldl
-----------------------------------------
-------------- next part --------------
Case information:
pipe length: 2.000000, pipe radius: 1.000000
delta length of pipe is 0.050000, epsilon of pipe is 2.000000
threshold of series is 30
b: 1 numbers are evenly distributed within the range [0.000100, 0.900000]
create matrix method: pf_stokesletsInPipe
solve method: gmres, precondition method: none
output file handle: force_pipe
MPI size: 4
Stokeslets in pipe prepare, contain 7376 nodes
create matrix use 4.977263s:
_00001/00001_b=0.000100: calculate boundary condition use: 1.769788s
KSP Object: 4 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=1000
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using PRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
type: none
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpidense
rows=22128, cols=22128
total: nonzeros=489648384, allocated nonzeros=489648384
total number of mallocs used during MatSetValues calls =0
_00001/00001_u1: solve matrix equation use: 165.061162s, with residual norm 5.813354e-01
KSP Object: 4 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=1000
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using PRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
type: none
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpidense
rows=22128, cols=22128
total: nonzeros=489648384, allocated nonzeros=489648384
total number of mallocs used during MatSetValues calls =0
_00001/00001_u2: solve matrix equation use: 196.539286s, with residual norm 5.397962e-01
KSP Object: 4 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=1000
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using PRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
type: none
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpidense
rows=22128, cols=22128
total: nonzeros=489648384, allocated nonzeros=489648384
total number of mallocs used during MatSetValues calls =0
_00001/00001_u3: solve matrix equation use: 180.240947s, with residual norm 9.503432e-01
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
force_pipe.py on a linux-mpich-opblas named cn10 with 4 processors, by zhangji Tue Jun 13 18:01:22 2017
Using Petsc Release Version 3.7.6, Apr, 24, 2017
Max Max/Min Avg Total
Time (sec): 5.506e+02 1.00007 5.506e+02
Objects: 4.130e+02 1.00000 4.130e+02
Flops: 7.605e+11 1.00000 7.605e+11 3.042e+12
Flops/sec: 1.381e+09 1.00007 1.381e+09 5.525e+09
MPI Messages: 3.000e+01 1.66667 2.700e+01 1.080e+02
MPI Message Lengths: 1.800e+02 1.66667 6.000e+00 6.480e+02
MPI Reductions: 9.541e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 5.5060e+02 100.0% 3.0421e+12 100.0% 1.080e+02 100.0% 6.000e+00 100.0% 9.540e+03 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecMDot 3000 1.0 1.0788e+02 1.9 5.11e+08 1.0 0.0e+00 0.0e+00 3.0e+03 14 0 0 0 31 14 0 0 0 31 19
VecNorm 3102 1.0 5.0609e+01 1.1 3.43e+07 1.0 0.0e+00 0.0e+00 3.1e+03 9 0 0 0 33 9 0 0 0 33 3
VecScale 3102 1.0 1.3757e-02 1.1 1.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 4990
VecCopy 3204 1.0 9.1627e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 123 1.0 1.4656e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 201 1.0 2.9745e-03 1.6 2.22e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2991
VecMAXPY 3102 1.0 4.4902e-01 1.6 5.44e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 4849
VecAssemblyBegin 9 1.0 1.2916e-01 1.6 0.00e+00 0.0 1.1e+02 6.0e+00 2.7e+01 0 0100100 0 0 0100100 0 0
VecAssemblyEnd 9 1.0 3.3617e-05 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 3114 1.0 5.2983e+01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.1e+03 7 0 0 0 33 7 0 0 0 33 0
VecNormalize 3102 1.0 5.0635e+01 1.1 5.15e+07 1.0 0.0e+00 0.0e+00 3.1e+03 9 0 0 0 33 9 0 0 0 33 4
MatMult 3105 1.0 4.3607e+02 1.1 7.59e+11 1.0 0.0e+00 0.0e+00 3.1e+03 76100 0 0 33 76100 0 0 33 6966
MatAssemblyBegin 2 1.0 2.7158e-013390.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 2 1.0 1.6093e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 3 1.0 3.9665e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCSetUp 3 1.0 9.5367e-07 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 3102 1.0 9.2988e-02 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 3000 1.0 1.0832e+02 1.9 1.02e+09 1.0 0.0e+00 0.0e+00 3.0e+03 14 0 0 0 31 14 0 0 0 31 38
KSPSetUp 3 1.0 3.7909e-04 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 3 1.0 5.4123e+02 1.0 7.60e+11 1.0 0.0e+00 0.0e+00 9.2e+03 98100 0 0 96 98100 0 0 96 5615
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Viewer 1 0 0 0.
Index Set 51 51 39576 0.
IS L to G Mapping 15 15 112628 0.
Vector 205 205 8343112 0.
Vector Scatter 26 26 229832 0.
Matrix 9 9 979454472 0.
Preconditioner 3 3 2448 0.
Krylov Solver 3 3 55200 0.
Distributed Mesh 25 25 225428 0.
Star Forest Bipartite Graph 50 50 42176 0.
Discrete System 25 25 21600 0.
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 2.6226e-06
Average time for zero size MPI_Send(): 2.26498e-06
#PETSc Option Table entries:
-b0 1e-4
-b1 0.9
-dp 0.05
-ep 2
-ksp_max_it 1000
-ksp_view
-log_view
-lp 2
-nb 1
-th 30
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-blas-lapack-lib=/public/software/OpenBLAS/lib/libopenblas.a --with-mpi-dir=/home/zhangji/python/mpich-3.2/build --with-hdf5-dir=/public/software/mathlib/hdf5/1.8.12/gnu/ PETSC_DIR=/home/zhangji/python/petsc-3.7.6 PETSC_ARCH=linux-mpich-opblas --download-metis=/public/sourcecode/petsc_gnu/metis-5.1.0.tar.gz --download-ptscotch=/home/zhangji/python/scotch_6.0.4.tar.gz --download-pastix=/home/zhangji/python/pastix_5.2.3.tar.bz2 --with-debugging=no --CFLAGS=-O3 --CXXFLAGS=-O3 --FFLAGS=-O3
-----------------------------------------
Libraries compiled on Sat Jun 10 00:26:59 2017 on cn11
Machine characteristics: Linux-2.6.32-504.el6.x86_64-x86_64-with-centos-6.6-Final
Using PETSc directory: /home/zhangji/python/petsc-3.7.6
Using PETSc arch: linux-mpich-opblas
-----------------------------------------
Using C compiler: /home/zhangji/python/mpich-3.2/build/bin/mpicc -O3 -fPIC ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/zhangji/python/mpich-3.2/build/bin/mpif90 -O3 -fPIC ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/public/software/mathlib/hdf5/1.8.12/gnu/include -I/home/zhangji/python/mpich-3.2/build/include
-----------------------------------------
Using C linker: /home/zhangji/python/mpich-3.2/build/bin/mpicc
Using Fortran linker: /home/zhangji/python/mpich-3.2/build/bin/mpif90
Using libraries: -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -lpetsc -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -Wl,-rpath,/public/software/OpenBLAS/lib -L/public/software/OpenBLAS/lib -Wl,-rpath,/public/software/mathlib/hdf5/1.8.12/gnu/lib -L/public/software/mathlib/hdf5/1.8.12/gnu/lib -Wl,-rpath,/home/zhangji/python/mpich-3.2/build/lib -L/home/zhangji/python/mpich-3.2/build/lib -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -Wl,-rpath,/home/zhangji/python/petsc-3.7.6 -L/home/zhangji/python/petsc-3.7.6 -lmetis -lpastix -lopenblas -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lz -lX11 -lssl -lcrypto -lmpifort -lifport -lifcore -lpthread -lmpicxx -lrt -lm -lrt -lm -lpthread -lz -ldl -lmpi -limf -lsvml -lirng -lm -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lirc_s -ldl
-----------------------------------------
-------------- next part --------------
Case information:
pipe length: 2.000000, pipe radius: 1.000000
delta length of pipe is 0.050000, epsilon of pipe is 2.000000
threshold of series is 30
b: 1 numbers are evenly distributed within the range [0.000100, 0.900000]
create matrix method: pf_stokesletsInPipe
solve method: gmres, precondition method: none
output file handle: force_pipe
MPI size: 2
Stokeslets in pipe prepare, contain 7376 nodes
create matrix use 8.694003s:
_00001/00001_b=0.000100: calculate boundary condition use: 1.975384s
KSP Object: 2 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=1000
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using PRECONDITIONED norm type for convergence test
PC Object: 2 MPI processes
type: none
linear system matrix = precond matrix:
Mat Object: 2 MPI processes
type: mpidense
rows=22128, cols=22128
total: nonzeros=489648384, allocated nonzeros=489648384
total number of mallocs used during MatSetValues calls =0
_00001/00001_u1: solve matrix equation use: 141.490945s, with residual norm 6.719300e-01
KSP Object: 2 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=1000
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using PRECONDITIONED norm type for convergence test
PC Object: 2 MPI processes
type: none
linear system matrix = precond matrix:
Mat Object: 2 MPI processes
type: mpidense
rows=22128, cols=22128
total: nonzeros=489648384, allocated nonzeros=489648384
total number of mallocs used during MatSetValues calls =0
_00001/00001_u2: solve matrix equation use: 142.746243s, with residual norm 6.782443e-01
KSP Object: 2 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=1000
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using PRECONDITIONED norm type for convergence test
PC Object: 2 MPI processes
type: none
linear system matrix = precond matrix:
Mat Object: 2 MPI processes
type: mpidense
rows=22128, cols=22128
total: nonzeros=489648384, allocated nonzeros=489648384
total number of mallocs used during MatSetValues calls =0
_00001/00001_u3: solve matrix equation use: 142.042608s, with residual norm 7.223828e-01
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
force_pipe.py on a linux-mpich-opblas named cn10 with 2 processors, by zhangji Tue Jun 13 17:52:11 2017
Using Petsc Release Version 3.7.6, Apr, 24, 2017
Max Max/Min Avg Total
Time (sec): 4.388e+02 1.00006 4.387e+02
Objects: 4.130e+02 1.00000 4.130e+02
Flops: 1.521e+12 1.00000 1.521e+12 3.042e+12
Flops/sec: 3.467e+09 1.00006 3.467e+09 6.934e+09
MPI Messages: 1.200e+01 1.00000 1.200e+01 2.400e+01
MPI Message Lengths: 1.080e+02 1.00000 9.000e+00 2.160e+02
MPI Reductions: 9.541e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 4.3875e+02 100.0% 3.0421e+12 100.0% 2.400e+01 100.0% 9.000e+00 100.0% 9.540e+03 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecMDot 3000 1.0 2.6931e+01 4.4 1.02e+09 1.0 0.0e+00 0.0e+00 3.0e+03 4 0 0 0 31 4 0 0 0 31 76
VecNorm 3102 1.0 1.4946e+00 2.1 6.86e+07 1.0 0.0e+00 0.0e+00 3.1e+03 0 0 0 0 33 0 0 0 0 33 92
VecScale 3102 1.0 1.9959e-02 1.4 3.43e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3439
VecCopy 3204 1.0 8.3234e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 123 1.0 2.2550e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 201 1.0 2.7293e-01 1.2 4.45e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 33
VecMAXPY 3102 1.0 5.8037e-01 1.7 1.09e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3752
VecAssemblyBegin 9 1.0 1.0383e-0215.2 0.00e+00 0.0 2.4e+01 9.0e+00 2.7e+01 0 0100100 0 0 0100100 0 0
VecAssemblyEnd 9 1.0 2.5272e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 3114 1.0 8.7240e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.1e+03 0 0 0 0 33 0 0 0 0 33 0
VecNormalize 3102 1.0 1.5182e+00 2.1 1.03e+08 1.0 0.0e+00 0.0e+00 3.1e+03 0 0 0 0 33 0 0 0 0 33 136
MatMult 3105 1.0 4.1876e+02 1.1 1.52e+12 1.0 0.0e+00 0.0e+00 3.1e+03 93100 0 0 33 93100 0 0 33 7254
MatAssemblyBegin 2 1.0 5.4870e-02676.9 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 2 1.0 1.6594e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 3 1.0 1.1323e-02 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCSetUp 3 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 3102 1.0 8.3565e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 3000 1.0 2.7499e+01 4.3 2.04e+09 1.0 0.0e+00 0.0e+00 3.0e+03 4 0 0 0 31 4 0 0 0 31 149
KSPSetUp 3 1.0 3.8886e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 3 1.0 4.2584e+02 1.0 1.52e+12 1.0 0.0e+00 0.0e+00 9.2e+03 97100 0 0 96 97100 0 0 96 7137
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Viewer 1 0 0 0.
Index Set 51 51 39576 0.
IS L to G Mapping 15 15 215892 0.
Vector 205 205 14583280 0.
Vector Scatter 26 26 436360 0.
Matrix 9 9 1958884008 0.
Preconditioner 3 3 2448 0.
Krylov Solver 3 3 55200 0.
Distributed Mesh 25 25 328692 0.
Star Forest Bipartite Graph 50 50 42176 0.
Discrete System 25 25 21600 0.
========================================================================================================================
Average time to get PetscTime(): 1.19209e-07
Average time for MPI_Barrier(): 1.7643e-06
Average time for zero size MPI_Send(): 3.45707e-06
#PETSc Option Table entries:
-b0 1e-4
-b1 0.9
-dp 0.05
-ep 2
-ksp_max_it 1000
-ksp_view
-log_view
-lp 2
-nb 1
-th 30
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-blas-lapack-lib=/public/software/OpenBLAS/lib/libopenblas.a --with-mpi-dir=/home/zhangji/python/mpich-3.2/build --with-hdf5-dir=/public/software/mathlib/hdf5/1.8.12/gnu/ PETSC_DIR=/home/zhangji/python/petsc-3.7.6 PETSC_ARCH=linux-mpich-opblas --download-metis=/public/sourcecode/petsc_gnu/metis-5.1.0.tar.gz --download-ptscotch=/home/zhangji/python/scotch_6.0.4.tar.gz --download-pastix=/home/zhangji/python/pastix_5.2.3.tar.bz2 --with-debugging=no --CFLAGS=-O3 --CXXFLAGS=-O3 --FFLAGS=-O3
-----------------------------------------
Libraries compiled on Sat Jun 10 00:26:59 2017 on cn11
Machine characteristics: Linux-2.6.32-504.el6.x86_64-x86_64-with-centos-6.6-Final
Using PETSc directory: /home/zhangji/python/petsc-3.7.6
Using PETSc arch: linux-mpich-opblas
-----------------------------------------
Using C compiler: /home/zhangji/python/mpich-3.2/build/bin/mpicc -O3 -fPIC ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/zhangji/python/mpich-3.2/build/bin/mpif90 -O3 -fPIC ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/public/software/mathlib/hdf5/1.8.12/gnu/include -I/home/zhangji/python/mpich-3.2/build/include
-----------------------------------------
Using C linker: /home/zhangji/python/mpich-3.2/build/bin/mpicc
Using Fortran linker: /home/zhangji/python/mpich-3.2/build/bin/mpif90
Using libraries: -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -lpetsc -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -Wl,-rpath,/public/software/OpenBLAS/lib -L/public/software/OpenBLAS/lib -Wl,-rpath,/public/software/mathlib/hdf5/1.8.12/gnu/lib -L/public/software/mathlib/hdf5/1.8.12/gnu/lib -Wl,-rpath,/home/zhangji/python/mpich-3.2/build/lib -L/home/zhangji/python/mpich-3.2/build/lib -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -Wl,-rpath,/home/zhangji/python/petsc-3.7.6 -L/home/zhangji/python/petsc-3.7.6 -lmetis -lpastix -lopenblas -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lz -lX11 -lssl -lcrypto -lmpifort -lifport -lifcore -lpthread -lmpicxx -lrt -lm -lrt -lm -lpthread -lz -ldl -lmpi -limf -lsvml -lirng -lm -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lirc_s -ldl
-----------------------------------------
-------------- next part --------------
Case information:
pipe length: 2.000000, pipe radius: 1.000000
delta length of pipe is 0.050000, epsilon of pipe is 2.000000
threshold of series is 30
b: 1 numbers are evenly distributed within the range [0.000100, 0.900000]
create matrix method: pf_stokesletsInPipe
solve method: gmres, precondition method: none
output file handle: force_pipe
MPI size: 6
Stokeslets in pipe prepare, contain 7376 nodes
create matrix use 3.769214s:
_00001/00001_b=0.000100: calculate boundary condition use: 1.710873s
KSP Object: 6 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=1000
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using PRECONDITIONED norm type for convergence test
PC Object: 6 MPI processes
type: none
linear system matrix = precond matrix:
Mat Object: 6 MPI processes
type: mpidense
rows=22128, cols=22128
total: nonzeros=489648384, allocated nonzeros=489648384
total number of mallocs used during MatSetValues calls =0
_00001/00001_u1: solve matrix equation use: 42.929705s, with residual norm 4.621066e-01
KSP Object: 6 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=1000
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using PRECONDITIONED norm type for convergence test
PC Object: 6 MPI processes
type: none
linear system matrix = precond matrix:
Mat Object: 6 MPI processes
type: mpidense
rows=22128, cols=22128
total: nonzeros=489648384, allocated nonzeros=489648384
total number of mallocs used during MatSetValues calls =0
_00001/00001_u2: solve matrix equation use: 43.367914s, with residual norm 4.661823e-01
KSP Object: 6 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=1000
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using PRECONDITIONED norm type for convergence test
PC Object: 6 MPI processes
type: none
linear system matrix = precond matrix:
Mat Object: 6 MPI processes
type: mpidense
rows=22128, cols=22128
total: nonzeros=489648384, allocated nonzeros=489648384
total number of mallocs used during MatSetValues calls =0
_00001/00001_u3: solve matrix equation use: 43.108877s, with residual norm 1.047436e+00
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
force_pipe.py on a linux-mpich-opblas named cn5 with 6 processors, by zhangji Tue Jun 13 17:44:52 2017
Using Petsc Release Version 3.7.6, Apr, 24, 2017
Max Max/Min Avg Total
Time (sec): 1.367e+02 1.00029 1.367e+02
Objects: 4.130e+02 1.00000 4.130e+02
Flops: 5.073e+11 1.00081 5.070e+11 3.042e+12
Flops/sec: 3.713e+09 1.00111 3.710e+09 2.226e+10
MPI Messages: 4.200e+01 2.33333 3.000e+01 1.800e+02
MPI Message Lengths: 2.520e+02 2.33333 6.000e+00 1.080e+03
MPI Reductions: 9.541e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 1.3666e+02 100.0% 3.0421e+12 100.0% 1.800e+02 100.0% 6.000e+00 100.0% 9.540e+03 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecMDot 3000 1.0 8.5456e+00 5.3 3.41e+08 1.0 0.0e+00 0.0e+00 3.0e+03 4 0 0 0 31 4 0 0 0 31 239
VecNorm 3102 1.0 1.0225e+00 1.7 2.29e+07 1.0 0.0e+00 0.0e+00 3.1e+03 1 0 0 0 33 1 0 0 0 33 134
VecScale 3102 1.0 9.2647e-03 1.5 1.14e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 7409
VecCopy 3204 1.0 3.4087e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 123 1.0 1.1790e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 201 1.0 1.6315e-03 1.4 1.48e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 5452
VecMAXPY 3102 1.0 9.3553e-02 1.1 3.63e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 23274
VecAssemblyBegin 9 1.0 1.2650e-02 2.2 0.00e+00 0.0 1.8e+02 6.0e+00 2.7e+01 0 0100100 0 0 0100100 0 0
VecAssemblyEnd 9 1.0 4.8575e-022122.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 3114 1.0 7.4042e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.1e+03 5 0 0 0 33 5 0 0 0 33 0
VecNormalize 3102 1.0 1.0357e+00 1.7 3.43e+07 1.0 0.0e+00 0.0e+00 3.1e+03 1 0 0 0 33 1 0 0 0 33 199
MatMult 3105 1.0 1.2659e+02 1.1 5.07e+11 1.0 0.0e+00 0.0e+00 3.1e+03 90100 0 0 33 90100 0 0 33 23996
MatAssemblyBegin 2 1.0 2.0758e-01138.7 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 2 1.0 1.9758e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 3 1.0 1.5299e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCSetUp 3 1.0 2.1458e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 3102 1.0 3.6064e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 3000 1.0 8.6466e+00 5.1 6.82e+08 1.0 0.0e+00 0.0e+00 3.0e+03 4 0 0 0 31 4 0 0 0 31 473
KSPSetUp 3 1.0 1.5187e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 3 1.0 1.2925e+02 1.0 5.07e+11 1.0 0.0e+00 0.0e+00 9.2e+03 95100 0 0 96 95100 0 0 96 23515
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Viewer 1 0 0 0.
Index Set 51 51 39576 0.
IS L to G Mapping 15 15 78244 0.
Vector 205 205 6265288 0.
Vector Scatter 26 26 161064 0.
Matrix 9 9 653332008 0.
Preconditioner 3 3 2448 0.
Krylov Solver 3 3 55200 0.
Distributed Mesh 25 25 191044 0.
Star Forest Bipartite Graph 50 50 42176 0.
Discrete System 25 25 21600 0.
========================================================================================================================
Average time to get PetscTime(): 0.
Average time for MPI_Barrier(): 0.000177193
Average time for zero size MPI_Send(): 3.89814e-05
#PETSc Option Table entries:
-b0 1e-4
-b1 0.9
-dp 0.05
-ep 2
-ksp_max_it 1000
-ksp_view
-log_view
-lp 2
-nb 1
-th 30
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-blas-lapack-lib=/public/software/OpenBLAS/lib/libopenblas.a --with-mpi-dir=/home/zhangji/python/mpich-3.2/build --with-hdf5-dir=/public/software/mathlib/hdf5/1.8.12/gnu/ PETSC_DIR=/home/zhangji/python/petsc-3.7.6 PETSC_ARCH=linux-mpich-opblas --download-metis=/public/sourcecode/petsc_gnu/metis-5.1.0.tar.gz --download-ptscotch=/home/zhangji/python/scotch_6.0.4.tar.gz --download-pastix=/home/zhangji/python/pastix_5.2.3.tar.bz2 --with-debugging=no --CFLAGS=-O3 --CXXFLAGS=-O3 --FFLAGS=-O3
-----------------------------------------
Libraries compiled on Sat Jun 10 00:26:59 2017 on cn11
Machine characteristics: Linux-2.6.32-504.el6.x86_64-x86_64-with-centos-6.6-Final
Using PETSc directory: /home/zhangji/python/petsc-3.7.6
Using PETSc arch: linux-mpich-opblas
-----------------------------------------
Using C compiler: /home/zhangji/python/mpich-3.2/build/bin/mpicc -O3 -fPIC ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/zhangji/python/mpich-3.2/build/bin/mpif90 -O3 -fPIC ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/public/software/mathlib/hdf5/1.8.12/gnu/include -I/home/zhangji/python/mpich-3.2/build/include
-----------------------------------------
Using C linker: /home/zhangji/python/mpich-3.2/build/bin/mpicc
Using Fortran linker: /home/zhangji/python/mpich-3.2/build/bin/mpif90
Using libraries: -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -lpetsc -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -Wl,-rpath,/public/software/OpenBLAS/lib -L/public/software/OpenBLAS/lib -Wl,-rpath,/public/software/mathlib/hdf5/1.8.12/gnu/lib -L/public/software/mathlib/hdf5/1.8.12/gnu/lib -Wl,-rpath,/home/zhangji/python/mpich-3.2/build/lib -L/home/zhangji/python/mpich-3.2/build/lib -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -Wl,-rpath,/home/zhangji/python/petsc-3.7.6 -L/home/zhangji/python/petsc-3.7.6 -lmetis -lpastix -lopenblas -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lz -lX11 -lssl -lcrypto -lmpifort -lifport -lifcore -lpthread -lmpicxx -lrt -lm -lrt -lm -lpthread -lz -ldl -lmpi -limf -lsvml -lirng -lm -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lirc_s -ldl
-----------------------------------------
-------------- next part --------------
Case information:
pipe length: 2.000000, pipe radius: 1.000000
delta length of pipe is 0.050000, epsilon of pipe is 2.000000
threshold of series is 30
b: 1 numbers are evenly distributed within the range [0.000100, 0.900000]
create matrix method: pf_stokesletsInPipe
solve method: gmres, precondition method: none
output file handle: force_pipe
MPI size: 4
Stokeslets in pipe prepare, contain 7376 nodes
create matrix use 6.052226s:
_00001/00001_b=0.000100: calculate boundary condition use: 1.826128s
KSP Object: 4 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=1000
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using PRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
type: none
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpidense
rows=22128, cols=22128
total: nonzeros=489648384, allocated nonzeros=489648384
total number of mallocs used during MatSetValues calls =0
_00001/00001_u1: solve matrix equation use: 52.616490s, with residual norm 5.813354e-01
KSP Object: 4 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=1000
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using PRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
type: none
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpidense
rows=22128, cols=22128
total: nonzeros=489648384, allocated nonzeros=489648384
total number of mallocs used during MatSetValues calls =0
_00001/00001_u2: solve matrix equation use: 52.413213s, with residual norm 5.397962e-01
KSP Object: 4 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=1000
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using PRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
type: none
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpidense
rows=22128, cols=22128
total: nonzeros=489648384, allocated nonzeros=489648384
total number of mallocs used during MatSetValues calls =0
_00001/00001_u3: solve matrix equation use: 52.495871s, with residual norm 9.503432e-01
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
force_pipe.py on a linux-mpich-opblas named cn5 with 4 processors, by zhangji Tue Jun 13 17:42:35 2017
Using Petsc Release Version 3.7.6, Apr, 24, 2017
Max Max/Min Avg Total
Time (sec): 1.675e+02 1.00001 1.675e+02
Objects: 4.130e+02 1.00000 4.130e+02
Flops: 7.605e+11 1.00000 7.605e+11 3.042e+12
Flops/sec: 4.541e+09 1.00001 4.541e+09 1.816e+10
MPI Messages: 3.000e+01 1.66667 2.700e+01 1.080e+02
MPI Message Lengths: 1.800e+02 1.66667 6.000e+00 6.480e+02
MPI Reductions: 9.541e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 1.6749e+02 100.0% 3.0421e+12 100.0% 1.080e+02 100.0% 6.000e+00 100.0% 9.540e+03 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
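As a worked example of this formula (the scale factor converting flop/s to Mflop/s is 1e-6), the MatMult row below gives roughly

(4 * 7.59e+11 flops) / (1.5605e+02 s) * 1e-6 ≈ 1.9e+04 Mflop/s,

consistent with the 19466 Mflop/s reported there.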
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecMDot 3000 1.0 3.3006e+01 56.7 5.11e+08 1.0 0.0e+00 0.0e+00 3.0e+03 9 0 0 0 31 9 0 0 0 31 62
VecNorm 3102 1.0 1.4346e+00 3.8 3.43e+07 1.0 0.0e+00 0.0e+00 3.1e+03 1 0 0 0 33 1 0 0 0 33 96
VecScale 3102 1.0 1.0346e-02 1.2 1.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6634
VecCopy 3204 1.0 3.8601e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 123 1.0 1.1497e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 201 1.0 1.7583e-03 1.1 2.22e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 5059
VecMAXPY 3102 1.0 1.4580e-01 1.1 5.44e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14934
VecAssemblyBegin 9 1.0 9.8622e-03 2.9 0.00e+00 0.0 1.1e+02 6.0e+00 2.7e+01 0 0 100 100 0 0 0 100 100 0 0
VecAssemblyEnd 9 1.0 3.1948e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 3114 1.0 5.3085e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.1e+03 3 0 0 0 33 3 0 0 0 33 0
VecNormalize 3102 1.0 1.4464e+00 3.7 5.15e+07 1.0 0.0e+00 0.0e+00 3.1e+03 1 0 0 0 33 1 0 0 0 33 142
MatMult 3105 1.0 1.5605e+02 1.3 7.59e+11 1.0 0.0e+00 0.0e+00 3.1e+03 84 100 0 0 33 84 100 0 0 33 19466
MatAssemblyBegin 2 1.0 9.4833e-01 48.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 2 1.0 9.2912e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 3 1.0 1.7860e-03 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCSetUp 3 1.0 9.5367e-07 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 3102 1.0 4.0367e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 3000 1.0 3.3157e+01 46.5 1.02e+09 1.0 0.0e+00 0.0e+00 3.0e+03 9 0 0 0 31 9 0 0 0 31 123
KSPSetUp 3 1.0 8.2684e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 3 1.0 1.5735e+02 1.0 7.60e+11 1.0 0.0e+00 0.0e+00 9.2e+03 94 100 0 0 96 94 100 0 0 96 19315
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Viewer 1 0 0 0.
Index Set 51 51 39576 0.
IS L to G Mapping 15 15 112628 0.
Vector 205 205 8343112 0.
Vector Scatter 26 26 229832 0.
Matrix 9 9 979454472 0.
Preconditioner 3 3 2448 0.
Krylov Solver 3 3 55200 0.
Distributed Mesh 25 25 225428 0.
Star Forest Bipartite Graph 50 50 42176 0.
Discrete System 25 25 21600 0.
========================================================================================================================
Average time to get PetscTime(): 1.19209e-07
Average time for MPI_Barrier(): 7.45773e-05
Average time for zero size MPI_Send(): 6.04987e-05
#PETSc Option Table entries:
-b0 1e-4
-b1 0.9
-dp 0.05
-ep 2
-ksp_max_it 1000
-ksp_view
-log_view
-lp 2
-nb 1
-th 30
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-blas-lapack-lib=/public/software/OpenBLAS/lib/libopenblas.a --with-mpi-dir=/home/zhangji/python/mpich-3.2/build --with-hdf5-dir=/public/software/mathlib/hdf5/1.8.12/gnu/ PETSC_DIR=/home/zhangji/python/petsc-3.7.6 PETSC_ARCH=linux-mpich-opblas --download-metis=/public/sourcecode/petsc_gnu/metis-5.1.0.tar.gz --download-ptscotch=/home/zhangji/python/scotch_6.0.4.tar.gz --download-pastix=/home/zhangji/python/pastix_5.2.3.tar.bz2 --with-debugging=no --CFLAGS=-O3 --CXXFLAGS=-O3 --FFLAGS=-O3
-----------------------------------------
Libraries compiled on Sat Jun 10 00:26:59 2017 on cn11
Machine characteristics: Linux-2.6.32-504.el6.x86_64-x86_64-with-centos-6.6-Final
Using PETSc directory: /home/zhangji/python/petsc-3.7.6
Using PETSc arch: linux-mpich-opblas
-----------------------------------------
Using C compiler: /home/zhangji/python/mpich-3.2/build/bin/mpicc -O3 -fPIC ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/zhangji/python/mpich-3.2/build/bin/mpif90 -O3 -fPIC ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/public/software/mathlib/hdf5/1.8.12/gnu/include -I/home/zhangji/python/mpich-3.2/build/include
-----------------------------------------
Using C linker: /home/zhangji/python/mpich-3.2/build/bin/mpicc
Using Fortran linker: /home/zhangji/python/mpich-3.2/build/bin/mpif90
Using libraries: -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -lpetsc -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -Wl,-rpath,/public/software/OpenBLAS/lib -L/public/software/OpenBLAS/lib -Wl,-rpath,/public/software/mathlib/hdf5/1.8.12/gnu/lib -L/public/software/mathlib/hdf5/1.8.12/gnu/lib -Wl,-rpath,/home/zhangji/python/mpich-3.2/build/lib -L/home/zhangji/python/mpich-3.2/build/lib -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -Wl,-rpath,/home/zhangji/python/petsc-3.7.6 -L/home/zhangji/python/petsc-3.7.6 -lmetis -lpastix -lopenblas -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lz -lX11 -lssl -lcrypto -lmpifort -lifport -lifcore -lpthread -lmpicxx -lrt -lm -lrt -lm -lpthread -lz -ldl -lmpi -limf -lsvml -lirng -lm -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lirc_s -ldl
-----------------------------------------
-------------- next part --------------
Case information:
pipe length: 2.000000, pipe radius: 1.000000
delta length of pipe is 0.050000, epsilon of pipe is 2.000000
threshold of series is 30
b: 1 number evenly distributed within the range [0.000100, 0.900000]
create matrix method: pf_stokesletsInPipe
solve method: gmres, precondition method: none
output file handle: force_pipe
MPI size: 2
Stokeslets in pipe prepare, contain 7376 nodes
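(Assuming three force components per Stokeslet node, 7376 nodes give 3 * 7376 = 22128 unknowns, matching rows=22128, cols=22128 of the dense system in the -ksp_view output below.)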
create matrix use 30.575036s:
_00001/00001_b=0.000100: calculate boundary condition use: 3.463875s
KSP Object: 2 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=1000
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using PRECONDITIONED norm type for convergence test
PC Object: 2 MPI processes
type: none
linear system matrix = precond matrix:
Mat Object: 2 MPI processes
type: mpidense
rows=22128, cols=22128
total: nonzeros=489648384, allocated nonzeros=489648384
total number of mallocs used during MatSetValues calls =0
_00001/00001_u1: solve matrix equation use: 84.212435s, with residual norm 6.719300e-01
KSP Object: 2 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=1000
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using PRECONDITIONED norm type for convergence test
PC Object: 2 MPI processes
type: none
linear system matrix = precond matrix:
Mat Object: 2 MPI processes
type: mpidense
rows=22128, cols=22128
total: nonzeros=489648384, allocated nonzeros=489648384
total number of mallocs used during MatSetValues calls =0
_00001/00001_u2: solve matrix equation use: 85.153371s, with residual norm 6.782443e-01
KSP Object: 2 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=1000
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using PRECONDITIONED norm type for convergence test
PC Object: 2 MPI processes
type: none
linear system matrix = precond matrix:
Mat Object: 2 MPI processes
type: mpidense
rows=22128, cols=22128
total: nonzeros=489648384, allocated nonzeros=489648384
total number of mallocs used during MatSetValues calls =0
_00001/00001_u3: solve matrix equation use: 85.246724s, with residual norm 7.223828e-01
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
force_pipe.py on a linux-mpich-opblas named cn5 with 2 processors, by zhangji Tue Jun 13 17:39:46 2017
Using Petsc Release Version 3.7.6, Apr, 24, 2017
Max Max/Min Avg Total
Time (sec): 2.908e+02 1.00002 2.908e+02
Objects: 4.130e+02 1.00000 4.130e+02
Flops: 1.521e+12 1.00000 1.521e+12 3.042e+12
Flops/sec: 5.231e+09 1.00002 5.231e+09 1.046e+10
MPI Messages: 1.200e+01 1.00000 1.200e+01 2.400e+01
MPI Message Lengths: 1.080e+02 1.00000 9.000e+00 2.160e+02
MPI Reductions: 9.541e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.9076e+02 100.0% 3.0421e+12 100.0% 2.400e+01 100.0% 9.000e+00 100.0% 9.540e+03 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecMDot 3000 1.0 7.5770e+01 146.1 1.02e+09 1.0 0.0e+00 0.0e+00 3.0e+03 13 0 0 0 31 13 0 0 0 31 27
VecNorm 3102 1.0 2.5821e+00 4.9 6.86e+07 1.0 0.0e+00 0.0e+00 3.1e+03 1 0 0 0 33 1 0 0 0 33 53
VecScale 3102 1.0 1.5304e-02 1.1 3.43e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 4485
VecCopy 3204 1.0 6.6598e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 123 1.0 1.7567e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 201 1.0 1.9052e-02 2.3 4.45e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 467
VecMAXPY 3102 1.0 3.7953e-01 1.4 1.09e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 5737
VecAssemblyBegin 9 1.0 7.8998e-03 3.9 0.00e+00 0.0 2.4e+01 9.0e+00 2.7e+01 0 0 100 100 0 0 0 100 100 0 0
VecAssemblyEnd 9 1.0 2.5034e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 3114 1.0 3.4727e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.1e+03 1 0 0 0 33 1 0 0 0 33 0
VecNormalize 3102 1.0 2.6013e+00 4.8 1.03e+08 1.0 0.0e+00 0.0e+00 3.1e+03 1 0 0 0 33 1 0 0 0 33 79
MatMult 3105 1.0 2.5316e+02 1.4 1.52e+12 1.0 0.0e+00 0.0e+00 3.1e+03 74 100 0 0 33 74 100 0 0 33 11999
MatAssemblyBegin 2 1.0 2.0132e+01 910.2 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 3 0 0 0 0 3 0 0 0 0 0
MatAssemblyEnd 2 1.0 5.2810e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 3 1.0 1.5538e-03 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCSetUp 3 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 3102 1.0 6.7246e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 3000 1.0 7.6143e+01 97.9 2.04e+09 1.0 0.0e+00 0.0e+00 3.0e+03 13 0 0 0 31 13 0 0 0 31 54
KSPSetUp 3 1.0 7.3075e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 3 1.0 2.5436e+02 1.0 1.52e+12 1.0 0.0e+00 0.0e+00 9.2e+03 87 100 0 0 96 87 100 0 0 96 11949
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Viewer 1 0 0 0.
Index Set 51 51 39576 0.
IS L to G Mapping 15 15 215892 0.
Vector 205 205 14583280 0.
Vector Scatter 26 26 436360 0.
Matrix 9 9 1958884008 0.
Preconditioner 3 3 2448 0.
Krylov Solver 3 3 55200 0.
Distributed Mesh 25 25 328692 0.
Star Forest Bipartite Graph 50 50 42176 0.
Discrete System 25 25 21600 0.
========================================================================================================================
Average time to get PetscTime(): 0.
Average time for MPI_Barrier(): 3.65734e-05
Average time for zero size MPI_Send(): 6.1512e-05
#PETSc Option Table entries:
-b0 1e-4
-b1 0.9
-dp 0.05
-ep 2
-ksp_max_it 1000
-ksp_view
-log_view
-lp 2
-nb 1
-th 30
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-blas-lapack-lib=/public/software/OpenBLAS/lib/libopenblas.a --with-mpi-dir=/home/zhangji/python/mpich-3.2/build --with-hdf5-dir=/public/software/mathlib/hdf5/1.8.12/gnu/ PETSC_DIR=/home/zhangji/python/petsc-3.7.6 PETSC_ARCH=linux-mpich-opblas --download-metis=/public/sourcecode/petsc_gnu/metis-5.1.0.tar.gz --download-ptscotch=/home/zhangji/python/scotch_6.0.4.tar.gz --download-pastix=/home/zhangji/python/pastix_5.2.3.tar.bz2 --with-debugging=no --CFLAGS=-O3 --CXXFLAGS=-O3 --FFLAGS=-O3
-----------------------------------------
Libraries compiled on Sat Jun 10 00:26:59 2017 on cn11
Machine characteristics: Linux-2.6.32-504.el6.x86_64-x86_64-with-centos-6.6-Final
Using PETSc directory: /home/zhangji/python/petsc-3.7.6
Using PETSc arch: linux-mpich-opblas
-----------------------------------------
Using C compiler: /home/zhangji/python/mpich-3.2/build/bin/mpicc -O3 -fPIC ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/zhangji/python/mpich-3.2/build/bin/mpif90 -O3 -fPIC ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/public/software/mathlib/hdf5/1.8.12/gnu/include -I/home/zhangji/python/mpich-3.2/build/include
-----------------------------------------
Using C linker: /home/zhangji/python/mpich-3.2/build/bin/mpicc
Using Fortran linker: /home/zhangji/python/mpich-3.2/build/bin/mpif90
Using libraries: -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -lpetsc -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -Wl,-rpath,/public/software/OpenBLAS/lib -L/public/software/OpenBLAS/lib -Wl,-rpath,/public/software/mathlib/hdf5/1.8.12/gnu/lib -L/public/software/mathlib/hdf5/1.8.12/gnu/lib -Wl,-rpath,/home/zhangji/python/mpich-3.2/build/lib -L/home/zhangji/python/mpich-3.2/build/lib -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -Wl,-rpath,/home/zhangji/python/petsc-3.7.6 -L/home/zhangji/python/petsc-3.7.6 -lmetis -lpastix -lopenblas -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lz -lX11 -lssl -lcrypto -lmpifort -lifport -lifcore -lpthread -lmpicxx -lrt -lm -lrt -lm -lpthread -lz -ldl -lmpi -limf -lsvml -lirng -lm -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lirc_s -ldl
-----------------------------------------
-------------- next part --------------
Case information:
pipe length: 2.000000, pipe radius: 1.000000
delta length of pipe is 0.050000, epsilon of pipe is 2.000000
threshold of series is 30
b: 1 number evenly distributed within the range [0.000100, 0.900000]
create matrix method: pf_stokesletsInPipe
solve method: gmres, precondition method: none
output file handle: force_pipe
MPI size: 1
Stokeslets in pipe prepare, contain 7376 nodes
create matrix use 80.827850s:
_00001/00001_b=0.000100: calculate boundary condition use: 3.421076s
KSP Object: 1 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=1000
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
type: none
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqdense
rows=22128, cols=22128
total: nonzeros=489648384, allocated nonzeros=489648384
total number of mallocs used during MatSetValues calls =0
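The only structural difference from the parallel logs is the matrix type: seqdense here versus mpidense on 2 and 4 processes. With petsc4py this follows from the communicator size alone; a minimal sketch (not the force_pipe.py assembly code):

from petsc4py import PETSc

n = 300                        # stand-in size; the real matrix is 22128 x 22128
A = PETSc.Mat().createDense([n, n], comm=PETSc.COMM_WORLD)
A.setUp()
A.assemble()
PETSc.Sys.Print(A.getType())   # 'seqdense' on 1 MPI process, 'mpidense' on more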
_00001/00001_u1: solve matrix equation use: 88.631310s, with residual norm 9.884635e-04
KSP Object: 1 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=1000
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
type: none
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqdense
rows=22128, cols=22128
total: nonzeros=489648384, allocated nonzeros=489648384
total number of mallocs used during MatSetValues calls =0
_00001/00001_u2: solve matrix equation use: 88.855811s, with residual norm 4.144572e-04
KSP Object: 1 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=1000
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
type: none
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqdense
rows=22128, cols=22128
total: nonzeros=489648384, allocated nonzeros=489648384
total number of mallocs used during MatSetValues calls =0
_00001/00001_u3: solve matrix equation use: 88.673738s, with residual norm 4.864481e-03
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
force_pipe.py on a linux-mpich-opblas named cn5 with 1 processor, by zhangji Tue Jun 13 17:34:55 2017
Using Petsc Release Version 3.7.6, Apr, 24, 2017
Max Max/Min Avg Total
Time (sec): 3.521e+02 1.00000 3.521e+02
Objects: 4.010e+02 1.00000 4.010e+02
Flops: 3.042e+12 1.00000 3.042e+12 3.042e+12
Flops/sec: 8.640e+09 1.00000 8.640e+09 8.640e+09
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 0.000e+00 0.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 3.5212e+02 100.0% 3.0421e+12 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecMDot 3000 1.0 7.4338e-01 1.0 2.04e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2750
VecNorm 3102 1.0 3.6990e-02 1.0 1.37e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3711
VecScale 3102 1.0 2.4405e-02 1.0 6.86e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2813
VecCopy 3204 1.0 1.1260e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 173 1.0 4.3569e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 201 1.0 1.3273e-02 1.0 8.90e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 670
VecMAXPY 3102 1.0 5.2372e-01 1.0 2.18e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 4158
VecAssemblyBegin 9 1.0 4.3392e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 9 1.0 4.2915e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 9 1.0 2.2984e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 3102 1.0 6.4425e-02 1.0 2.06e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3196
MatMult 3105 1.0 2.6468e+02 1.0 3.04e+12 1.0 0.0e+00 0.0e+00 0.0e+00 75 100 0 0 0 75 100 0 0 0 11477
MatAssemblyBegin 2 1.0 3.0994e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 2 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 3 1.0 6.1703e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCSetUp 3 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 3102 1.0 1.1186e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 3000 1.0 1.2434e+00 1.0 4.09e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3289
KSPSetUp 3 1.0 5.0569e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 3 1.0 2.6590e+02 1.0 3.04e+12 1.0 0.0e+00 0.0e+00 0.0e+00 76 100 0 0 0 76 100 0 0 0 11430
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Viewer 1 0 0 0.
Index Set 47 47 36472 0.
IS L to G Mapping 15 15 422420 0.
Vector 201 201 26873904 0.
Vector Scatter 24 24 15744 0.
Matrix 7 7 3917737368 0.
Preconditioner 3 3 2448 0.
Krylov Solver 3 3 55200 0.
Distributed Mesh 25 25 535220 0.
Star Forest Bipartite Graph 50 50 42176 0.
Discrete System 25 25 21600 0.
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
#PETSc Option Table entries:
-b0 1e-4
-b1 0.9
-dp 0.05
-ep 2
-ksp_max_it 1000
-ksp_view
-log_view
-lp 2
-nb 1
-th 30
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-blas-lapack-lib=/public/software/OpenBLAS/lib/libopenblas.a --with-mpi-dir=/home/zhangji/python/mpich-3.2/build --with-hdf5-dir=/public/software/mathlib/hdf5/1.8.12/gnu/ PETSC_DIR=/home/zhangji/python/petsc-3.7.6 PETSC_ARCH=linux-mpich-opblas --download-metis=/public/sourcecode/petsc_gnu/metis-5.1.0.tar.gz --download-ptscotch=/home/zhangji/python/scotch_6.0.4.tar.gz --download-pastix=/home/zhangji/python/pastix_5.2.3.tar.bz2 --with-debugging=no --CFLAGS=-O3 --CXXFLAGS=-O3 --FFLAGS=-O3
-----------------------------------------
Libraries compiled on Sat Jun 10 00:26:59 2017 on cn11
Machine characteristics: Linux-2.6.32-504.el6.x86_64-x86_64-with-centos-6.6-Final
Using PETSc directory: /home/zhangji/python/petsc-3.7.6
Using PETSc arch: linux-mpich-opblas
-----------------------------------------
Using C compiler: /home/zhangji/python/mpich-3.2/build/bin/mpicc -O3 -fPIC ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/zhangji/python/mpich-3.2/build/bin/mpif90 -O3 -fPIC ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/public/software/mathlib/hdf5/1.8.12/gnu/include -I/home/zhangji/python/mpich-3.2/build/include
-----------------------------------------
Using C linker: /home/zhangji/python/mpich-3.2/build/bin/mpicc
Using Fortran linker: /home/zhangji/python/mpich-3.2/build/bin/mpif90
Using libraries: -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -lpetsc -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -Wl,-rpath,/public/software/OpenBLAS/lib -L/public/software/OpenBLAS/lib -Wl,-rpath,/public/software/mathlib/hdf5/1.8.12/gnu/lib -L/public/software/mathlib/hdf5/1.8.12/gnu/lib -Wl,-rpath,/home/zhangji/python/mpich-3.2/build/lib -L/home/zhangji/python/mpich-3.2/build/lib -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -Wl,-rpath,/home/zhangji/python/petsc-3.7.6 -L/home/zhangji/python/petsc-3.7.6 -lmetis -lpastix -lopenblas -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lz -lX11 -lssl -lcrypto -lmpifort -lifport -lifcore -lpthread -lmpicxx -lrt -lm -lrt -lm -lpthread -lz -ldl -lmpi -limf -lsvml -lirng -lm -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lirc_s -ldl
-----------------------------------------