[petsc-users] Why the convergence is much slower when I use two nodes

Ji Zhang gotofd at gmail.com
Tue Jun 13 07:38:17 CDT 2017


mpirun -n 1 -hostfile hostfile python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_1.txt
mpirun -n 2 -hostfile hostfile python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_2.txt
mpirun -n 4 -hostfile hostfile python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_3.txt
mpirun -n 6 -hostfile hostfile python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_4.txt
mpirun -n 2 python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_5.txt
mpirun -n 4 python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_6.txt
mpirun -n 6 python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_7.txt

Dear Barry,

The following tests were run on our cluster using one, two, or three nodes.
Each node has 64 GB of memory and 24 CPU cores (Intel(R) Xeon(R) CPU E5-2680
v3 @ 2.50GHz). Basic information for each node is listed below.

$ lstopo
Machine (64GB)
  NUMANode L#0 (P#0 32GB)
    Socket L#0 + L3 L#0 (30MB)
      L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
      L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
      L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
      L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
      L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#4)
      L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#5)
      L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#6)
      L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#7)
      L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#8)
      L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#9)
      L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#10)
      L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#11)
    HostBridge L#0
      PCIBridge
        PCI 1000:0097
          Block L#0 "sda"
      PCIBridge
        PCI 8086:1523
          Net L#1 "eth0"
        PCI 8086:1523
          Net L#2 "eth1"
      PCIBridge
        PCIBridge
          PCI 1a03:2000
      PCI 8086:8d02
  NUMANode L#1 (P#1 32GB)
    Socket L#1 + L3 L#1 (30MB)
      L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#12)
      L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#13)
      L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#14)
      L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#15)
      L2 L#16 (256KB) + L1d L#16 (32KB) + L1i L#16 (32KB) + Core L#16 + PU L#16 (P#16)
      L2 L#17 (256KB) + L1d L#17 (32KB) + L1i L#17 (32KB) + Core L#17 + PU L#17 (P#17)
      L2 L#18 (256KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18 + PU L#18 (P#18)
      L2 L#19 (256KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19 + PU L#19 (P#19)
      L2 L#20 (256KB) + L1d L#20 (32KB) + L1i L#20 (32KB) + Core L#20 + PU L#20 (P#20)
      L2 L#21 (256KB) + L1d L#21 (32KB) + L1i L#21 (32KB) + Core L#21 + PU L#21 (P#21)
      L2 L#22 (256KB) + L1d L#22 (32KB) + L1i L#22 (32KB) + Core L#22 + PU L#22 (P#22)
      L2 L#23 (256KB) + L1d L#23 (32KB) + L1i L#23 (32KB) + Core L#23 + PU L#23 (P#23)
    HostBridge L#5
      PCIBridge
        PCI 15b3:1003
          Net L#3 "ib0"
          OpenFabrics L#4 "mlx4_0"

I have tested seven different cases. Each case solves three different linear
systems A*x1=b1, A*x2=b2, and A*x3=b3, where A is an mpidense matrix and b1,
b2, b3 are different right-hand-side vectors. I'm using the GMRES method
without a preconditioner, and I have set -ksp_max_it 1000.
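
In petsc4py the solver is configured roughly as follows (a minimal sketch,
not the actual force_pipe.py code; the matrix assembly is omitted and the
names A, b, x are placeholders):

    from petsc4py import PETSc

    # A is the assembled MPIDENSE matrix, b the right-hand side (placeholders)
    ksp = PETSc.KSP().create(comm=PETSc.COMM_WORLD)
    ksp.setOperators(A)
    ksp.setType('gmres')                       # GMRES(30) restart by default
    ksp.getPC().setType('none')                # no preconditioner
    ksp.setTolerances(rtol=1e-5, max_it=1000)  # matches -ksp_max_it 1000
    ksp.setInitialGuessNonzero(True)
    ksp.setFromOptions()                       # picks up -ksp_view, -log_view, ...
    x = b.duplicate()
    ksp.solve(b, x)

The same setup is reused for the three right-hand sides b1, b2, b3 with the
same matrix A.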
           processes  nodes  eq1_residual_norm  eq1_duration  eq2_residual_norm  eq2_duration  eq3_residual_norm  eq3_duration
mpi_1.txt:     1        1      9.884635e-04      88.631310s     4.144572e-04      88.855811s     4.864481e-03      88.673738s
mpi_2.txt:     2        2      6.719300e-01      84.212435s     6.782443e-01      85.153371s     7.223828e-01      85.246724s
mpi_3.txt:     4        4      5.813354e-01      52.616490s     5.397962e-01      52.413213s     9.503432e-01      52.495871s
mpi_4.txt:     6        6      4.621066e-01      42.929705s     4.661823e-01      43.367914s     1.047436e+00      43.108877s
mpi_5.txt:     2        1      6.719300e-01     141.490945s     6.782443e-01     142.746243s     7.223828e-01     142.042608s
mpi_6.txt:     4        1      5.813354e-01     165.061162s     5.397962e-01     196.539286s     9.503432e-01     180.240947s
mpi_7.txt:     6        1      4.621066e-01     213.683270s     4.661823e-01     208.180939s     1.047436e+00     194.251886s
I found that all residual norms are on the order of 1 except in the first
case, where I use only one process on one node.
Please see the attached files for more details.
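
For reference, the residual norms listed above can be cross-checked against
the true residual ||b - A*x||; a small petsc4py sketch, assuming ksp, A, b
and x are as in the configuration sketch above (with PC type none the
preconditioned and true residuals should essentially agree):

    # compute r = b - A*x explicitly and compare with the KSP-reported norm
    r = b.duplicate()
    A.mult(x, r)        # r = A*x
    r.aypx(-1.0, b)     # r = b - A*x
    PETSc.Sys.Print('true residual norm:', r.norm())
    PETSc.Sys.Print('KSP residual norm: ', ksp.getResidualNorm())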


Sincerely,
Zhang Ji (PhD student)
Beijing Computational Science Research Center
Building 9, East Zone, No. 10 Xibeiwang East Road, Haidian District, Beijing (100193)

Best,
Regards,
Zhang Ji, PhD student
Beijing Computational Science Research Center
Zhongguancun Software Park II, No. 10 Dongbeiwang West Road, Haidian
District, Beijing 100193, China

On Tue, Jun 13, 2017 at 9:34 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:

>
>    You need to provide more information. What is the output of -ksp_view
> and -log_view for both cases?
>
> > On Jun 12, 2017, at 7:11 PM, Ji Zhang <gotofd at gmail.com> wrote:
> >
> > Dear all,
> >
> > I'm a PETSc user. I'm using the GMRES method to solve some linear
> > equations. I'm using the boundary element method, so the matrix type is
> > dense (or mpidense). I'm using MPICH2. I found that the convergence is
> > fast if I use only one compute node, and much slower if I use two or
> > more nodes. I'm interested in why this happens, and how I can improve
> > the convergence performance when I use multiple nodes.
> >
> > Thanks a lot.
> >
> > Sincerely,
> > Zhang Ji (PhD student)
> > Beijing Computational Science Research Center
> > Building 9, East Zone, No. 10 Xibeiwang East Road, Haidian District, Beijing (100193)
> >
> > Best,
> > Regards,
> > Zhang Ji, PhD student
> > Beijing Computational Science Research Center
> > Zhongguancun Software Park II, No. 10 Dongbeiwang West Road, Haidian
> District, Beijing 100193, China
>
>
-------------- next part --------------
Case information: 
  pipe length: 2.000000, pipe radius: 1.000000
  delta length of pipe is 0.050000, epsilon of pipe is  2.000000
  threshold of seriers is 30
  b: 1 numbers are evenly distributed within the range [0.000100, 0.900000]
  create matrix method: pf_stokesletsInPipe 
  solve method: gmres, precondition method: none
  output file headle: force_pipe
MPI size: 6
Stokeslets in pipe prepare, contain 7376 nodes
  create matrix use 3.737578s:
  _00001/00001_b=0.000100:    calculate boundary condation use: 1.798243s
KSP Object: 6 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=1000
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 6 MPI processes
  type: none
  linear system matrix = precond matrix:
  Mat Object:   6 MPI processes
    type: mpidense
    rows=22128, cols=22128
    total: nonzeros=489648384, allocated nonzeros=489648384
    total number of mallocs used during MatSetValues calls =0
  _00001/00001_u1: solve matrix equation use: 213.683270s, with residual norm 4.621066e-01
KSP Object: 6 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=1000
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 6 MPI processes
  type: none
  linear system matrix = precond matrix:
  Mat Object:   6 MPI processes
    type: mpidense
    rows=22128, cols=22128
    total: nonzeros=489648384, allocated nonzeros=489648384
    total number of mallocs used during MatSetValues calls =0
  _00001/00001_u2: solve matrix equation use: 208.180939s, with residual norm 4.661823e-01
KSP Object: 6 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=1000
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 6 MPI processes
  type: none
  linear system matrix = precond matrix:
  Mat Object:   6 MPI processes
    type: mpidense
    rows=22128, cols=22128
    total: nonzeros=489648384, allocated nonzeros=489648384
    total number of mallocs used during MatSetValues calls =0
  _00001/00001_u3: solve matrix equation use: 194.251886s, with residual norm 1.047436e+00
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

force_pipe.py on a linux-mpich-opblas named cn10 with 6 processors, by zhangji Tue Jun 13 18:11:46 2017
Using Petsc Release Version 3.7.6, Apr, 24, 2017 

                         Max       Max/Min        Avg      Total 
Time (sec):           6.236e+02      1.00013   6.235e+02
Objects:              4.130e+02      1.00000   4.130e+02
Flops:                5.073e+11      1.00081   5.070e+11  3.042e+12
Flops/sec:            8.136e+08      1.00092   8.131e+08  4.879e+09
MPI Messages:         4.200e+01      2.33333   3.000e+01  1.800e+02
MPI Message Lengths:  2.520e+02      2.33333   6.000e+00  1.080e+03
MPI Reductions:       9.541e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 6.2355e+02 100.0%  3.0421e+12 100.0%  1.800e+02 100.0%  6.000e+00      100.0%  9.540e+03 100.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecMDot             3000 1.0 1.7038e+02 1.7 3.41e+08 1.0 0.0e+00 0.0e+00 3.0e+03 20  0  0  0 31  20  0  0  0 31    12
VecNorm             3102 1.0 7.8933e+01 1.2 2.29e+07 1.0 0.0e+00 0.0e+00 3.1e+03 11  0  0  0 33  11  0  0  0 33     2
VecScale            3102 1.0 3.2920e-02 3.1 1.14e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2085
VecCopy             3204 1.0 1.1629e-01 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               123 1.0 1.1544e-0212.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              201 1.0 1.8733e-03 1.4 1.48e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  4749
VecMAXPY            3102 1.0 3.3990e-01 2.0 3.63e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  6406
VecAssemblyBegin       9 1.0 1.8613e-03 2.0 0.00e+00 0.0 1.8e+02 6.0e+00 2.7e+01  0  0100100  0   0  0100100  0     0
VecAssemblyEnd         9 1.0 3.7193e-05 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin     3114 1.0 8.3257e+01 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 3.1e+03 10  0  0  0 33  10  0  0  0 33     0
VecNormalize        3102 1.0 7.8971e+01 1.2 3.43e+07 1.0 0.0e+00 0.0e+00 3.1e+03 11  0  0  0 33  11  0  0  0 33     3
MatMult             3105 1.0 4.4362e+02 1.2 5.07e+11 1.0 0.0e+00 0.0e+00 3.1e+03 67100  0  0 33  67100  0  0 33  6848
MatAssemblyBegin       2 1.0 7.7588e-0211.5 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         2 1.0 1.7595e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView                3 1.0 6.1056e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCSetUp                3 1.0 9.5367e-07 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             3102 1.0 1.1835e-01 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPGMRESOrthog      3000 1.0 1.7062e+02 1.7 6.82e+08 1.0 0.0e+00 0.0e+00 3.0e+03 20  0  0  0 31  20  0  0  0 31    24
KSPSetUp               3 1.0 3.8290e-04 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               3 1.0 6.1546e+02 1.0 5.07e+11 1.0 0.0e+00 0.0e+00 9.2e+03 99100  0  0 96  99100  0  0 96  4938
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Viewer     1              0            0     0.
           Index Set    51             51        39576     0.
   IS L to G Mapping    15             15        78244     0.
              Vector   205            205      6265288     0.
      Vector Scatter    26             26       161064     0.
              Matrix     9              9    653332008     0.
      Preconditioner     3              3         2448     0.
       Krylov Solver     3              3        55200     0.
    Distributed Mesh    25             25       191044     0.
Star Forest Bipartite Graph    50             50        42176     0.
     Discrete System    25             25        21600     0.
========================================================================================================================
Average time to get PetscTime(): 0.
Average time for MPI_Barrier(): 6.19888e-06
Average time for zero size MPI_Send(): 2.30471e-06
#PETSc Option Table entries:
-b0 1e-4
-b1 0.9
-dp 0.05
-ep 2
-ksp_max_it 1000
-ksp_view
-log_view
-lp 2
-nb 1
-th 30
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-blas-lapack-lib=/public/software/OpenBLAS/lib/libopenblas.a --with-mpi-dir=/home/zhangji/python/mpich-3.2/build --with-hdf5-dir=/public/software/mathlib/hdf5/1.8.12/gnu/ PETSC_DIR=/home/zhangji/python/petsc-3.7.6 PETSC_ARCH=linux-mpich-opblas --download-metis=/public/sourcecode/petsc_gnu/metis-5.1.0.tar.gz --download-ptscotch=/home/zhangji/python/scotch_6.0.4.tar.gz --download-pastix=/home/zhangji/python/pastix_5.2.3.tar.bz2 --with-debugging=no --CFLAGS=-O3 --CXXFLAGS=-O3 --FFLAGS=-O3
-----------------------------------------
Libraries compiled on Sat Jun 10 00:26:59 2017 on cn11 
Machine characteristics: Linux-2.6.32-504.el6.x86_64-x86_64-with-centos-6.6-Final
Using PETSc directory: /home/zhangji/python/petsc-3.7.6
Using PETSc arch: linux-mpich-opblas
-----------------------------------------

Using C compiler: /home/zhangji/python/mpich-3.2/build/bin/mpicc -O3 -fPIC   ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/zhangji/python/mpich-3.2/build/bin/mpif90 -O3 -fPIC   ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/public/software/mathlib/hdf5/1.8.12/gnu/include -I/home/zhangji/python/mpich-3.2/build/include
-----------------------------------------

Using C linker: /home/zhangji/python/mpich-3.2/build/bin/mpicc
Using Fortran linker: /home/zhangji/python/mpich-3.2/build/bin/mpif90
Using libraries: -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -lpetsc -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -Wl,-rpath,/public/software/OpenBLAS/lib -L/public/software/OpenBLAS/lib -Wl,-rpath,/public/software/mathlib/hdf5/1.8.12/gnu/lib -L/public/software/mathlib/hdf5/1.8.12/gnu/lib -Wl,-rpath,/home/zhangji/python/mpich-3.2/build/lib -L/home/zhangji/python/mpich-3.2/build/lib -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -Wl,-rpath,/home/zhangji/python/petsc-3.7.6 -L/home/zhangji/python/petsc-3.7.6 -lmetis -lpastix -lopenblas -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lz -lX11 -lssl -lcrypto -lmpifort -lifport -lifcore -lpthread -lmpicxx -lrt -lm -lrt -lm -lpthread -lz -ldl -lmpi -limf -lsvml -lirng -lm -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lirc_s -ldl
-----------------------------------------
-------------- next part --------------
Case information: 
  pipe length: 2.000000, pipe radius: 1.000000
  delta length of pipe is 0.050000, epsilon of pipe is  2.000000
  threshold of seriers is 30
  b: 1 numbers are evenly distributed within the range [0.000100, 0.900000]
  create matrix method: pf_stokesletsInPipe 
  solve method: gmres, precondition method: none
  output file headle: force_pipe
MPI size: 4
Stokeslets in pipe prepare, contain 7376 nodes
  create matrix use 4.977263s:
  _00001/00001_b=0.000100:    calculate boundary condation use: 1.769788s
KSP Object: 4 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=1000
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
  type: none
  linear system matrix = precond matrix:
  Mat Object:   4 MPI processes
    type: mpidense
    rows=22128, cols=22128
    total: nonzeros=489648384, allocated nonzeros=489648384
    total number of mallocs used during MatSetValues calls =0
  _00001/00001_u1: solve matrix equation use: 165.061162s, with residual norm 5.813354e-01
KSP Object: 4 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=1000
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
  type: none
  linear system matrix = precond matrix:
  Mat Object:   4 MPI processes
    type: mpidense
    rows=22128, cols=22128
    total: nonzeros=489648384, allocated nonzeros=489648384
    total number of mallocs used during MatSetValues calls =0
  _00001/00001_u2: solve matrix equation use: 196.539286s, with residual norm 5.397962e-01
KSP Object: 4 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=1000
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
  type: none
  linear system matrix = precond matrix:
  Mat Object:   4 MPI processes
    type: mpidense
    rows=22128, cols=22128
    total: nonzeros=489648384, allocated nonzeros=489648384
    total number of mallocs used during MatSetValues calls =0
  _00001/00001_u3: solve matrix equation use: 180.240947s, with residual norm 9.503432e-01
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

force_pipe.py on a linux-mpich-opblas named cn10 with 4 processors, by zhangji Tue Jun 13 18:01:22 2017
Using Petsc Release Version 3.7.6, Apr, 24, 2017 

                         Max       Max/Min        Avg      Total 
Time (sec):           5.506e+02      1.00007   5.506e+02
Objects:              4.130e+02      1.00000   4.130e+02
Flops:                7.605e+11      1.00000   7.605e+11  3.042e+12
Flops/sec:            1.381e+09      1.00007   1.381e+09  5.525e+09
MPI Messages:         3.000e+01      1.66667   2.700e+01  1.080e+02
MPI Message Lengths:  1.800e+02      1.66667   6.000e+00  6.480e+02
MPI Reductions:       9.541e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 5.5060e+02 100.0%  3.0421e+12 100.0%  1.080e+02 100.0%  6.000e+00      100.0%  9.540e+03 100.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecMDot             3000 1.0 1.0788e+02 1.9 5.11e+08 1.0 0.0e+00 0.0e+00 3.0e+03 14  0  0  0 31  14  0  0  0 31    19
VecNorm             3102 1.0 5.0609e+01 1.1 3.43e+07 1.0 0.0e+00 0.0e+00 3.1e+03  9  0  0  0 33   9  0  0  0 33     3
VecScale            3102 1.0 1.3757e-02 1.1 1.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  4990
VecCopy             3204 1.0 9.1627e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               123 1.0 1.4656e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              201 1.0 2.9745e-03 1.6 2.22e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2991
VecMAXPY            3102 1.0 4.4902e-01 1.6 5.44e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  4849
VecAssemblyBegin       9 1.0 1.2916e-01 1.6 0.00e+00 0.0 1.1e+02 6.0e+00 2.7e+01  0  0100100  0   0  0100100  0     0
VecAssemblyEnd         9 1.0 3.3617e-05 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin     3114 1.0 5.2983e+01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.1e+03  7  0  0  0 33   7  0  0  0 33     0
VecNormalize        3102 1.0 5.0635e+01 1.1 5.15e+07 1.0 0.0e+00 0.0e+00 3.1e+03  9  0  0  0 33   9  0  0  0 33     4
MatMult             3105 1.0 4.3607e+02 1.1 7.59e+11 1.0 0.0e+00 0.0e+00 3.1e+03 76100  0  0 33  76100  0  0 33  6966
MatAssemblyBegin       2 1.0 2.7158e-013390.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         2 1.0 1.6093e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView                3 1.0 3.9665e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCSetUp                3 1.0 9.5367e-07 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             3102 1.0 9.2988e-02 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPGMRESOrthog      3000 1.0 1.0832e+02 1.9 1.02e+09 1.0 0.0e+00 0.0e+00 3.0e+03 14  0  0  0 31  14  0  0  0 31    38
KSPSetUp               3 1.0 3.7909e-04 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               3 1.0 5.4123e+02 1.0 7.60e+11 1.0 0.0e+00 0.0e+00 9.2e+03 98100  0  0 96  98100  0  0 96  5615
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Viewer     1              0            0     0.
           Index Set    51             51        39576     0.
   IS L to G Mapping    15             15       112628     0.
              Vector   205            205      8343112     0.
      Vector Scatter    26             26       229832     0.
              Matrix     9              9    979454472     0.
      Preconditioner     3              3         2448     0.
       Krylov Solver     3              3        55200     0.
    Distributed Mesh    25             25       225428     0.
Star Forest Bipartite Graph    50             50        42176     0.
     Discrete System    25             25        21600     0.
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 2.6226e-06
Average time for zero size MPI_Send(): 2.26498e-06
#PETSc Option Table entries:
-b0 1e-4
-b1 0.9
-dp 0.05
-ep 2
-ksp_max_it 1000
-ksp_view
-log_view
-lp 2
-nb 1
-th 30
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-blas-lapack-lib=/public/software/OpenBLAS/lib/libopenblas.a --with-mpi-dir=/home/zhangji/python/mpich-3.2/build --with-hdf5-dir=/public/software/mathlib/hdf5/1.8.12/gnu/ PETSC_DIR=/home/zhangji/python/petsc-3.7.6 PETSC_ARCH=linux-mpich-opblas --download-metis=/public/sourcecode/petsc_gnu/metis-5.1.0.tar.gz --download-ptscotch=/home/zhangji/python/scotch_6.0.4.tar.gz --download-pastix=/home/zhangji/python/pastix_5.2.3.tar.bz2 --with-debugging=no --CFLAGS=-O3 --CXXFLAGS=-O3 --FFLAGS=-O3
-----------------------------------------
Libraries compiled on Sat Jun 10 00:26:59 2017 on cn11 
Machine characteristics: Linux-2.6.32-504.el6.x86_64-x86_64-with-centos-6.6-Final
Using PETSc directory: /home/zhangji/python/petsc-3.7.6
Using PETSc arch: linux-mpich-opblas
-----------------------------------------

Using C compiler: /home/zhangji/python/mpich-3.2/build/bin/mpicc -O3 -fPIC   ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/zhangji/python/mpich-3.2/build/bin/mpif90 -O3 -fPIC   ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/public/software/mathlib/hdf5/1.8.12/gnu/include -I/home/zhangji/python/mpich-3.2/build/include
-----------------------------------------

Using C linker: /home/zhangji/python/mpich-3.2/build/bin/mpicc
Using Fortran linker: /home/zhangji/python/mpich-3.2/build/bin/mpif90
Using libraries: -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -lpetsc -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -Wl,-rpath,/public/software/OpenBLAS/lib -L/public/software/OpenBLAS/lib -Wl,-rpath,/public/software/mathlib/hdf5/1.8.12/gnu/lib -L/public/software/mathlib/hdf5/1.8.12/gnu/lib -Wl,-rpath,/home/zhangji/python/mpich-3.2/build/lib -L/home/zhangji/python/mpich-3.2/build/lib -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -Wl,-rpath,/home/zhangji/python/petsc-3.7.6 -L/home/zhangji/python/petsc-3.7.6 -lmetis -lpastix -lopenblas -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lz -lX11 -lssl -lcrypto -lmpifort -lifport -lifcore -lpthread -lmpicxx -lrt -lm -lrt -lm -lpthread -lz -ldl -lmpi -limf -lsvml -lirng -lm -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lirc_s -ldl
-----------------------------------------
-------------- next part --------------
Case information: 
  pipe length: 2.000000, pipe radius: 1.000000
  delta length of pipe is 0.050000, epsilon of pipe is  2.000000
  threshold of seriers is 30
  b: 1 numbers are evenly distributed within the range [0.000100, 0.900000]
  create matrix method: pf_stokesletsInPipe 
  solve method: gmres, precondition method: none
  output file headle: force_pipe
MPI size: 2
Stokeslets in pipe prepare, contain 7376 nodes
  create matrix use 8.694003s:
  _00001/00001_b=0.000100:    calculate boundary condation use: 1.975384s
KSP Object: 2 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=1000
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 2 MPI processes
  type: none
  linear system matrix = precond matrix:
  Mat Object:   2 MPI processes
    type: mpidense
    rows=22128, cols=22128
    total: nonzeros=489648384, allocated nonzeros=489648384
    total number of mallocs used during MatSetValues calls =0
  _00001/00001_u1: solve matrix equation use: 141.490945s, with residual norm 6.719300e-01
KSP Object: 2 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=1000
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 2 MPI processes
  type: none
  linear system matrix = precond matrix:
  Mat Object:   2 MPI processes
    type: mpidense
    rows=22128, cols=22128
    total: nonzeros=489648384, allocated nonzeros=489648384
    total number of mallocs used during MatSetValues calls =0
  _00001/00001_u2: solve matrix equation use: 142.746243s, with residual norm 6.782443e-01
KSP Object: 2 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=1000
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 2 MPI processes
  type: none
  linear system matrix = precond matrix:
  Mat Object:   2 MPI processes
    type: mpidense
    rows=22128, cols=22128
    total: nonzeros=489648384, allocated nonzeros=489648384
    total number of mallocs used during MatSetValues calls =0
  _00001/00001_u3: solve matrix equation use: 142.042608s, with residual norm 7.223828e-01
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

force_pipe.py on a linux-mpich-opblas named cn10 with 2 processors, by zhangji Tue Jun 13 17:52:11 2017
Using Petsc Release Version 3.7.6, Apr, 24, 2017 

                         Max       Max/Min        Avg      Total 
Time (sec):           4.388e+02      1.00006   4.387e+02
Objects:              4.130e+02      1.00000   4.130e+02
Flops:                1.521e+12      1.00000   1.521e+12  3.042e+12
Flops/sec:            3.467e+09      1.00006   3.467e+09  6.934e+09
MPI Messages:         1.200e+01      1.00000   1.200e+01  2.400e+01
MPI Message Lengths:  1.080e+02      1.00000   9.000e+00  2.160e+02
MPI Reductions:       9.541e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 4.3875e+02 100.0%  3.0421e+12 100.0%  2.400e+01 100.0%  9.000e+00      100.0%  9.540e+03 100.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecMDot             3000 1.0 2.6931e+01 4.4 1.02e+09 1.0 0.0e+00 0.0e+00 3.0e+03  4  0  0  0 31   4  0  0  0 31    76
VecNorm             3102 1.0 1.4946e+00 2.1 6.86e+07 1.0 0.0e+00 0.0e+00 3.1e+03  0  0  0  0 33   0  0  0  0 33    92
VecScale            3102 1.0 1.9959e-02 1.4 3.43e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  3439
VecCopy             3204 1.0 8.3234e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               123 1.0 2.2550e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              201 1.0 2.7293e-01 1.2 4.45e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    33
VecMAXPY            3102 1.0 5.8037e-01 1.7 1.09e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  3752
VecAssemblyBegin       9 1.0 1.0383e-0215.2 0.00e+00 0.0 2.4e+01 9.0e+00 2.7e+01  0  0100100  0   0  0100100  0     0
VecAssemblyEnd         9 1.0 2.5272e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin     3114 1.0 8.7240e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.1e+03  0  0  0  0 33   0  0  0  0 33     0
VecNormalize        3102 1.0 1.5182e+00 2.1 1.03e+08 1.0 0.0e+00 0.0e+00 3.1e+03  0  0  0  0 33   0  0  0  0 33   136
MatMult             3105 1.0 4.1876e+02 1.1 1.52e+12 1.0 0.0e+00 0.0e+00 3.1e+03 93100  0  0 33  93100  0  0 33  7254
MatAssemblyBegin       2 1.0 5.4870e-02676.9 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         2 1.0 1.6594e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView                3 1.0 1.1323e-02 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCSetUp                3 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             3102 1.0 8.3565e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPGMRESOrthog      3000 1.0 2.7499e+01 4.3 2.04e+09 1.0 0.0e+00 0.0e+00 3.0e+03  4  0  0  0 31   4  0  0  0 31   149
KSPSetUp               3 1.0 3.8886e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               3 1.0 4.2584e+02 1.0 1.52e+12 1.0 0.0e+00 0.0e+00 9.2e+03 97100  0  0 96  97100  0  0 96  7137
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Viewer     1              0            0     0.
           Index Set    51             51        39576     0.
   IS L to G Mapping    15             15       215892     0.
              Vector   205            205     14583280     0.
      Vector Scatter    26             26       436360     0.
              Matrix     9              9   1958884008     0.
      Preconditioner     3              3         2448     0.
       Krylov Solver     3              3        55200     0.
    Distributed Mesh    25             25       328692     0.
Star Forest Bipartite Graph    50             50        42176     0.
     Discrete System    25             25        21600     0.
========================================================================================================================
Average time to get PetscTime(): 1.19209e-07
Average time for MPI_Barrier(): 1.7643e-06
Average time for zero size MPI_Send(): 3.45707e-06
#PETSc Option Table entries:
-b0 1e-4
-b1 0.9
-dp 0.05
-ep 2
-ksp_max_it 1000
-ksp_view
-log_view
-lp 2
-nb 1
-th 30
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-blas-lapack-lib=/public/software/OpenBLAS/lib/libopenblas.a --with-mpi-dir=/home/zhangji/python/mpich-3.2/build --with-hdf5-dir=/public/software/mathlib/hdf5/1.8.12/gnu/ PETSC_DIR=/home/zhangji/python/petsc-3.7.6 PETSC_ARCH=linux-mpich-opblas --download-metis=/public/sourcecode/petsc_gnu/metis-5.1.0.tar.gz --download-ptscotch=/home/zhangji/python/scotch_6.0.4.tar.gz --download-pastix=/home/zhangji/python/pastix_5.2.3.tar.bz2 --with-debugging=no --CFLAGS=-O3 --CXXFLAGS=-O3 --FFLAGS=-O3
-----------------------------------------
Libraries compiled on Sat Jun 10 00:26:59 2017 on cn11 
Machine characteristics: Linux-2.6.32-504.el6.x86_64-x86_64-with-centos-6.6-Final
Using PETSc directory: /home/zhangji/python/petsc-3.7.6
Using PETSc arch: linux-mpich-opblas
-----------------------------------------

Using C compiler: /home/zhangji/python/mpich-3.2/build/bin/mpicc -O3 -fPIC   ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/zhangji/python/mpich-3.2/build/bin/mpif90 -O3 -fPIC   ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/public/software/mathlib/hdf5/1.8.12/gnu/include -I/home/zhangji/python/mpich-3.2/build/include
-----------------------------------------

Using C linker: /home/zhangji/python/mpich-3.2/build/bin/mpicc
Using Fortran linker: /home/zhangji/python/mpich-3.2/build/bin/mpif90
Using libraries: -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -lpetsc -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -Wl,-rpath,/public/software/OpenBLAS/lib -L/public/software/OpenBLAS/lib -Wl,-rpath,/public/software/mathlib/hdf5/1.8.12/gnu/lib -L/public/software/mathlib/hdf5/1.8.12/gnu/lib -Wl,-rpath,/home/zhangji/python/mpich-3.2/build/lib -L/home/zhangji/python/mpich-3.2/build/lib -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -Wl,-rpath,/home/zhangji/python/petsc-3.7.6 -L/home/zhangji/python/petsc-3.7.6 -lmetis -lpastix -lopenblas -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lz -lX11 -lssl -lcrypto -lmpifort -lifport -lifcore -lpthread -lmpicxx -lrt -lm -lrt -lm -lpthread -lz -ldl -lmpi -limf -lsvml -lirng -lm -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lirc_s -ldl
-----------------------------------------
-------------- next part --------------
Case information: 
  pipe length: 2.000000, pipe radius: 1.000000
  delta length of pipe is 0.050000, epsilon of pipe is  2.000000
  threshold of seriers is 30
  b: 1 numbers are evenly distributed within the range [0.000100, 0.900000]
  create matrix method: pf_stokesletsInPipe 
  solve method: gmres, precondition method: none
  output file headle: force_pipe
MPI size: 6
Stokeslets in pipe prepare, contain 7376 nodes
  create matrix use 3.769214s:
  _00001/00001_b=0.000100:    calculate boundary condation use: 1.710873s
KSP Object: 6 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=1000
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 6 MPI processes
  type: none
  linear system matrix = precond matrix:
  Mat Object:   6 MPI processes
    type: mpidense
    rows=22128, cols=22128
    total: nonzeros=489648384, allocated nonzeros=489648384
    total number of mallocs used during MatSetValues calls =0
  _00001/00001_u1: solve matrix equation use: 42.929705s, with residual norm 4.621066e-01
KSP Object: 6 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=1000
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 6 MPI processes
  type: none
  linear system matrix = precond matrix:
  Mat Object:   6 MPI processes
    type: mpidense
    rows=22128, cols=22128
    total: nonzeros=489648384, allocated nonzeros=489648384
    total number of mallocs used during MatSetValues calls =0
  _00001/00001_u2: solve matrix equation use: 43.367914s, with residual norm 4.661823e-01
KSP Object: 6 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=1000
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 6 MPI processes
  type: none
  linear system matrix = precond matrix:
  Mat Object:   6 MPI processes
    type: mpidense
    rows=22128, cols=22128
    total: nonzeros=489648384, allocated nonzeros=489648384
    total number of mallocs used during MatSetValues calls =0
  _00001/00001_u3: solve matrix equation use: 43.108877s, with residual norm 1.047436e+00
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

force_pipe.py on a linux-mpich-opblas named cn5 with 6 processors, by zhangji Tue Jun 13 17:44:52 2017
Using Petsc Release Version 3.7.6, Apr, 24, 2017 

                         Max       Max/Min        Avg      Total 
Time (sec):           1.367e+02      1.00029   1.367e+02
Objects:              4.130e+02      1.00000   4.130e+02
Flops:                5.073e+11      1.00081   5.070e+11  3.042e+12
Flops/sec:            3.713e+09      1.00111   3.710e+09  2.226e+10
MPI Messages:         4.200e+01      2.33333   3.000e+01  1.800e+02
MPI Message Lengths:  2.520e+02      2.33333   6.000e+00  1.080e+03
MPI Reductions:       9.541e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 1.3666e+02 100.0%  3.0421e+12 100.0%  1.800e+02 100.0%  6.000e+00      100.0%  9.540e+03 100.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecMDot             3000 1.0 8.5456e+00 5.3 3.41e+08 1.0 0.0e+00 0.0e+00 3.0e+03  4  0  0  0 31   4  0  0  0 31   239
VecNorm             3102 1.0 1.0225e+00 1.7 2.29e+07 1.0 0.0e+00 0.0e+00 3.1e+03  1  0  0  0 33   1  0  0  0 33   134
VecScale            3102 1.0 9.2647e-03 1.5 1.14e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  7409
VecCopy             3204 1.0 3.4087e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               123 1.0 1.1790e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              201 1.0 1.6315e-03 1.4 1.48e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  5452
VecMAXPY            3102 1.0 9.3553e-02 1.1 3.63e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 23274
VecAssemblyBegin       9 1.0 1.2650e-02 2.2 0.00e+00 0.0 1.8e+02 6.0e+00 2.7e+01  0  0100100  0   0  0100100  0     0
VecAssemblyEnd         9 1.0 4.8575e-022122.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin     3114 1.0 7.4042e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.1e+03  5  0  0  0 33   5  0  0  0 33     0
VecNormalize        3102 1.0 1.0357e+00 1.7 3.43e+07 1.0 0.0e+00 0.0e+00 3.1e+03  1  0  0  0 33   1  0  0  0 33   199
MatMult             3105 1.0 1.2659e+02 1.1 5.07e+11 1.0 0.0e+00 0.0e+00 3.1e+03 90100  0  0 33  90100  0  0 33 23996
MatAssemblyBegin       2 1.0 2.0758e-01138.7 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         2 1.0 1.9758e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView                3 1.0 1.5299e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCSetUp                3 1.0 2.1458e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             3102 1.0 3.6064e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPGMRESOrthog      3000 1.0 8.6466e+00 5.1 6.82e+08 1.0 0.0e+00 0.0e+00 3.0e+03  4  0  0  0 31   4  0  0  0 31   473
KSPSetUp               3 1.0 1.5187e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               3 1.0 1.2925e+02 1.0 5.07e+11 1.0 0.0e+00 0.0e+00 9.2e+03 95100  0  0 96  95100  0  0 96 23515
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Viewer     1              0            0     0.
           Index Set    51             51        39576     0.
   IS L to G Mapping    15             15        78244     0.
              Vector   205            205      6265288     0.
      Vector Scatter    26             26       161064     0.
              Matrix     9              9    653332008     0.
      Preconditioner     3              3         2448     0.
       Krylov Solver     3              3        55200     0.
    Distributed Mesh    25             25       191044     0.
Star Forest Bipartite Graph    50             50        42176     0.
     Discrete System    25             25        21600     0.
========================================================================================================================
Average time to get PetscTime(): 0.
Average time for MPI_Barrier(): 0.000177193
Average time for zero size MPI_Send(): 3.89814e-05
#PETSc Option Table entries:
-b0 1e-4
-b1 0.9
-dp 0.05
-ep 2
-ksp_max_it 1000
-ksp_view
-log_view
-lp 2
-nb 1
-th 30
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-blas-lapack-lib=/public/software/OpenBLAS/lib/libopenblas.a --with-mpi-dir=/home/zhangji/python/mpich-3.2/build --with-hdf5-dir=/public/software/mathlib/hdf5/1.8.12/gnu/ PETSC_DIR=/home/zhangji/python/petsc-3.7.6 PETSC_ARCH=linux-mpich-opblas --download-metis=/public/sourcecode/petsc_gnu/metis-5.1.0.tar.gz --download-ptscotch=/home/zhangji/python/scotch_6.0.4.tar.gz --download-pastix=/home/zhangji/python/pastix_5.2.3.tar.bz2 --with-debugging=no --CFLAGS=-O3 --CXXFLAGS=-O3 --FFLAGS=-O3
-----------------------------------------
Libraries compiled on Sat Jun 10 00:26:59 2017 on cn11 
Machine characteristics: Linux-2.6.32-504.el6.x86_64-x86_64-with-centos-6.6-Final
Using PETSc directory: /home/zhangji/python/petsc-3.7.6
Using PETSc arch: linux-mpich-opblas
-----------------------------------------

Using C compiler: /home/zhangji/python/mpich-3.2/build/bin/mpicc -O3 -fPIC   ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/zhangji/python/mpich-3.2/build/bin/mpif90 -O3 -fPIC   ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/public/software/mathlib/hdf5/1.8.12/gnu/include -I/home/zhangji/python/mpich-3.2/build/include
-----------------------------------------

Using C linker: /home/zhangji/python/mpich-3.2/build/bin/mpicc
Using Fortran linker: /home/zhangji/python/mpich-3.2/build/bin/mpif90
Using libraries: -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -lpetsc -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -Wl,-rpath,/public/software/OpenBLAS/lib -L/public/software/OpenBLAS/lib -Wl,-rpath,/public/software/mathlib/hdf5/1.8.12/gnu/lib -L/public/software/mathlib/hdf5/1.8.12/gnu/lib -Wl,-rpath,/home/zhangji/python/mpich-3.2/build/lib -L/home/zhangji/python/mpich-3.2/build/lib -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -Wl,-rpath,/home/zhangji/python/petsc-3.7.6 -L/home/zhangji/python/petsc-3.7.6 -lmetis -lpastix -lopenblas -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lz -lX11 -lssl -lcrypto -lmpifort -lifport -lifcore -lpthread -lmpicxx -lrt -lm -lrt -lm -lpthread -lz -ldl -lmpi -limf -lsvml -lirng -lm -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lirc_s -ldl
-----------------------------------------
-------------- next part --------------
Case information: 
  pipe length: 2.000000, pipe radius: 1.000000
  delta length of pipe is 0.050000, epsilon of pipe is  2.000000
  threshold of series is 30
  b: 1 number is evenly distributed within the range [0.000100, 0.900000]
  create matrix method: pf_stokesletsInPipe 
  solve method: gmres, precondition method: none
  output file handle: force_pipe
MPI size: 4
Stokeslets in pipe prepared, containing 7376 nodes
  create matrix use 6.052226s:
  _00001/00001_b=0.000100:    calculate boundary condition use: 1.826128s
KSP Object: 4 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=1000
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
  type: none
  linear system matrix = precond matrix:
  Mat Object:   4 MPI processes
    type: mpidense
    rows=22128, cols=22128
    total: nonzeros=489648384, allocated nonzeros=489648384
    total number of mallocs used during MatSetValues calls =0
  _00001/00001_u1: solve matrix equation use: 52.616490s, with residual norm 5.813354e-01
KSP Object: 4 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=1000
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
  type: none
  linear system matrix = precond matrix:
  Mat Object:   4 MPI processes
    type: mpidense
    rows=22128, cols=22128
    total: nonzeros=489648384, allocated nonzeros=489648384
    total number of mallocs used during MatSetValues calls =0
  _00001/00001_u2: solve matrix equation use: 52.413213s, with residual norm 5.397962e-01
KSP Object: 4 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=1000
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
  type: none
  linear system matrix = precond matrix:
  Mat Object:   4 MPI processes
    type: mpidense
    rows=22128, cols=22128
    total: nonzeros=489648384, allocated nonzeros=489648384
    total number of mallocs used during MatSetValues calls =0
  _00001/00001_u3: solve matrix equation use: 52.495871s, with residual norm 9.503432e-01
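The KSP and PC views above amount to plain GMRES(30) with no preconditioner applied to a 22128 x 22128 dense matrix, with rtol 1e-5, at most 1000 iterations, and a nonzero initial guess. For reference, here is a minimal petsc4py sketch of an equivalent solver setup. It assumes petsc4py (which force_pipe.py appears to use) and does not reproduce the pf_stokesletsInPipe assembly: the matrix entries, right-hand side, and variable names below are placeholders only.

from petsc4py import PETSc

n = 22128                               # global size, from "rows=22128, cols=22128" above
                                        # (reduce n for a quick test; the full size needs ~3.9 GB)
A = PETSc.Mat().createDense((n, n), comm=PETSc.COMM_WORLD)
A.setUp()

rstart, rend = A.getOwnershipRange()
for i in range(rstart, rend):
    A.setValue(i, i, 1.0)               # placeholder: identity instead of the dense Stokeslet kernel
A.assemble()

x, b = A.createVecs()                   # x: unknown, b: right-hand side
b.set(1.0)                              # placeholder right-hand side

opts = PETSc.Options()
opts["ksp_gmres_restart"] = 30          # restart=30, as in the KSP view above

ksp = PETSc.KSP().create(comm=PETSc.COMM_WORLD)
ksp.setOperators(A)
ksp.setType(PETSc.KSP.Type.GMRES)
ksp.getPC().setType(PETSc.PC.Type.NONE) # "PC Object: type: none"
ksp.setTolerances(rtol=1e-5, atol=1e-50, divtol=1e4, max_it=1000)
ksp.setInitialGuessNonzero(True)
ksp.setFromOptions()
ksp.solve(b, x)
ksp.view()                              # prints a KSP/PC block of the same form as above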
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

force_pipe.py on a linux-mpich-opblas named cn5 with 4 processors, by zhangji Tue Jun 13 17:42:35 2017
Using Petsc Release Version 3.7.6, Apr, 24, 2017 

                         Max       Max/Min        Avg      Total 
Time (sec):           1.675e+02      1.00001   1.675e+02
Objects:              4.130e+02      1.00000   4.130e+02
Flops:                7.605e+11      1.00000   7.605e+11  3.042e+12
Flops/sec:            4.541e+09      1.00001   4.541e+09  1.816e+10
MPI Messages:         3.000e+01      1.66667   2.700e+01  1.080e+02
MPI Message Lengths:  1.800e+02      1.66667   6.000e+00  6.480e+02
MPI Reductions:       9.541e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 1.6749e+02 100.0%  3.0421e+12 100.0%  1.080e+02 100.0%  6.000e+00      100.0%  9.540e+03 100.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 1e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecMDot             3000 1.0 3.3006e+01 56.7 5.11e+08 1.0 0.0e+00 0.0e+00 3.0e+03  9  0  0  0 31   9  0  0  0 31    62
VecNorm             3102 1.0 1.4346e+00 3.8 3.43e+07 1.0 0.0e+00 0.0e+00 3.1e+03  1  0  0  0 33   1  0  0  0 33    96
VecScale            3102 1.0 1.0346e-02 1.2 1.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  6634
VecCopy             3204 1.0 3.8601e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               123 1.0 1.1497e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              201 1.0 1.7583e-03 1.1 2.22e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  5059
VecMAXPY            3102 1.0 1.4580e-01 1.1 5.44e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 14934
VecAssemblyBegin       9 1.0 9.8622e-03 2.9 0.00e+00 0.0 1.1e+02 6.0e+00 2.7e+01  0  0100100  0   0  0100100  0     0
VecAssemblyEnd         9 1.0 3.1948e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin     3114 1.0 5.3085e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.1e+03  3  0  0  0 33   3  0  0  0 33     0
VecNormalize        3102 1.0 1.4464e+00 3.7 5.15e+07 1.0 0.0e+00 0.0e+00 3.1e+03  1  0  0  0 33   1  0  0  0 33   142
MatMult             3105 1.0 1.5605e+02 1.3 7.59e+11 1.0 0.0e+00 0.0e+00 3.1e+03 84 100  0  0 33  84 100  0  0 33 19466
MatAssemblyBegin       2 1.0 9.4833e-01 48.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         2 1.0 9.2912e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView                3 1.0 1.7860e-03 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCSetUp                3 1.0 9.5367e-07 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             3102 1.0 4.0367e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPGMRESOrthog      3000 1.0 3.3157e+01 46.5 1.02e+09 1.0 0.0e+00 0.0e+00 3.0e+03  9  0  0  0 31   9  0  0  0 31   123
KSPSetUp               3 1.0 8.2684e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               3 1.0 1.5735e+02 1.0 7.60e+11 1.0 0.0e+00 0.0e+00 9.2e+03 94 100  0  0 96  94 100  0  0 96 19315
------------------------------------------------------------------------------------------------------------------------
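Two of the derived numbers in the event table above can be reproduced by hand from the displayed values, using the flop-counting convention (2N flops per real VecAXPY) and the Mflop/s definition from the header. A short check, with illustrative variable names; the small mismatches come from the rounding of the printed values:

n_global = 22128                       # vector length, from "rows=22128"
n_local = n_global // 4                # per-rank length on 4 ranks
axpy_calls = 201

# VecAXPY on a real vector of length N costs 2*N flops
print(axpy_calls * 2 * n_local)        # 2223864, i.e. the 2.22e+06 in the VecAXPY flops column

# Mflop/s = 1e-6 * (sum of flops over all ranks) / (max time over all ranks)
total_flops = 3.0421e12                # Main Stage total flops
ksp_time = 1.5735e2                    # KSPSolve max time in seconds
print(1e-6 * total_flops / ksp_time)   # ~19333, close to the 19315 reported for KSPSolve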

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Viewer     1              0            0     0.
           Index Set    51             51        39576     0.
   IS L to G Mapping    15             15       112628     0.
              Vector   205            205      8343112     0.
      Vector Scatter    26             26       229832     0.
              Matrix     9              9    979454472     0.
      Preconditioner     3              3         2448     0.
       Krylov Solver     3              3        55200     0.
    Distributed Mesh    25             25       225428     0.
Star Forest Bipartite Graph    50             50        42176     0.
     Discrete System    25             25        21600     0.
========================================================================================================================
Average time to get PetscTime(): 1.19209e-07
Average time for MPI_Barrier(): 7.45773e-05
Average time for zero size MPI_Send(): 6.04987e-05
#PETSc Option Table entries:
-b0 1e-4
-b1 0.9
-dp 0.05
-ep 2
-ksp_max_it 1000
-ksp_view
-log_view
-lp 2
-nb 1
-th 30
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-blas-lapack-lib=/public/software/OpenBLAS/lib/libopenblas.a --with-mpi-dir=/home/zhangji/python/mpich-3.2/build --with-hdf5-dir=/public/software/mathlib/hdf5/1.8.12/gnu/ PETSC_DIR=/home/zhangji/python/petsc-3.7.6 PETSC_ARCH=linux-mpich-opblas --download-metis=/public/sourcecode/petsc_gnu/metis-5.1.0.tar.gz --download-ptscotch=/home/zhangji/python/scotch_6.0.4.tar.gz --download-pastix=/home/zhangji/python/pastix_5.2.3.tar.bz2 --with-debugging=no --CFLAGS=-O3 --CXXFLAGS=-O3 --FFLAGS=-O3
-----------------------------------------
Libraries compiled on Sat Jun 10 00:26:59 2017 on cn11 
Machine characteristics: Linux-2.6.32-504.el6.x86_64-x86_64-with-centos-6.6-Final
Using PETSc directory: /home/zhangji/python/petsc-3.7.6
Using PETSc arch: linux-mpich-opblas
-----------------------------------------

Using C compiler: /home/zhangji/python/mpich-3.2/build/bin/mpicc -O3 -fPIC   ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/zhangji/python/mpich-3.2/build/bin/mpif90 -O3 -fPIC   ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/public/software/mathlib/hdf5/1.8.12/gnu/include -I/home/zhangji/python/mpich-3.2/build/include
-----------------------------------------

Using C linker: /home/zhangji/python/mpich-3.2/build/bin/mpicc
Using Fortran linker: /home/zhangji/python/mpich-3.2/build/bin/mpif90
Using libraries: -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -lpetsc -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -Wl,-rpath,/public/software/OpenBLAS/lib -L/public/software/OpenBLAS/lib -Wl,-rpath,/public/software/mathlib/hdf5/1.8.12/gnu/lib -L/public/software/mathlib/hdf5/1.8.12/gnu/lib -Wl,-rpath,/home/zhangji/python/mpich-3.2/build/lib -L/home/zhangji/python/mpich-3.2/build/lib -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -Wl,-rpath,/home/zhangji/python/petsc-3.7.6 -L/home/zhangji/python/petsc-3.7.6 -lmetis -lpastix -lopenblas -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lz -lX11 -lssl -lcrypto -lmpifort -lifport -lifcore -lpthread -lmpicxx -lrt -lm -lrt -lm -lpthread -lz -ldl -lmpi -limf -lsvml -lirng -lm -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lirc_s -ldl
-----------------------------------------
-------------- next part --------------
Case information: 
  pipe length: 2.000000, pipe radius: 1.000000
  delta length of pipe is 0.050000, epsilon of pipe is  2.000000
  threshold of series is 30
  b: 1 number is evenly distributed within the range [0.000100, 0.900000]
  create matrix method: pf_stokesletsInPipe 
  solve method: gmres, precondition method: none
  output file handle: force_pipe
MPI size: 2
Stokeslets in pipe prepared, containing 7376 nodes
  create matrix use 30.575036s:
  _00001/00001_b=0.000100:    calculate boundary condition use: 3.463875s
KSP Object: 2 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=1000
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 2 MPI processes
  type: none
  linear system matrix = precond matrix:
  Mat Object:   2 MPI processes
    type: mpidense
    rows=22128, cols=22128
    total: nonzeros=489648384, allocated nonzeros=489648384
    total number of mallocs used during MatSetValues calls =0
  _00001/00001_u1: solve matrix equation use: 84.212435s, with residual norm 6.719300e-01
KSP Object: 2 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=1000
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 2 MPI processes
  type: none
  linear system matrix = precond matrix:
  Mat Object:   2 MPI processes
    type: mpidense
    rows=22128, cols=22128
    total: nonzeros=489648384, allocated nonzeros=489648384
    total number of mallocs used during MatSetValues calls =0
  _00001/00001_u2: solve matrix equation use: 85.153371s, with residual norm 6.782443e-01
KSP Object: 2 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=1000
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 2 MPI processes
  type: none
  linear system matrix = precond matrix:
  Mat Object:   2 MPI processes
    type: mpidense
    rows=22128, cols=22128
    total: nonzeros=489648384, allocated nonzeros=489648384
    total number of mallocs used during MatSetValues calls =0
  _00001/00001_u3: solve matrix equation use: 85.246724s, with residual norm 7.223828e-01
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

force_pipe.py on a linux-mpich-opblas named cn5 with 2 processors, by zhangji Tue Jun 13 17:39:46 2017
Using Petsc Release Version 3.7.6, Apr, 24, 2017 

                         Max       Max/Min        Avg      Total 
Time (sec):           2.908e+02      1.00002   2.908e+02
Objects:              4.130e+02      1.00000   4.130e+02
Flops:                1.521e+12      1.00000   1.521e+12  3.042e+12
Flops/sec:            5.231e+09      1.00002   5.231e+09  1.046e+10
MPI Messages:         1.200e+01      1.00000   1.200e+01  2.400e+01
MPI Message Lengths:  1.080e+02      1.00000   9.000e+00  2.160e+02
MPI Reductions:       9.541e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 2.9076e+02 100.0%  3.0421e+12 100.0%  2.400e+01 100.0%  9.000e+00      100.0%  9.540e+03 100.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 1e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecMDot             3000 1.0 7.5770e+01 146.1 1.02e+09 1.0 0.0e+00 0.0e+00 3.0e+03 13  0  0  0 31  13  0  0  0 31    27
VecNorm             3102 1.0 2.5821e+00 4.9 6.86e+07 1.0 0.0e+00 0.0e+00 3.1e+03  1  0  0  0 33   1  0  0  0 33    53
VecScale            3102 1.0 1.5304e-02 1.1 3.43e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  4485
VecCopy             3204 1.0 6.6598e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               123 1.0 1.7567e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              201 1.0 1.9052e-02 2.3 4.45e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   467
VecMAXPY            3102 1.0 3.7953e-01 1.4 1.09e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  5737
VecAssemblyBegin       9 1.0 7.8998e-03 3.9 0.00e+00 0.0 2.4e+01 9.0e+00 2.7e+01  0  0100100  0   0  0100100  0     0
VecAssemblyEnd         9 1.0 2.5034e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin     3114 1.0 3.4727e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.1e+03  1  0  0  0 33   1  0  0  0 33     0
VecNormalize        3102 1.0 2.6013e+00 4.8 1.03e+08 1.0 0.0e+00 0.0e+00 3.1e+03  1  0  0  0 33   1  0  0  0 33    79
MatMult             3105 1.0 2.5316e+02 1.4 1.52e+12 1.0 0.0e+00 0.0e+00 3.1e+03 74 100  0  0 33  74 100  0  0 33 11999
MatAssemblyBegin       2 1.0 2.0132e+01 910.2 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  3  0  0  0  0   3  0  0  0  0     0
MatAssemblyEnd         2 1.0 5.2810e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView                3 1.0 1.5538e-03 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCSetUp                3 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             3102 1.0 6.7246e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPGMRESOrthog      3000 1.0 7.6143e+01 97.9 2.04e+09 1.0 0.0e+00 0.0e+00 3.0e+03 13  0  0  0 31  13  0  0  0 31    54
KSPSetUp               3 1.0 7.3075e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               3 1.0 2.5436e+02 1.0 1.52e+12 1.0 0.0e+00 0.0e+00 9.2e+03 87 100  0  0 96  87 100  0  0 96 11949
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Viewer     1              0            0     0.
           Index Set    51             51        39576     0.
   IS L to G Mapping    15             15       215892     0.
              Vector   205            205     14583280     0.
      Vector Scatter    26             26       436360     0.
              Matrix     9              9   1958884008     0.
      Preconditioner     3              3         2448     0.
       Krylov Solver     3              3        55200     0.
    Distributed Mesh    25             25       328692     0.
Star Forest Bipartite Graph    50             50        42176     0.
     Discrete System    25             25        21600     0.
========================================================================================================================
Average time to get PetscTime(): 0.
Average time for MPI_Barrier(): 3.65734e-05
Average time for zero size MPI_Send(): 6.1512e-05
#PETSc Option Table entries:
-b0 1e-4
-b1 0.9
-dp 0.05
-ep 2
-ksp_max_it 1000
-ksp_view
-log_view
-lp 2
-nb 1
-th 30
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-blas-lapack-lib=/public/software/OpenBLAS/lib/libopenblas.a --with-mpi-dir=/home/zhangji/python/mpich-3.2/build --with-hdf5-dir=/public/software/mathlib/hdf5/1.8.12/gnu/ PETSC_DIR=/home/zhangji/python/petsc-3.7.6 PETSC_ARCH=linux-mpich-opblas --download-metis=/public/sourcecode/petsc_gnu/metis-5.1.0.tar.gz --download-ptscotch=/home/zhangji/python/scotch_6.0.4.tar.gz --download-pastix=/home/zhangji/python/pastix_5.2.3.tar.bz2 --with-debugging=no --CFLAGS=-O3 --CXXFLAGS=-O3 --FFLAGS=-O3
-----------------------------------------
Libraries compiled on Sat Jun 10 00:26:59 2017 on cn11 
Machine characteristics: Linux-2.6.32-504.el6.x86_64-x86_64-with-centos-6.6-Final
Using PETSc directory: /home/zhangji/python/petsc-3.7.6
Using PETSc arch: linux-mpich-opblas
-----------------------------------------

Using C compiler: /home/zhangji/python/mpich-3.2/build/bin/mpicc -O3 -fPIC   ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/zhangji/python/mpich-3.2/build/bin/mpif90 -O3 -fPIC   ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/public/software/mathlib/hdf5/1.8.12/gnu/include -I/home/zhangji/python/mpich-3.2/build/include
-----------------------------------------

Using C linker: /home/zhangji/python/mpich-3.2/build/bin/mpicc
Using Fortran linker: /home/zhangji/python/mpich-3.2/build/bin/mpif90
Using libraries: -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -lpetsc -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -Wl,-rpath,/public/software/OpenBLAS/lib -L/public/software/OpenBLAS/lib -Wl,-rpath,/public/software/mathlib/hdf5/1.8.12/gnu/lib -L/public/software/mathlib/hdf5/1.8.12/gnu/lib -Wl,-rpath,/home/zhangji/python/mpich-3.2/build/lib -L/home/zhangji/python/mpich-3.2/build/lib -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -Wl,-rpath,/home/zhangji/python/petsc-3.7.6 -L/home/zhangji/python/petsc-3.7.6 -lmetis -lpastix -lopenblas -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lz -lX11 -lssl -lcrypto -lmpifort -lifport -lifcore -lpthread -lmpicxx -lrt -lm -lrt -lm -lpthread -lz -ldl -lmpi -limf -lsvml -lirng -lm -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lirc_s -ldl
-----------------------------------------
-------------- next part --------------
Case information: 
  pipe length: 2.000000, pipe radius: 1.000000
  delta length of pipe is 0.050000, epsilon of pipe is  2.000000
  threshold of series is 30
  b: 1 number is evenly distributed within the range [0.000100, 0.900000]
  create matrix method: pf_stokesletsInPipe 
  solve method: gmres, precondition method: none
  output file handle: force_pipe
MPI size: 1
Stokeslets in pipe prepared, containing 7376 nodes
  create matrix use 80.827850s:
  _00001/00001_b=0.000100:    calculate boundary condition use: 3.421076s
KSP Object: 1 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=1000
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: none
  linear system matrix = precond matrix:
  Mat Object:   1 MPI processes
    type: seqdense
    rows=22128, cols=22128
    total: nonzeros=489648384, allocated nonzeros=489648384
    total number of mallocs used during MatSetValues calls =0
  _00001/00001_u1: solve matrix equation use: 88.631310s, with residual norm 9.884635e-04
KSP Object: 1 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=1000
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: none
  linear system matrix = precond matrix:
  Mat Object:   1 MPI processes
    type: seqdense
    rows=22128, cols=22128
    total: nonzeros=489648384, allocated nonzeros=489648384
    total number of mallocs used during MatSetValues calls =0
  _00001/00001_u2: solve matrix equation use: 88.855811s, with residual norm 4.144572e-04
KSP Object: 1 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=1000
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: none
  linear system matrix = precond matrix:
  Mat Object:   1 MPI processes
    type: seqdense
    rows=22128, cols=22128
    total: nonzeros=489648384, allocated nonzeros=489648384
    total number of mallocs used during MatSetValues calls =0
  _00001/00001_u3: solve matrix equation use: 88.673738s, with residual norm 4.864481e-03
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

force_pipe.py on a linux-mpich-opblas named cn5 with 1 processor, by zhangji Tue Jun 13 17:34:55 2017
Using Petsc Release Version 3.7.6, Apr, 24, 2017 

                         Max       Max/Min        Avg      Total 
Time (sec):           3.521e+02      1.00000   3.521e+02
Objects:              4.010e+02      1.00000   4.010e+02
Flops:                3.042e+12      1.00000   3.042e+12  3.042e+12
Flops/sec:            8.640e+09      1.00000   8.640e+09  8.640e+09
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00      0.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 3.5212e+02 100.0%  3.0421e+12 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 1e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecMDot             3000 1.0 7.4338e-01 1.0 2.04e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2750
VecNorm             3102 1.0 3.6990e-02 1.0 1.37e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  3711
VecScale            3102 1.0 2.4405e-02 1.0 6.86e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2813
VecCopy             3204 1.0 1.1260e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               173 1.0 4.3569e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              201 1.0 1.3273e-02 1.0 8.90e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   670
VecMAXPY            3102 1.0 5.2372e-01 1.0 2.18e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  4158
VecAssemblyBegin       9 1.0 4.3392e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         9 1.0 4.2915e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin        9 1.0 2.2984e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize        3102 1.0 6.4425e-02 1.0 2.06e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  3196
MatMult             3105 1.0 2.6468e+02 1.0 3.04e+12 1.0 0.0e+00 0.0e+00 0.0e+00 75 100  0  0  0  75 100  0  0  0 11477
MatAssemblyBegin       2 1.0 3.0994e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         2 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView                3 1.0 6.1703e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCSetUp                3 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             3102 1.0 1.1186e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPGMRESOrthog      3000 1.0 1.2434e+00 1.0 4.09e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  3289
KSPSetUp               3 1.0 5.0569e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               3 1.0 2.6590e+02 1.0 3.04e+12 1.0 0.0e+00 0.0e+00 0.0e+00 76 100  0  0  0  76 100  0  0  0 11430
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Viewer     1              0            0     0.
           Index Set    47             47        36472     0.
   IS L to G Mapping    15             15       422420     0.
              Vector   201            201     26873904     0.
      Vector Scatter    24             24        15744     0.
              Matrix     7              7   3917737368     0.
      Preconditioner     3              3         2448     0.
       Krylov Solver     3              3        55200     0.
    Distributed Mesh    25             25       535220     0.
Star Forest Bipartite Graph    50             50        42176     0.
     Discrete System    25             25        21600     0.
========================================================================================================================
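The Matrix rows of the memory tables are dominated by the dense operator itself: 22128 x 22128 double-precision entries, split roughly evenly over the ranks. A quick check (illustrative only; the small remainders are PETSc object headers and the other, much smaller Matrix objects):

n = 22128
full = n * n * 8          # bytes for the full dense matrix
print(full)               # 3917187072, vs. 3917737368 reported on 1 process
print(full // 2)          # 1958593536, vs. 1958884008 reported for process 0 on 2 processes
print(full // 4)          #  979296768, vs.  979454472 reported for process 0 on 4 processes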
Average time to get PetscTime(): 9.53674e-08
#PETSc Option Table entries:
-b0 1e-4
-b1 0.9
-dp 0.05
-ep 2
-ksp_max_it 1000
-ksp_view
-log_view
-lp 2
-nb 1
-th 30
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-blas-lapack-lib=/public/software/OpenBLAS/lib/libopenblas.a --with-mpi-dir=/home/zhangji/python/mpich-3.2/build --with-hdf5-dir=/public/software/mathlib/hdf5/1.8.12/gnu/ PETSC_DIR=/home/zhangji/python/petsc-3.7.6 PETSC_ARCH=linux-mpich-opblas --download-metis=/public/sourcecode/petsc_gnu/metis-5.1.0.tar.gz --download-ptscotch=/home/zhangji/python/scotch_6.0.4.tar.gz --download-pastix=/home/zhangji/python/pastix_5.2.3.tar.bz2 --with-debugging=no --CFLAGS=-O3 --CXXFLAGS=-O3 --FFLAGS=-O3
-----------------------------------------
Libraries compiled on Sat Jun 10 00:26:59 2017 on cn11 
Machine characteristics: Linux-2.6.32-504.el6.x86_64-x86_64-with-centos-6.6-Final
Using PETSc directory: /home/zhangji/python/petsc-3.7.6
Using PETSc arch: linux-mpich-opblas
-----------------------------------------

Using C compiler: /home/zhangji/python/mpich-3.2/build/bin/mpicc -O3 -fPIC   ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/zhangji/python/mpich-3.2/build/bin/mpif90 -O3 -fPIC   ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/public/software/mathlib/hdf5/1.8.12/gnu/include -I/home/zhangji/python/mpich-3.2/build/include
-----------------------------------------

Using C linker: /home/zhangji/python/mpich-3.2/build/bin/mpicc
Using Fortran linker: /home/zhangji/python/mpich-3.2/build/bin/mpif90
Using libraries: -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -lpetsc -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -Wl,-rpath,/public/software/OpenBLAS/lib -L/public/software/OpenBLAS/lib -Wl,-rpath,/public/software/mathlib/hdf5/1.8.12/gnu/lib -L/public/software/mathlib/hdf5/1.8.12/gnu/lib -Wl,-rpath,/home/zhangji/python/mpich-3.2/build/lib -L/home/zhangji/python/mpich-3.2/build/lib -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -Wl,-rpath,/home/zhangji/python/petsc-3.7.6 -L/home/zhangji/python/petsc-3.7.6 -lmetis -lpastix -lopenblas -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lz -lX11 -lssl -lcrypto -lmpifort -lifport -lifcore -lpthread -lmpicxx -lrt -lm -lrt -lm -lpthread -lz -ldl -lmpi -limf -lsvml -lirng -lm -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lirc_s -ldl
-----------------------------------------

