[petsc-users] Very poor speed up performance

Yongjun Chen yjxd.chen at gmail.com
Wed Dec 22 12:11:12 CST 2010


On Wed, Dec 22, 2010 at 6:53 PM, Satish Balay <balay at mcs.anl.gov> wrote:

> On Wed, 22 Dec 2010, Yongjun Chen wrote:
>
> > On Wed, Dec 22, 2010 at 6:32 PM, Satish Balay <balay at mcs.anl.gov> wrote:
> >
> > Thanks a lot, Satish. It is much clearer now. But for the choice between
> > the two, the program dmidecode does not show this information. Do you know
> > any way to get it?
>
> why do you expect dmidecode to show that?
>
> You'll have to look through the CPU/chipset hardware documentation -
> sometimes it mentions these details.
>
> Satish
>


Thanks, Satish. Yes, I will check that.
Just now I re-configured PETSc with the option --with-device=ch3:nemsis. The
results are almost the same as with --with-device=ch3:sock, as can be seen in
the attachments.
I hope the matrix partitioning / reordering algorithm will have some positive
effect.
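
For reference, here is a minimal sketch of the kind of repartitioning I have
in mind, using PETSc's MatPartitioning interface together with the ParMETIS
package that is already part of this build (the function and variable names
are just placeholders, not code from my solver):

#include "petscmat.h"

/* Sketch only: compute a new row distribution for the assembled matrix A
 * with a graph partitioner, so that each process owns a better-connected
 * block of rows. Written against the PETSc 3.1 calling conventions used
 * here; error checking is the usual CHKERRQ.                               */
PetscErrorCode RepartitionSketch(Mat A)
{
  MatPartitioning part;
  IS              owners;  /* for each local row: the rank that should own it */
  IS              newnum;  /* new global number of each local row             */
  PetscErrorCode  ierr;

  ierr = MatPartitioningCreate(PETSC_COMM_WORLD,&part);CHKERRQ(ierr);
  /* Use the nonzero pattern of A as the graph; for MPISBAIJ (only the upper
   * triangle stored) a full AIJ copy of the adjacency may be needed instead. */
  ierr = MatPartitioningSetAdjacency(part,A);CHKERRQ(ierr);
  ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); /* e.g. -mat_partitioning_type parmetis */
  ierr = MatPartitioningApply(part,&owners);CHKERRQ(ierr);
  ierr = ISPartitioningToNumbering(owners,&newnum);CHKERRQ(ierr); /* owner list -> new global numbering */
  ierr = ISView(newnum,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);

  ierr = ISDestroy(newnum);CHKERRQ(ierr);
  ierr = ISDestroy(owners);CHKERRQ(ierr);
  ierr = MatPartitioningDestroy(part);CHKERRQ(ierr);
  return 0;
}

The matrix and the right-hand side would then still have to be redistributed
according to the new numbering (e.g. with MatGetSubMatrix on the permuted
index sets) before KSPSolve; I have not tried this yet, so take it only as
the direction I want to go.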
-------------- next part --------------
Process 0 of total 8 on wmss04
Process 4 of total 8 on wmss04
Process 1 of total 8 on wmss04
Process 5 of total 8 on wmss04
Process 6 of total 8 on wmss04
Process 2 of total 8 on wmss04
Process 3 of total 8 on wmss04
Process 7 of total 8 on wmss04
The dimension of Matrix A is n = 1177754
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.

=========================================================
Begin the solving:										  
=========================================================
The current time is: Wed Dec 22 17:41:47 2010

KSP Object:
  type: bicg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-07, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:
  type: jacobi
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpisbaij, rows=1177754, cols=1177754
    total: nonzeros=49908476, allocated nonzeros=49908476
        block size is 1

norm(b-Ax)=1.32502e-06
Norm of error 1.32502e-06, Iterations 1473
=========================================================
The solver has finished successfully!			          
=========================================================
The solving time is 333.681 seconds.
The time accuracy is 1e-06 second.
The current time is Wed Dec 22 17:47:21 2010

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./AMG_Solver_MPI on a linux-gnu named wmss04 with 8 processors, by cheny Wed Dec 22 18:47:21 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010

                         Max       Max/Min        Avg      Total 
Time (sec):           3.558e+02      1.00000   3.558e+02
Objects:              3.000e+01      1.00000   3.000e+01
Flops:                7.792e+10      1.09702   7.614e+10  6.091e+11
Flops/sec:            2.190e+08      1.09702   2.140e+08  1.712e+09
MPI Messages:         5.906e+03      2.00017   5.169e+03  4.135e+04
MPI Message Lengths:  1.866e+09      4.61816   2.430e+05  1.005e+10
MPI Reductions:       4.477e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 3.5581e+02 100.0%  6.0914e+11 100.0%  4.135e+04 100.0%  2.430e+05      100.0%  4.461e+03  99.6% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             1474 1.0 1.5404e+02 1.6 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 35 47 50 50  0  35 47 50 50  0  1876
MatMultTranspose    1473 1.0 1.4721e+02 1.4 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 37 47 50 50  0  37 47 50 50  0  1962
MatAssemblyBegin       1 1.0 6.0289e-0316.6 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 5.2618e-02 1.0 0.00e+00 0.0 7.0e+01 8.5e+04 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatView                1 1.0 2.0790e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecView                1 1.0 1.0855e+0112.8 0.00e+00 0.0 1.4e+01 5.9e+05 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
VecDot              2946 1.0 9.9344e+0120.5 8.67e+08 1.0 0.0e+00 0.0e+00 2.9e+03 12  1  0  0 66  12  1  0  0 66    70
VecNorm             1475 1.0 5.6723e+00 2.9 4.34e+08 1.0 0.0e+00 0.0e+00 1.5e+03  1  1  0  0 33   1  1  0  0 33   613
VecCopy                4 1.0 5.5063e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              8843 1.0 2.1978e+00 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY             4420 1.0 8.6108e+00 1.3 1.30e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0  1209
VecAYPX             2944 1.0 6.0635e+00 1.4 8.67e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0  1144
VecAssemblyBegin       6 1.0 4.8455e-0217.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         6 1.0 3.5286e-05 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult    2948 1.0 8.7080e+00 1.3 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   399
VecScatterBegin     2947 1.0 1.8601e+00 2.6 0.00e+00 0.0 4.1e+04 2.4e+05 0.0e+00  0  0100100  0   0  0100100  0     0
VecScatterEnd       2947 1.0 9.0296e+0116.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12  0  0  0  0  12  0  0  0  0     0
KSPSetup               1 1.0 9.8538e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 3.2263e+02 1.0 7.79e+10 1.1 4.1e+04 2.4e+05 4.4e+03 91100100100 99  91100100100 99  1887
PCSetUp                1 1.0 3.0994e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             2948 1.0 8.7381e+00 1.3 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   397
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     3              3     84944064     0
                 Vec    18             18     15741712     0
         Vec Scatter     2              2         1736     0
           Index Set     4              4       409008     0
       Krylov Solver     1              1          832     0
      Preconditioner     1              1          872     0
              Viewer     1              1          544     0
========================================================================================================================
Average time to get PetscTime(): 4.98295e-06
Average time for MPI_Barrier(): 9.76086e-05
Average time for zero size MPI_Send(): 2.81334e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Wed Dec 22 18:24:43 2010
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:nemsis --with-debugging=0 --with-batch --known-mpi-shared=1
-----------------------------------------
Libraries compiled on Wed Dec 22 18:26:55 CET 2010 on wmss04 
Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux 
Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized
Using PETSc arch: linux-gnu-c-opt-ch3nemsis
-----------------------------------------
Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O   
Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O   
-----------------------------------------
Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include  
------------------------------------------
Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O 
Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O 
Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lpetsc       -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl  
------------------------------------------
-------------- next part --------------
Process 0 of total 12 on wmss04
Process 4 of total 12 on wmss04
Process 6 of total 12 on wmss04
Process 5 of total 12 on wmss04
Process 11 of total 12 on wmss04
Process 2 of total 12 on wmss04
Process 7 of total 12 on wmss04
Process 3 of total 12 on wmss04
Process 8 of total 12 on wmss04
Process 1 of total 12 on wmss04
Process 9 of total 12 on wmss04
Process 10 of total 12 on wmss04
The dimension of Matrix A is n = 1177754
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
=========================================================
Begin the solving:										  
=========================================================
The current time is: Wed Dec 22 17:55:12 2010

KSP Object:
  type: bicg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-07, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:
  type: jacobi
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpisbaij, rows=1177754, cols=1177754
    total: nonzeros=49908476, allocated nonzeros=49908476
        block size is 1

norm(b-Ax)=1.28414e-06
Norm of error 1.28414e-06, Iterations 1473
=========================================================
The solver has finished successfully!			          
=========================================================
The solving time is 241.392 seconds.
The time accuracy is 1e-06 second.
The current time is Wed Dec 22 17:59:13 2010

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./AMG_Solver_MPI on a linux-gnu named wmss04 with 12 processors, by cheny Wed Dec 22 18:59:13 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010

                         Max       Max/Min        Avg      Total 
Time (sec):           2.594e+02      1.00000   2.594e+02
Objects:              3.000e+01      1.00000   3.000e+01
Flops:                5.197e+10      1.11689   5.074e+10  6.089e+11
Flops/sec:            2.004e+08      1.11689   1.956e+08  2.348e+09
MPI Messages:         5.906e+03      2.00017   5.415e+03  6.498e+04
MPI Message Lengths:  1.887e+09      6.23794   2.345e+05  1.524e+10
MPI Reductions:       4.477e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 2.5935e+02 100.0%  6.0890e+11 100.0%  6.498e+04 100.0%  2.345e+05      100.0%  4.461e+03  99.6% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             1474 1.0 1.1203e+02 1.5 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 39 47 50 50  0  39 47 50 50  0  2579
MatMultTranspose    1473 1.0 9.9342e+01 1.3 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 36 47 50 50  0  36 47 50 50  0  2906
MatAssemblyBegin       1 1.0 3.7930e-03 8.9 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 5.1536e-02 1.0 0.00e+00 0.0 1.1e+02 8.2e+04 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatView                1 1.0 2.2507e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecView                1 1.0 1.2744e+0166.4 0.00e+00 0.0 2.2e+01 3.9e+05 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
VecDot              2946 1.0 5.4256e+0115.3 5.78e+08 1.0 0.0e+00 0.0e+00 2.9e+03  6  1  0  0 66   6  1  0  0 66   128
VecNorm             1475 1.0 7.3386e+00 5.2 2.90e+08 1.0 0.0e+00 0.0e+00 1.5e+03  1  1  0  0 33   1  1  0  0 33   473
VecCopy                4 1.0 6.2873e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              8843 1.0 2.5036e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY             4420 1.0 7.4288e+00 1.8 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0  1401
VecAYPX             2944 1.0 5.0487e+00 2.5 5.78e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0  1374
VecAssemblyBegin       6 1.0 3.4969e-0211.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         6 1.0 5.5075e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult    2948 1.0 7.2035e+00 1.7 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   482
VecScatterBegin     2947 1.0 2.5759e+00 2.7 0.00e+00 0.0 6.5e+04 2.3e+05 0.0e+00  1  0100100  0   1  0100100  0     0
VecScatterEnd       2947 1.0 5.1555e+0111.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  7  0  0  0  0   7  0  0  0  0     0
KSPSetup               1 1.0 8.2631e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 2.2851e+02 1.0 5.20e+10 1.1 6.5e+04 2.3e+05 4.4e+03 88100100100 99  88100100100 99  2664
PCSetUp                1 1.0 7.1526e-06 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             2948 1.0 7.2339e+00 1.7 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   480
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     3              3     56593044     0
                 Vec    18             18     10534536     0
         Vec Scatter     2              2         1736     0
           Index Set     4              4       305424     0
       Krylov Solver     1              1          832     0
      Preconditioner     1              1          872     0
              Viewer     1              1          544     0
========================================================================================================================
Average time to get PetscTime(): 7.82013e-06
Average time for MPI_Barrier(): 9.52244e-05
Average time for zero size MPI_Send(): 2.15769e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Wed Dec 22 18:24:43 2010
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:nemsis --with-debugging=0 --with-batch --known-mpi-shared=1
-----------------------------------------
Libraries compiled on Wed Dec 22 18:26:55 CET 2010 on wmss04 
Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux 
Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized
Using PETSc arch: linux-gnu-c-opt-ch3nemsis
-----------------------------------------
Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O   
Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O   
-----------------------------------------
Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include  
------------------------------------------
Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O 
Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O 
Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lpetsc       -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl  
------------------------------------------
-------------- next part --------------
Process 0 of total 16 on wmss04
Process 8 of total 16 on wmss04
Process 4 of total 16 on wmss04
Process 6 of total 16 on wmss04
Process 14 of total 16 on wmss04
Process 12 of total 16 on wmss04
Process 2 of total 16 on wmss04
Process 10 of total 16 on wmss04
Process 3 of total 16 on wmss04
Process 15 of total 16 on wmss04
Process 7 of total 16 on wmss04
Process 1 of total 16 on wmss04
Process 9 of total 16 on wmss04
Process 5 of total 16 on wmss04
Process 13 of total 16 on wmss04
The dimension of Matrix A is n = 1177754
Process 11 of total 16 on wmss04
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.

=========================================================
Begin the solving:										  
=========================================================
The current time is: Wed Dec 22 17:50:47 2010

KSP Object:
  type: bicg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-07, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:
  type: jacobi
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpisbaij, rows=1177754, cols=1177754
    total: nonzeros=49908476, allocated nonzeros=49908476
        block size is 1

norm(b-Ax)=1.23596e-06
Norm of error 1.23596e-06, Iterations 1481
=========================================================
The solver has finished successfully!			          
=========================================================
The solving time is 227.888 seconds.
The time accuracy is 1e-06 second.
The current time is Wed Dec 22 17:54:35 2010

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./AMG_Solver_MPI on a linux-gnu named wmss04 with 16 processors, by cheny Wed Dec 22 18:54:35 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010

                         Max       Max/Min        Avg      Total 
Time (sec):           2.442e+02      1.00001   2.442e+02
Objects:              3.000e+01      1.00000   3.000e+01
Flops:                3.922e+10      1.13060   3.822e+10  6.116e+11
Flops/sec:            1.606e+08      1.13060   1.565e+08  2.504e+09
MPI Messages:         1.187e+04      3.99916   7.051e+03  1.128e+05
MPI Message Lengths:  1.929e+09      7.80850   1.819e+05  2.052e+10
MPI Reductions:       4.501e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 2.4422e+02 100.0%  6.1159e+11 100.0%  1.128e+05 100.0%  1.819e+05      100.0%  4.485e+03  99.6% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             1482 1.0 1.1549e+02 2.0 1.86e+10 1.1 5.6e+04 1.8e+05 0.0e+00 36 47 50 50  0  36 47 50 50  0  2513
MatMultTranspose    1481 1.0 9.3652e+01 1.4 1.86e+10 1.1 5.6e+04 1.8e+05 0.0e+00 32 47 50 50  0  32 47 50 50  0  3097
MatAssemblyBegin       1 1.0 4.6110e-03 7.4 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 5.1871e-02 1.0 0.00e+00 0.0 1.8e+02 6.7e+04 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatView                1 1.0 5.1212e-04 4.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecView                1 1.0 1.2031e+01123.8 0.00e+00 0.0 3.0e+01 2.9e+05 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
VecDot              2962 1.0 7.2313e+0122.5 4.36e+08 1.0 0.0e+00 0.0e+00 3.0e+03 13  1  0  0 66  13  1  0  0 66    96
VecNorm             1483 1.0 5.2508e+00 4.6 2.18e+08 1.0 0.0e+00 0.0e+00 1.5e+03  1  1  0  0 33   1  1  0  0 33   665
VecCopy                4 1.0 3.2623e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              8891 1.0 2.5386e+00 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY             4444 1.0 6.6341e+00 1.6 6.54e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0  1578
VecAYPX             2960 1.0 4.2830e+00 1.7 4.36e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0  1628
VecAssemblyBegin       6 1.0 4.0186e-0213.5 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         6 1.0 6.0081e-05 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult    2964 1.0 6.2569e+00 1.6 2.18e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   558
VecScatterBegin     2963 1.0 2.9219e+00 4.0 0.00e+00 0.0 1.1e+05 1.8e+05 0.0e+00  1  0100100  0   1  0100100  0     0
VecScatterEnd       2963 1.0 5.0568e+01 7.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0   9  0  0  0  0     0
KSPSetup               1 1.0 5.8019e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 2.1573e+02 1.0 3.92e+10 1.1 1.1e+05 1.8e+05 4.4e+03 88100100100 99  88100100100 99  2834
PCSetUp                1 1.0 5.9605e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             2964 1.0 6.2830e+00 1.6 2.18e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   556
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     3              3     42424600     0
                 Vec    18             18      7924896     0
         Vec Scatter     2              2         1736     0
           Index Set     4              4       247632     0
       Krylov Solver     1              1          832     0
      Preconditioner     1              1          872     0
              Viewer     1              1          544     0
========================================================================================================================
Average time to get PetscTime(): 1.38998e-05
Average time for MPI_Barrier(): 0.00011363
Average time for zero size MPI_Send(): 2.03103e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Wed Dec 22 18:24:43 2010
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:nemsis --with-debugging=0 --with-batch --known-mpi-shared=1
-----------------------------------------
Libraries compiled on Wed Dec 22 18:26:55 CET 2010 on wmss04 
Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux 
Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized
Using PETSc arch: linux-gnu-c-opt-ch3nemsis
-----------------------------------------
Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O   
Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O   
-----------------------------------------
Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include  
------------------------------------------
Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O 
Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O 
Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lpetsc       -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl  
------------------------------------------

