[petsc-users] Very poor speed up performance

Yongjun Chen yjxd.chen at gmail.com
Wed Dec 22 09:55:23 CST 2010


Satish,

I have reconfigured PETSc with --download-mpich=1 and
--with-device=ch3:sock. The results show that the speedup now keeps
increasing as the number of cores goes from 1 to 16. However, the maximum
speedup is still only about 6.0 on 16 cores. The new log files can be
found in the attachment.



(1)

I checked the configuration of the first server again. It is a
shared-memory machine with

Processors: 4 CPUs x 4 cores/CPU, each core at 2500 MHz

Memory: 16 x 2 GB DDR2-333, dual channel, 64-bit data width, so the
bandwidth of one dual-channel memory pair is 64/8 * 166 * 2 * 2 = 5.4 GB/s.

It seems that each core can then get about 2.7 GB/s of memory bandwidth,
which should meet the basic requirement of a sparse iterative solver.

Is this correct? Does a shared-memory machine offer no benefit for PETSc
when the memory bandwidth is limited?
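
(As a rough cross-check of these numbers, one could run a small STREAM-triad
style loop with an increasing number of MPI ranks and see where the aggregate
rate stops growing. The code below is only an illustration, not the official
STREAM benchmark; the array size N and the repeat count are arbitrary choices.)

/* triad.c - rough per-rank memory bandwidth estimate (STREAM-triad style).
   Build e.g.:  mpicc -O2 triad.c -o triad
   Run with 1, 2, 4, ... ranks and compare the summed GB/s. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define N      20000000   /* 3 arrays of doubles: ~480 MB of traffic per sweep */
#define NTIMES 10

int main(int argc,char **argv)
{
  int     rank,size,r;
  long    i;
  double *a,*b,*c,t,best = 1e30;

  MPI_Init(&argc,&argv);
  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  MPI_Comm_size(MPI_COMM_WORLD,&size);

  a = malloc(N*sizeof(double)); b = malloc(N*sizeof(double)); c = malloc(N*sizeof(double));
  for (i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

  for (r = 0; r < NTIMES; r++) {
    MPI_Barrier(MPI_COMM_WORLD);
    t = MPI_Wtime();
    for (i = 0; i < N; i++) a[i] = b[i] + 3.0*c[i];   /* 2 loads + 1 store per element */
    t = MPI_Wtime() - t;
    if (t < best) best = t;
  }

  /* 24 bytes move per element; print a[0] so the loop is not optimized away */
  printf("rank %d of %d: ~%.2f GB/s (a[0]=%g)\n",rank,size,24.0*N/best/1e9,a[0]);

  free(a); free(b); free(c);
  MPI_Finalize();
  return 0;
}

If the summed rate flattens well before 16 ranks, the limited speedup of the
bandwidth-bound MatMult/MatMultTranspose phases in the attached logs would be
consistent with that.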



(2)

Besides, we would like to continue our work by employing a matrix
partitioning/reordering algorithm, such as METIS or ParMETIS, to improve
the speedup of the program. (The current program runs without any matrix
decomposition.)



Matt, as you said in
http://lists.mcs.anl.gov/pipermail/petsc-users/2007-January/001017.html,
"Reordering a matrix can result in fewer iterations for an iterative
solver."

Do you think matrix partitioning/reordering will help for this program,
or do you have any further suggestions?
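
(For reference, PETSc can drive ParMETIS directly through its MatPartitioning
interface. Below is only a minimal sketch, written against the PETSc 3.1
calling conventions used in the attached logs; it assumes the adjacency
structure is available as an assembled MPIAIJ matrix, so the MPISBAIJ matrix
from these runs might first have to be assembled or converted as AIJ, and the
helper name PartitionRows is just an illustration.)

#include <petscmat.h>

/* Sketch: compute a ParMETIS row partitioning of A and print it.
   A is assumed to be an assembled MPIAIJ matrix; error handling is
   abbreviated to CHKERRQ. */
PetscErrorCode PartitionRows(Mat A)
{
  MatPartitioning part;
  IS              is;
  PetscErrorCode  ierr;

  PetscFunctionBegin;
  ierr = MatPartitioningCreate(PETSC_COMM_WORLD,&part);CHKERRQ(ierr);
  ierr = MatPartitioningSetAdjacency(part,A);CHKERRQ(ierr);            /* graph = nonzero pattern of A */
  ierr = MatPartitioningSetType(part,MATPARTITIONINGPARMETIS);CHKERRQ(ierr);
  ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr);            /* honor -mat_partitioning_type etc. */
  ierr = MatPartitioningApply(part,&is);CHKERRQ(ierr);                 /* is[i] = destination rank of local row i */
  ierr = ISView(is,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
  /* The IS could then be turned into a new global numbering with
     ISPartitioningToNumbering() and used to redistribute the matrix
     (e.g. via MatGetSubMatrix()) before KSPSolve(). */
  ierr = ISDestroy(is);CHKERRQ(ierr);
  ierr = MatPartitioningDestroy(part);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Whether the resulting ordering actually reduces the iteration count for this
system is exactly the question above.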



Any comments are very welcome! Thank you!








On Mon, Dec 20, 2010 at 11:04 PM, Satish Balay <balay at mcs.anl.gov> wrote:

> On Mon, 20 Dec 2010, Yongjun Chen wrote:
>
> > Matt, Barry, thanks a lot for your reply! I will try mpich hydra firstly
> and
> > see what I can get.
>
> hydra is just the process manager.
>
> Also --download-mpich uses a slightly older version - with
> device=ch3:sock for portability and valgrind reasons [development]
>
> You might want to install the latest mpich manually with the default
> device=ch3:nemesis and recheck..
>
> satish
>
-------------- next part --------------
Process 0 of total 4 on wmss04
Process 2 of total 4 on wmss04
Process 1 of total 4 on wmss04
Process 3 of total 4 on wmss04
The dimension of Matrix A is n = 1177754
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
End Assembly.
End Assembly.
=========================================================
Begin the solving:										  
=========================================================
The current time is: Wed Dec 22 11:41:09 2010

KSP Object:
  type: bicg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-07, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:
  type: jacobi
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpisbaij, rows=1177754, cols=1177754
    total: nonzeros=49908476, allocated nonzeros=49908476
        block size is 1

norm(b-Ax)=1.28342e-06
Norm of error 1.28342e-06, Iterations 1473
=========================================================
The solver has finished successfully!			          
=========================================================
The solving time is 420.527 seconds.
The time accuracy is 1e-06 second.
The current time is Wed Dec 22 11:48:09 2010

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./AMG_Solver_MPI on a linux-gnu named wmss04 with 4 processors, by cheny Wed Dec 22 12:48:09 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010

                         Max       Max/Min        Avg      Total 
Time (sec):           4.531e+02      1.00000   4.531e+02
Objects:              3.000e+01      1.00000   3.000e+01
Flops:                1.558e+11      1.06872   1.523e+11  6.091e+11
Flops/sec:            3.438e+08      1.06872   3.361e+08  1.344e+09
MPI Messages:         5.906e+03      2.00017   4.430e+03  1.772e+04
MPI Message Lengths:  1.727e+09      2.74432   2.658e+05  4.710e+09
MPI Reductions:       4.477e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 4.5314e+02 100.0%  6.0914e+11 100.0%  1.772e+04 100.0%  2.658e+05      100.0%  4.461e+03  99.6% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             1474 1.0 1.7876e+02 1.0 7.40e+10 1.1 8.8e+03 2.7e+05 0.0e+00 39 47 50 50  0  39 47 50 50  0  1617
MatMultTranspose    1473 1.0 1.7886e+02 1.0 7.40e+10 1.1 8.8e+03 2.7e+05 0.0e+00 39 47 50 50  0  39 47 50 50  0  1615
MatAssemblyBegin       1 1.0 3.2670e-0312.4 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 6.1171e-02 1.0 0.00e+00 0.0 3.0e+01 9.3e+04 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatView                1 1.0 1.6379e-04 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecView                1 1.0 1.0934e+01 4.7 0.00e+00 0.0 6.0e+00 1.2e+06 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecDot              2946 1.0 1.9010e+01 2.2 1.73e+09 1.0 0.0e+00 0.0e+00 2.9e+03  3  1  0  0 66   3  1  0  0 66   365
VecNorm             1475 1.0 1.0313e+01 2.8 8.69e+08 1.0 0.0e+00 0.0e+00 1.5e+03  1  1  0  0 33   1  1  0  0 33   337
VecCopy                4 1.0 5.2447e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              8843 1.0 2.8803e+00 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY             4420 1.0 1.3866e+01 1.5 2.60e+09 1.0 0.0e+00 0.0e+00 0.0e+00  3  2  0  0  0   3  2  0  0  0   751
VecAYPX             2944 1.0 1.0440e+01 1.0 1.73e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   664
VecAssemblyBegin       6 1.0 1.0071e-0161.5 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         6 1.0 2.4080e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult    2948 1.0 1.6040e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00  3  1  0  0  0   3  1  0  0  0   216
VecScatterBegin     2947 1.0 1.7367e+00 2.2 0.00e+00 0.0 1.8e+04 2.7e+05 0.0e+00  0  0100100  0   0  0100100  0     0
VecScatterEnd       2947 1.0 3.0331e+01 5.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
KSPSetup               1 1.0 1.3974e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 4.0934e+02 1.0 1.56e+11 1.1 1.8e+04 2.7e+05 4.4e+03 90100100100 99  90100100100 99  1488
PCSetUp                1 1.0 3.0994e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             2948 1.0 1.6080e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00  3  1  0  0  0   3  1  0  0  0   216
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     3              3    169902696     0
                 Vec    18             18     31282096     0
         Vec Scatter     2              2         1736     0
           Index Set     4              4       638616     0
       Krylov Solver     1              1          832     0
      Preconditioner     1              1          872     0
              Viewer     1              1          544     0
========================================================================================================================
Average time to get PetscTime(): 1.19209e-06
Average time for MPI_Barrier(): 5.97954e-05
Average time for zero size MPI_Send(): 2.07424e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Wed Dec 22 11:56:02 2010
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:sock --with-debugging=0 --with-batch --known-mpi-shared=1
-----------------------------------------
Libraries compiled on Wed Dec 22 11:56:30 CET 2010 on wmss04 
Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux 
Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized
Using PETSc arch: linux-gnu-c-opt-ch3sock
-----------------------------------------
Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O   
Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O   
-----------------------------------------
Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include  
------------------------------------------
Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O 
Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O 
Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lpetsc       -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl  
------------------------------------------
-------------- next part --------------
Process 0 of total 8 on wmss04
Process 4 of total 8 on wmss04
Process 6 of total 8 on wmss04
Process 2 of total 8 on wmss04
Process 1 of total 8 on wmss04
Process 5 of total 8 on wmss04
Process 3 of total 8 on wmss04
Process 7 of total 8 on wmss04
The dimension of Matrix A is n = 1177754
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
=========================================================
Begin the solving:										  
=========================================================
The current time is: Wed Dec 22 11:12:03 2010

KSP Object:
  type: bicg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-07, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:
  type: jacobi
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpisbaij, rows=1177754, cols=1177754
    total: nonzeros=49908476, allocated nonzeros=49908476
        block size is 1

norm(b-Ax)=1.32502e-06
Norm of error 1.32502e-06, Iterations 1473
=========================================================
The solver has finished successfully!			          
=========================================================
The solving time is 291.989 seconds.
The time accuracy is 1e-06 second.
The current time is Wed Dec 22 11:16:55 2010

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./AMG_Solver_MPI on a linux-gnu named wmss04 with 8 processors, by cheny Wed Dec 22 12:16:55 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010

                         Max       Max/Min        Avg      Total 
Time (sec):           3.113e+02      1.00000   3.113e+02
Objects:              3.000e+01      1.00000   3.000e+01
Flops:                7.792e+10      1.09702   7.614e+10  6.091e+11
Flops/sec:            2.503e+08      1.09702   2.446e+08  1.957e+09
MPI Messages:         5.906e+03      2.00017   5.169e+03  4.135e+04
MPI Message Lengths:  1.866e+09      4.61816   2.430e+05  1.005e+10
MPI Reductions:       4.477e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 3.1128e+02 100.0%  6.0914e+11 100.0%  4.135e+04 100.0%  2.430e+05      100.0%  4.461e+03  99.6% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             1474 1.0 1.2879e+02 1.4 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 36 47 50 50  0  36 47 50 50  0  2244
MatMultTranspose    1473 1.0 1.2240e+02 1.3 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 37 47 50 50  0  37 47 50 50  0  2360
MatAssemblyBegin       1 1.0 3.1061e-03 9.8 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 5.0727e-02 1.0 0.00e+00 0.0 7.0e+01 8.5e+04 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatView                1 1.0 2.2912e-04 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecView                1 1.0 1.1926e+0113.1 0.00e+00 0.0 1.4e+01 5.9e+05 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
VecDot              2946 1.0 6.5343e+0113.5 8.67e+08 1.0 0.0e+00 0.0e+00 2.9e+03  9  1  0  0 66   9  1  0  0 66   106
VecNorm             1475 1.0 6.9889e+00 3.6 4.34e+08 1.0 0.0e+00 0.0e+00 1.5e+03  1  1  0  0 33   1  1  0  0 33   497
VecCopy                4 1.0 5.1496e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              8843 1.0 2.2587e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY             4420 1.0 8.7103e+00 1.5 1.30e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0  1195
VecAYPX             2944 1.0 5.7803e+00 1.4 8.67e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0  1200
VecAssemblyBegin       6 1.0 3.9916e-0214.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         6 1.0 3.6001e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult    2948 1.0 8.6749e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00  3  1  0  0  0   3  1  0  0  0   400
VecScatterBegin     2947 1.0 1.9621e+00 2.7 0.00e+00 0.0 4.1e+04 2.4e+05 0.0e+00  0  0100100  0   0  0100100  0     0
VecScatterEnd       2947 1.0 5.9072e+0110.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0   9  0  0  0  0     0
KSPSetup               1 1.0 8.9231e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 2.7991e+02 1.0 7.79e+10 1.1 4.1e+04 2.4e+05 4.4e+03 90100100100 99  90100100100 99  2175
PCSetUp                1 1.0 3.0994e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             2948 1.0 8.7041e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00  3  1  0  0  0   3  1  0  0  0   399
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     3              3     84944064     0
                 Vec    18             18     15741712     0
         Vec Scatter     2              2         1736     0
           Index Set     4              4       409008     0
       Krylov Solver     1              1          832     0
      Preconditioner     1              1          872     0
              Viewer     1              1          544     0
========================================================================================================================
Average time to get PetscTime(): 4.3869e-06
Average time for MPI_Barrier(): 7.25746e-05
Average time for zero size MPI_Send(): 2.06232e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Wed Dec 22 11:56:02 2010
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:sock --with-debugging=0 --with-batch --known-mpi-shared=1
-----------------------------------------
Libraries compiled on Wed Dec 22 11:56:30 CET 2010 on wmss04 
Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux 
Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized
Using PETSc arch: linux-gnu-c-opt-ch3sock
-----------------------------------------
Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O   
Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O   
-----------------------------------------
Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include  
------------------------------------------
Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O 
Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O 
Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lpetsc       -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl  
------------------------------------------
-------------- next part --------------
Process 0 of total 12 on wmss04
Process 2 of total 12 on wmss04
Process 6 of total 12 on wmss04
Process 4 of total 12 on wmss04
Process 8 of total 12 on wmss04
Process 11 of total 12 on wmss04
Process 1 of total 12 on wmss04
Process 3 of total 12 on wmss04
Process 5 of total 12 on wmss04
The dimension of Matrix A is n = 1177754
Process 9 of total 12 on wmss04
Process 10 of total 12 on wmss04
Process 7 of total 12 on wmss04
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
=========================================================
Begin the solving:										  
=========================================================
The current time is: Wed Dec 22 12:13:43 2010

KSP Object:
  type: bicg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-07, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:
  type: jacobi
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpisbaij, rows=1177754, cols=1177754
    total: nonzeros=49908476, allocated nonzeros=49908476
        block size is 1

norm(b-Ax)=1.28414e-06
Norm of error 1.28414e-06, Iterations 1473
=========================================================
The solver has finished successfully!			          
=========================================================
The solving time is 253.909 seconds.
The time accuracy is 1e-06 second.
The current time is Wed Dec 22 12:17:57 2010

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./AMG_Solver_MPI on a linux-gnu named wmss04 with 12 processors, by cheny Wed Dec 22 13:17:57 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010

                         Max       Max/Min        Avg      Total 
Time (sec):           2.721e+02      1.00000   2.721e+02
Objects:              3.000e+01      1.00000   3.000e+01
Flops:                5.197e+10      1.11689   5.074e+10  6.089e+11
Flops/sec:            1.910e+08      1.11689   1.865e+08  2.238e+09
MPI Messages:         5.906e+03      2.00017   5.415e+03  6.498e+04
MPI Message Lengths:  1.887e+09      6.23794   2.345e+05  1.524e+10
MPI Reductions:       4.477e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 2.7212e+02 100.0%  6.0890e+11 100.0%  6.498e+04 100.0%  2.345e+05      100.0%  4.461e+03  99.6% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             1474 1.0 1.2467e+02 1.6 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 37 47 50 50  0  37 47 50 50  0  2318
MatMultTranspose    1473 1.0 1.0645e+02 1.3 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 35 47 50 50  0  35 47 50 50  0  2712
MatAssemblyBegin       1 1.0 4.0723e-0274.7 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 5.3137e-02 1.0 0.00e+00 0.0 1.1e+02 8.2e+04 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatView                1 1.0 2.8801e-04 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecView                1 1.0 1.2262e+0190.2 0.00e+00 0.0 2.2e+01 3.9e+05 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
VecDot              2946 1.0 6.1395e+0111.5 5.78e+08 1.0 0.0e+00 0.0e+00 2.9e+03  9  1  0  0 66   9  1  0  0 66   113
VecNorm             1475 1.0 5.8101e+00 3.3 2.90e+08 1.0 0.0e+00 0.0e+00 1.5e+03  1  1  0  0 33   1  1  0  0 33   598
VecCopy                4 1.0 5.6744e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              8843 1.0 2.1137e+00 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY             4420 1.0 6.6266e+00 1.4 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0  1571
VecAYPX             2944 1.0 5.2210e+00 2.3 5.78e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0  1328
VecAssemblyBegin       6 1.0 5.0129e-0218.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         6 1.0 4.7922e-05 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult    2948 1.0 7.0911e+00 1.6 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   490
VecScatterBegin     2947 1.0 2.5096e+00 3.1 0.00e+00 0.0 6.5e+04 2.3e+05 0.0e+00  1  0100100  0   1  0100100  0     0
VecScatterEnd       2947 1.0 4.4540e+01 6.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0   9  0  0  0  0     0
KSPSetup               1 1.0 7.9119e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 2.4149e+02 1.0 5.20e+10 1.1 6.5e+04 2.3e+05 4.4e+03 89100100100 99  89100100100 99  2521
PCSetUp                1 1.0 6.1989e-06 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             2948 1.0 7.1207e+00 1.6 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   488
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     3              3     56593044     0
                 Vec    18             18     10534536     0
         Vec Scatter     2              2         1736     0
           Index Set     4              4       305424     0
       Krylov Solver     1              1          832     0
      Preconditioner     1              1          872     0
              Viewer     1              1          544     0
========================================================================================================================
Average time to get PetscTime(): 6.00815e-06
Average time for MPI_Barrier(): 0.000122833
Average time for zero size MPI_Send(): 2.81533e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Wed Dec 22 11:56:02 2010
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:sock --with-debugging=0 --with-batch --known-mpi-shared=1
-----------------------------------------
Libraries compiled on Wed Dec 22 11:56:30 CET 2010 on wmss04 
Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux 
Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized
Using PETSc arch: linux-gnu-c-opt-ch3sock
-----------------------------------------
Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O   
Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O   
-----------------------------------------
Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include  
------------------------------------------
Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O 
Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O 
Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lpetsc       -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl  
------------------------------------------
-------------- next part --------------
Process 3 of total 16 on wmss04
Process 7 of total 16 on wmss04
Process 1 of total 16 on wmss04
Process 15 of total 16 on wmss04
Process 5 of total 16 on wmss04
Process 13 of total 16 on wmss04
Process 11 of total 16 on wmss04
Process 9 of total 16 on wmss04
Process 0 of total 16 on wmss04
Process 10 of total 16 on wmss04
Process 4 of total 16 on wmss04
Process 12 of total 16 on wmss04
Process 2 of total 16 on wmss04
Process 6 of total 16 on wmss04
Process 14 of total 16 on wmss04
Process 8 of total 16 on wmss04
The dimension of Matrix A is n = 1177754
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
=========================================================
Begin the solving:										  
=========================================================
The current time is: Wed Dec 22 11:23:54 2010

KSP Object:
  type: bicg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-07, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:
  type: jacobi
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpisbaij, rows=1177754, cols=1177754
    total: nonzeros=49908476, allocated nonzeros=49908476
        block size is 1

norm(b-Ax)=1.194e-06
Norm of error 1.194e-06, Iterations 1495
=========================================================
The solver has finished successfully!			          
=========================================================
The solving time is 240.208 seconds.
The time accuracy is 1e-06 second.
The current time is Wed Dec 22 11:27:54 2010

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./AMG_Solver_MPI on a linux-gnu named wmss04 with 16 processors, by cheny Wed Dec 22 12:27:54 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010

                         Max       Max/Min        Avg      Total 
Time (sec):           2.565e+02      1.00001   2.565e+02
Objects:              3.000e+01      1.00000   3.000e+01
Flops:                3.959e+10      1.13060   3.859e+10  6.174e+11
Flops/sec:            1.543e+08      1.13060   1.504e+08  2.407e+09
MPI Messages:         1.198e+04      3.99917   7.118e+03  1.139e+05
MPI Message Lengths:  1.948e+09      7.80981   1.819e+05  2.071e+10
MPI Reductions:       4.543e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 2.5651e+02 100.0%  6.1737e+11 100.0%  1.139e+05 100.0%  1.819e+05      100.0%  4.527e+03  99.6% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             1496 1.0 1.1625e+02 1.7 1.88e+10 1.1 5.7e+04 1.8e+05 0.0e+00 38 47 50 50  0  38 47 50 50  0  2520
MatMultTranspose    1495 1.0 9.7790e+01 1.2 1.88e+10 1.1 5.7e+04 1.8e+05 0.0e+00 35 47 50 50  0  35 47 50 50  0  2994
MatAssemblyBegin       1 1.0 6.3910e-0314.3 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 5.2797e-02 1.0 0.00e+00 0.0 1.8e+02 6.7e+04 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatView                1 1.0 3.0708e-04 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecView                1 1.0 1.1235e+01111.3 0.00e+00 0.0 3.0e+01 2.9e+05 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
VecDot              2990 1.0 5.7054e+0114.6 4.40e+08 1.0 0.0e+00 0.0e+00 3.0e+03  9  1  0  0 66   9  1  0  0 66   123
VecNorm             1497 1.0 5.8130e+00 3.5 2.20e+08 1.0 0.0e+00 0.0e+00 1.5e+03  2  1  0  0 33   2  1  0  0 33   607
VecCopy                4 1.0 3.3658e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              8975 1.0 2.5879e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY             4486 1.0 7.5991e+00 1.6 6.60e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0  1391
VecAYPX             2988 1.0 4.6226e+00 1.6 4.40e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0  1523
VecAssemblyBegin       6 1.0 3.9858e-0213.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         6 1.0 6.6996e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult    2992 1.0 7.0992e+00 1.5 2.20e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   496
VecScatterBegin     2991 1.0 3.3736e+00 3.7 0.00e+00 0.0 1.1e+05 1.8e+05 0.0e+00  1  0100100  0   1  0100100  0     0
VecScatterEnd       2991 1.0 3.3633e+01 5.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0   9  0  0  0  0     0
KSPSetup               1 1.0 5.6469e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 2.2884e+02 1.0 3.96e+10 1.1 1.1e+05 1.8e+05 4.5e+03 89100100100 99  89100100100 99  2697
PCSetUp                1 1.0 5.0068e-06 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             2992 1.0 7.1263e+00 1.5 2.20e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   494
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     3              3     42424600     0
                 Vec    18             18      7924896     0
         Vec Scatter     2              2         1736     0
           Index Set     4              4       247632     0
       Krylov Solver     1              1          832     0
      Preconditioner     1              1          872     0
              Viewer     1              1          544     0
========================================================================================================================
Average time to get PetscTime(): 8.91685e-06
Average time for MPI_Barrier(): 0.000128984
Average time for zero size MPI_Send(): 1.8239e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Wed Dec 22 11:56:02 2010
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:sock --with-debugging=0 --with-batch --known-mpi-shared=1
-----------------------------------------
Libraries compiled on Wed Dec 22 11:56:30 CET 2010 on wmss04 
Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux 
Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized
Using PETSc arch: linux-gnu-c-opt-ch3sock
-----------------------------------------
Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O   
Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O   
-----------------------------------------
Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include  
------------------------------------------
Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O 
Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O 
Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lpetsc       -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl  
------------------------------------------

