[petsc-users] Very poor speed up performance

Yongjun Chen yjxd.chen at gmail.com
Wed Dec 22 09:55:23 CST 2010


I have reconfigured PETSc with --download-mpich=1 and
--with-device=ch3:sock. The results show that the speed-up now keeps
increasing as the number of cores goes from 1 to 16. However, the maximum
speed-up is still only around 6.0 with 16 cores. The new log files can be
found in the attachment.
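
For reference, the solve times printed in the attached logs (4, 8, 12 and 16 processes) can be turned into relative speed-up and efficiency figures. The sketch below uses the 4-process run as the baseline, because the 1-process time is not part of these attachments; the function and variable names are illustrative only.

```python
# Solve times in seconds, copied from the "The solving time is ..." lines
# of the attached logs (number of MPI processes -> seconds).
solve_time = {4: 420.527, 8: 291.989, 12: 253.909, 16: 240.208}

base_procs = 4  # baseline: smallest run included in the attachments
base_time = solve_time[base_procs]


def relative_speedup(procs):
    """Speed-up of a run relative to the 4-process baseline."""
    return base_time / solve_time[procs]


def relative_efficiency(procs):
    """Relative speed-up divided by the growth in process count."""
    return relative_speedup(procs) / (procs / base_procs)


for p in sorted(solve_time):
    print(f"{p:2d} procs: speed-up vs 4 = {relative_speedup(p):.2f}, "
          f"efficiency = {relative_efficiency(p):.2f}")

# Parallel efficiency implied by the reported speed-up of about 6.0 on 16 cores.
overall_efficiency_16 = 6.0 / 16
print(f"overall efficiency at 16 cores: {overall_efficiency_16:.3f}")
```

Note that the 16-process run took 1495 iterations instead of 1473, so its time is not strictly comparable with the other three.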


I checked the configuration of the first server again. This server is a
shared-memory computer, with

Processors: 4 CPUs * 4 cores/CPU, each core at 2500 MHz

Memory: 16 * 2 GB DDR2 333 MHz, dual channel, data width 64 bit, so the
memory bandwidth for two modules is 64/8*166*2*2 = 5.4 GB/s.

It seems that each core can get 2.7 GB/s of memory bandwidth, which can fulfill
the basic requirement for sparse iterative solvers.
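
The arithmetic above can be reproduced as follows. This is only a sketch of the peak-rate estimate as written (64-bit bus, 166 MHz clock, two transfers per clock, two channels); with the 166 MHz figure the product comes to about 5.3 GB/s, slightly below the rounded 5.4 GB/s quoted. The even split among cores is an idealised assumption for illustration, not a measurement.

```python
# Peak-bandwidth estimate from the figures quoted above.
bus_width_bits = 64      # data width per channel
clock_mhz = 166          # clock value used in the estimate above
transfers_per_clock = 2  # double data rate
channels = 2             # dual channel

peak_mb_per_s = bus_width_bits / 8 * clock_mhz * transfers_per_clock * channels
peak_gb_per_s = peak_mb_per_s / 1000


def per_core_share(cores_sharing):
    """Even split of the peak figure among cores (idealised assumption)."""
    return peak_gb_per_s / cores_sharing


print(f"peak: {peak_gb_per_s:.2f} GB/s")
for cores in (1, 2, 4):
    print(f"{cores} core(s) sharing: {per_core_share(cores):.2f} GB/s each")
```

The per-core figure depends entirely on how many cores share the same memory channels, which is why the share is left as a parameter here.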

Is this correct? Does a shared-memory computer offer no benefit for
PETSc when the memory bandwidth is limited?


Besides, we would like to continue our work by employing a matrix
partitioning / reordering algorithm, such as Metis or ParMetis, to improve
the speed-up of the program. (The current program works without
any matrix decomposition.)

Matt, as you said in
a matrix can result in fewer iterations for an iterative solver“.

Do you think the matrix partitioning/reordering will work for this program?
Or any further suggestions?
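
To illustrate why the numbering and partitioning of the unknowns can matter for a parallel matrix-vector product, here is a toy sketch. It is not PETSc or ParMetis code: it takes a 1D chain graph as a hypothetical stand-in for the real matrix, splits the rows into contiguous blocks (one block per process), and counts how many off-process vector entries are needed under a natural and a scrambled numbering. All names are made up for the example.

```python
import random


def ghost_entries(edges, n, nprocs, new_index):
    """Count distinct off-process vector entries needed for y = A x.

    Rows are renumbered by new_index and then split into contiguous
    blocks of ceil(n / nprocs) rows, one block per process.
    """
    block = -(-n // nprocs)  # ceil(n / nprocs)
    ghosts = [set() for _ in range(nprocs)]
    for i, j in edges:
        pi, pj = new_index[i], new_index[j]
        owner_i, owner_j = pi // block, pj // block
        if owner_i != owner_j:
            # Each side needs the other's vector entry.
            ghosts[owner_i].add(pj)
            ghosts[owner_j].add(pi)
    return sum(len(g) for g in ghosts)


n, nprocs = 64, 4
edges = [(i, i + 1) for i in range(n - 1)]  # nonzero pattern of a 1D chain

natural = list(range(n))          # neighbours stay close together
scrambled = natural[:]
random.Random(0).shuffle(scrambled)  # neighbours scattered across processes

natural_ghosts = ghost_entries(edges, n, nprocs, natural)
scrambled_ghosts = ghost_entries(edges, n, nprocs, scrambled)
print("ghost entries, natural numbering  :", natural_ghosts)
print("ghost entries, scrambled numbering:", scrambled_ghosts)
```

A partitioner such as ParMetis aims for the first situation: few cut edges, hence fewer and smaller messages per product. Whether that helps this program depends on how scattered the current numbering actually is.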

Any comments are very welcome! Thank you!

On Mon, Dec 20, 2010 at 11:04 PM, Satish Balay <balay at mcs.anl.gov> wrote:

> On Mon, 20 Dec 2010, Yongjun Chen wrote:
> > Matt, Barry, thanks a lot for your reply! I will try mpich hydra firstly and
> > see what I can get.
> hydra is just the process manager.
> Also --download-mpich uses a slightly older version - with
> device=ch3:sock for portability and valgrind reasons [development]
> You might want to install latest mpich manually with the default
> device=ch3:nemesis and recheck..
> satish
-------------- next part --------------
Process 0 of total 4 on wmss04
Process 2 of total 4 on wmss04
Process 1 of total 4 on wmss04
Process 3 of total 4 on wmss04
The dimension of Matrix A is n = 1177754
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
End Assembly.
End Assembly.
Begin the solving:										  
The current time is: Wed Dec 22 11:41:09 2010

KSP Object:
  type: bicg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-07, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:
  type: jacobi
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpisbaij, rows=1177754, cols=1177754
    total: nonzeros=49908476, allocated nonzeros=49908476
        block size is 1

Norm of error 1.28342e-06, Iterations 1473
The solver has finished successfully!			          
The solving time is 420.527 seconds.
The time accuracy is 1e-06 second.
The current time is Wed Dec 22 11:48:09 2010

***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./AMG_Solver_MPI on a linux-gnu named wmss04 with 4 processors, by cheny Wed Dec 22 12:48:09 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010

                         Max       Max/Min        Avg      Total 
Time (sec):           4.531e+02      1.00000   4.531e+02
Objects:              3.000e+01      1.00000   3.000e+01
Flops:                1.558e+11      1.06872   1.523e+11  6.091e+11
Flops/sec:            3.438e+08      1.06872   3.361e+08  1.344e+09
MPI Messages:         5.906e+03      2.00017   4.430e+03  1.772e+04
MPI Message Lengths:  1.727e+09      2.74432   2.658e+05  4.710e+09
MPI Reductions:       4.477e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 4.5314e+02 100.0%  6.0914e+11 100.0%  1.772e+04 100.0%  2.658e+05      100.0%  4.461e+03  99.6% 

See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s

--- Event Stage 0: Main Stage

MatMult             1474 1.0 1.7876e+02 1.0 7.40e+10 1.1 8.8e+03 2.7e+05 0.0e+00 39 47 50 50  0  39 47 50 50  0  1617
MatMultTranspose    1473 1.0 1.7886e+02 1.0 7.40e+10 1.1 8.8e+03 2.7e+05 0.0e+00 39 47 50 50  0  39 47 50 50  0  1615
MatAssemblyBegin       1 1.0 3.2670e-0312.4 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 6.1171e-02 1.0 0.00e+00 0.0 3.0e+01 9.3e+04 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatView                1 1.0 1.6379e-04 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecView                1 1.0 1.0934e+01 4.7 0.00e+00 0.0 6.0e+00 1.2e+06 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecDot              2946 1.0 1.9010e+01 2.2 1.73e+09 1.0 0.0e+00 0.0e+00 2.9e+03  3  1  0  0 66   3  1  0  0 66   365
VecNorm             1475 1.0 1.0313e+01 2.8 8.69e+08 1.0 0.0e+00 0.0e+00 1.5e+03  1  1  0  0 33   1  1  0  0 33   337
VecCopy                4 1.0 5.2447e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              8843 1.0 2.8803e+00 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY             4420 1.0 1.3866e+01 1.5 2.60e+09 1.0 0.0e+00 0.0e+00 0.0e+00  3  2  0  0  0   3  2  0  0  0   751
VecAYPX             2944 1.0 1.0440e+01 1.0 1.73e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   664
VecAssemblyBegin       6 1.0 1.0071e-0161.5 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         6 1.0 2.4080e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult    2948 1.0 1.6040e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00  3  1  0  0  0   3  1  0  0  0   216
VecScatterBegin     2947 1.0 1.7367e+00 2.2 0.00e+00 0.0 1.8e+04 2.7e+05 0.0e+00  0  0100100  0   0  0100100  0     0
VecScatterEnd       2947 1.0 3.0331e+01 5.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
KSPSetup               1 1.0 1.3974e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 4.0934e+02 1.0 1.56e+11 1.1 1.8e+04 2.7e+05 4.4e+03 90100100100 99  90100100100 99  1488
PCSetUp                1 1.0 3.0994e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             2948 1.0 1.6080e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00  3  1  0  0  0   3  1  0  0  0   216
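
As a sanity check on how the last column is derived, the VecAXPY row above can be reproduced from the flop-counting convention (2N flops per call for real vectors) and the matrix dimension. The sketch below is an illustration, not PETSc code. It suggests that the scale factor is effectively 1e-6 (flops to Mflops), although the legend prints "10e-6".

```python
# Values copied from the 4-process log.
n = 1177754            # vector length (dimension of matrix A)
calls = 4420           # VecAXPY count
max_time = 1.3866e+01  # VecAXPY max time over all processes, in seconds
nprocs = 4

total_flops = 2 * n * calls             # 2N flops per VecAXPY call
flops_per_proc = total_flops / nprocs   # even split across processes
mflops = 1e-6 * total_flops / max_time  # sum of flops / max time

print(f"total flops    : {total_flops:.4e}")
print(f"flops per proc : {flops_per_proc:.2e}  (log: 2.60e+09)")
print(f"Mflop/s        : {mflops:.0f}  (log: 751)")
```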

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     3              3    169902696     0
                 Vec    18             18     31282096     0
         Vec Scatter     2              2         1736     0
           Index Set     4              4       638616     0
       Krylov Solver     1              1          832     0
      Preconditioner     1              1          872     0
              Viewer     1              1          544     0
Average time to get PetscTime(): 1.19209e-06
Average time for MPI_Barrier(): 5.97954e-05
Average time for zero size MPI_Send(): 2.07424e-05
#PETSc Option Table entries:
-ksp_type bicg
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Wed Dec 22 11:56:02 2010
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:sock --with-debugging=0 --with-batch --known-mpi-shared=1
Libraries compiled on Wed Dec 22 11:56:30 CET 2010 on wmss04 
Machine characteristics: Linux wmss04 #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux 
Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized
Using PETSc arch: linux-gnu-c-opt-ch3sock
Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O   
Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O   
Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include  
Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O 
Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O 
Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lpetsc       -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl  
-------------- next part --------------
Process 0 of total 8 on wmss04
Process 4 of total 8 on wmss04
Process 6 of total 8 on wmss04
Process 2 of total 8 on wmss04
Process 1 of total 8 on wmss04
Process 5 of total 8 on wmss04
Process 3 of total 8 on wmss04
Process 7 of total 8 on wmss04
The dimension of Matrix A is n = 1177754
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
Begin the solving:										  
The current time is: Wed Dec 22 11:12:03 2010

KSP Object:
  type: bicg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-07, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:
  type: jacobi
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpisbaij, rows=1177754, cols=1177754
    total: nonzeros=49908476, allocated nonzeros=49908476
        block size is 1

Norm of error 1.32502e-06, Iterations 1473
The solver has finished successfully!			          
The solving time is 291.989 seconds.
The time accuracy is 1e-06 second.
The current time is Wed Dec 22 11:16:55 2010

***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./AMG_Solver_MPI on a linux-gnu named wmss04 with 8 processors, by cheny Wed Dec 22 12:16:55 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010

                         Max       Max/Min        Avg      Total 
Time (sec):           3.113e+02      1.00000   3.113e+02
Objects:              3.000e+01      1.00000   3.000e+01
Flops:                7.792e+10      1.09702   7.614e+10  6.091e+11
Flops/sec:            2.503e+08      1.09702   2.446e+08  1.957e+09
MPI Messages:         5.906e+03      2.00017   5.169e+03  4.135e+04
MPI Message Lengths:  1.866e+09      4.61816   2.430e+05  1.005e+10
MPI Reductions:       4.477e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 3.1128e+02 100.0%  6.0914e+11 100.0%  4.135e+04 100.0%  2.430e+05      100.0%  4.461e+03  99.6% 

See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s

--- Event Stage 0: Main Stage

MatMult             1474 1.0 1.2879e+02 1.4 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 36 47 50 50  0  36 47 50 50  0  2244
MatMultTranspose    1473 1.0 1.2240e+02 1.3 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 37 47 50 50  0  37 47 50 50  0  2360
MatAssemblyBegin       1 1.0 3.1061e-03 9.8 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 5.0727e-02 1.0 0.00e+00 0.0 7.0e+01 8.5e+04 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatView                1 1.0 2.2912e-04 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecView                1 1.0 1.1926e+0113.1 0.00e+00 0.0 1.4e+01 5.9e+05 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
VecDot              2946 1.0 6.5343e+0113.5 8.67e+08 1.0 0.0e+00 0.0e+00 2.9e+03  9  1  0  0 66   9  1  0  0 66   106
VecNorm             1475 1.0 6.9889e+00 3.6 4.34e+08 1.0 0.0e+00 0.0e+00 1.5e+03  1  1  0  0 33   1  1  0  0 33   497
VecCopy                4 1.0 5.1496e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              8843 1.0 2.2587e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY             4420 1.0 8.7103e+00 1.5 1.30e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0  1195
VecAYPX             2944 1.0 5.7803e+00 1.4 8.67e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0  1200
VecAssemblyBegin       6 1.0 3.9916e-0214.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         6 1.0 3.6001e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult    2948 1.0 8.6749e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00  3  1  0  0  0   3  1  0  0  0   400
VecScatterBegin     2947 1.0 1.9621e+00 2.7 0.00e+00 0.0 4.1e+04 2.4e+05 0.0e+00  0  0100100  0   0  0100100  0     0
VecScatterEnd       2947 1.0 5.9072e+0110.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0   9  0  0  0  0     0
KSPSetup               1 1.0 8.9231e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 2.7991e+02 1.0 7.79e+10 1.1 4.1e+04 2.4e+05 4.4e+03 90100100100 99  90100100100 99  2175
PCSetUp                1 1.0 3.0994e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             2948 1.0 8.7041e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00  3  1  0  0  0   3  1  0  0  0   399
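
The Ratio column (maximum over minimum across processes) gives a quick view of load imbalance. The sketch below recovers the fastest process's time for a few rows of the 8-process table above and prints the spread. Reading a large spread in VecDot or VecScatterEnd as time spent waiting for other processes is an interpretation, not something the log states directly.

```python
# (event, max time in seconds, max/min ratio) copied from the 8-process table.
rows = [
    ("MatMult",          1.2879e+02, 1.4),
    ("MatMultTranspose", 1.2240e+02, 1.3),
    ("VecDot",           6.5343e+01, 13.5),
    ("VecScatterEnd",    5.9072e+01, 10.8),
]


def min_time(max_time, ratio):
    """Time on the fastest process, from Ratio = max / min."""
    return max_time / ratio


# Difference between the slowest and the fastest process for each event.
spread = {name: t - min_time(t, r) for name, t, r in rows}

for name, t, r in rows:
    print(f"{name:17s} max {t:7.2f} s  min {min_time(t, r):7.2f} s  "
          f"spread {spread[name]:6.2f} s")
```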

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     3              3     84944064     0
                 Vec    18             18     15741712     0
         Vec Scatter     2              2         1736     0
           Index Set     4              4       409008     0
       Krylov Solver     1              1          832     0
      Preconditioner     1              1          872     0
              Viewer     1              1          544     0
Average time to get PetscTime(): 4.3869e-06
Average time for MPI_Barrier(): 7.25746e-05
Average time for zero size MPI_Send(): 2.06232e-05
#PETSc Option Table entries:
-ksp_type bicg
-pc_type jacobi
#End of PETSc Option Table entries
-------------- next part --------------
Process 0 of total 12 on wmss04
Process 2 of total 12 on wmss04
Process 6 of total 12 on wmss04
Process 4 of total 12 on wmss04
Process 8 of total 12 on wmss04
Process 11 of total 12 on wmss04
Process 1 of total 12 on wmss04
Process 3 of total 12 on wmss04
Process 5 of total 12 on wmss04
The dimension of Matrix A is n = 1177754
Process 9 of total 12 on wmss04
Process 10 of total 12 on wmss04
Process 7 of total 12 on wmss04
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
Begin the solving:										  
The current time is: Wed Dec 22 12:13:43 2010

KSP Object:
  type: bicg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-07, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:
  type: jacobi
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpisbaij, rows=1177754, cols=1177754
    total: nonzeros=49908476, allocated nonzeros=49908476
        block size is 1

Norm of error 1.28414e-06, Iterations 1473
The solver has finished successfully!			          
The solving time is 253.909 seconds.
The time accuracy is 1e-06 second.
The current time is Wed Dec 22 12:17:57 2010

***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./AMG_Solver_MPI on a linux-gnu named wmss04 with 12 processors, by cheny Wed Dec 22 13:17:57 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010

                         Max       Max/Min        Avg      Total 
Time (sec):           2.721e+02      1.00000   2.721e+02
Objects:              3.000e+01      1.00000   3.000e+01
Flops:                5.197e+10      1.11689   5.074e+10  6.089e+11
Flops/sec:            1.910e+08      1.11689   1.865e+08  2.238e+09
MPI Messages:         5.906e+03      2.00017   5.415e+03  6.498e+04
MPI Message Lengths:  1.887e+09      6.23794   2.345e+05  1.524e+10
MPI Reductions:       4.477e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 2.7212e+02 100.0%  6.0890e+11 100.0%  6.498e+04 100.0%  2.345e+05      100.0%  4.461e+03  99.6% 

See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s

--- Event Stage 0: Main Stage

MatMult             1474 1.0 1.2467e+02 1.6 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 37 47 50 50  0  37 47 50 50  0  2318
MatMultTranspose    1473 1.0 1.0645e+02 1.3 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 35 47 50 50  0  35 47 50 50  0  2712
MatAssemblyBegin       1 1.0 4.0723e-0274.7 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 5.3137e-02 1.0 0.00e+00 0.0 1.1e+02 8.2e+04 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatView                1 1.0 2.8801e-04 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecView                1 1.0 1.2262e+0190.2 0.00e+00 0.0 2.2e+01 3.9e+05 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
VecDot              2946 1.0 6.1395e+0111.5 5.78e+08 1.0 0.0e+00 0.0e+00 2.9e+03  9  1  0  0 66   9  1  0  0 66   113
VecNorm             1475 1.0 5.8101e+00 3.3 2.90e+08 1.0 0.0e+00 0.0e+00 1.5e+03  1  1  0  0 33   1  1  0  0 33   598
VecCopy                4 1.0 5.6744e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              8843 1.0 2.1137e+00 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY             4420 1.0 6.6266e+00 1.4 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0  1571
VecAYPX             2944 1.0 5.2210e+00 2.3 5.78e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0  1328
VecAssemblyBegin       6 1.0 5.0129e-0218.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         6 1.0 4.7922e-05 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult    2948 1.0 7.0911e+00 1.6 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   490
VecScatterBegin     2947 1.0 2.5096e+00 3.1 0.00e+00 0.0 6.5e+04 2.3e+05 0.0e+00  1  0100100  0   1  0100100  0     0
VecScatterEnd       2947 1.0 4.4540e+01 6.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0   9  0  0  0  0     0
KSPSetup               1 1.0 7.9119e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 2.4149e+02 1.0 5.20e+10 1.1 6.5e+04 2.3e+05 4.4e+03 89100100100 99  89100100100 99  2521
PCSetUp                1 1.0 6.1989e-06 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             2948 1.0 7.1207e+00 1.6 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   488
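
A rough way to relate the MatMult rows to the memory-bandwidth question is to estimate how many bytes of matrix data each product must stream from memory. The sketch below takes nonzeros=49908476 as the number of stored entries and assumes that each one costs an 8-byte value plus a 4-byte column index (matching sizeof(PetscScalar) 8 and sizeof(int) 4 in the logs). It ignores vector and row-pointer traffic and treats the whole MatMult time, including any communication, as memory time, so it is only an order-of-magnitude estimate.

```python
# Values copied from the 4-, 8- and 12-process logs.
stored_nonzeros = 49908476
bytes_per_nonzero = 8 + 4   # assumed: one PetscScalar + one int column index
matmult_calls = 1474
matmult_max_time = {4: 1.7876e+02, 8: 1.2879e+02, 12: 1.2467e+02}  # seconds

# Matrix data streamed by one product, summed over all processes.
bytes_per_product = stored_nonzeros * bytes_per_nonzero


def effective_gb_per_s(nprocs):
    """Aggregate matrix bytes streamed per second during MatMult."""
    seconds_per_product = matmult_max_time[nprocs] / matmult_calls
    return bytes_per_product / seconds_per_product / 1e9


for p in sorted(matmult_max_time):
    print(f"{p:2d} procs: about {effective_gb_per_s(p):.1f} GB/s aggregate")
```

If the estimate barely changes between 8 and 12 processes, that would be consistent with the product being limited by memory bandwidth rather than by the number of cores.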

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     3              3     56593044     0
                 Vec    18             18     10534536     0
         Vec Scatter     2              2         1736     0
           Index Set     4              4       305424     0
       Krylov Solver     1              1          832     0
      Preconditioner     1              1          872     0
              Viewer     1              1          544     0
Average time to get PetscTime(): 6.00815e-06
Average time for MPI_Barrier(): 0.000122833
Average time for zero size MPI_Send(): 2.81533e-05
#PETSc Option Table entries:
-ksp_type bicg
-pc_type jacobi
#End of PETSc Option Table entries
-------------- next part --------------
Process 3 of total 16 on wmss04
Process 7 of total 16 on wmss04
Process 1 of total 16 on wmss04
Process 15 of total 16 on wmss04
Process 5 of total 16 on wmss04
Process 13 of total 16 on wmss04
Process 11 of total 16 on wmss04
Process 9 of total 16 on wmss04
Process 0 of total 16 on wmss04
Process 10 of total 16 on wmss04
Process 4 of total 16 on wmss04
Process 12 of total 16 on wmss04
Process 2 of total 16 on wmss04
Process 6 of total 16 on wmss04
Process 14 of total 16 on wmss04
Process 8 of total 16 on wmss04
The dimension of Matrix A is n = 1177754
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
Begin the solving:										  
The current time is: Wed Dec 22 11:23:54 2010

KSP Object:
  type: bicg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-07, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:
  type: jacobi
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpisbaij, rows=1177754, cols=1177754
    total: nonzeros=49908476, allocated nonzeros=49908476
        block size is 1

Norm of error 1.194e-06, Iterations 1495
The solver has finished successfully!			          
The solving time is 240.208 seconds.
The time accuracy is 1e-06 second.
The current time is Wed Dec 22 11:27:54 2010
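
A quick way to read the numbers above is to reduce them to a cost per iteration and a matrix density. The following is only a sketch; the values are copied from the solver output above and the variable names are illustrative. Since the matrix type is mpisbaij, the nonzero count presumably covers only the stored (upper triangular) part, which is an assumption here.

```python
# Values copied from the solver output above (16 processes).
solve_time_s = 240.208        # "The solving time is 240.208 seconds."
iterations = 1495             # "Iterations 1495"
rows = 1177754                # matrix dimension n
stored_nonzeros = 49908476    # "total: nonzeros=49908476"

time_per_iter = solve_time_s / iterations
nnz_per_row = stored_nonzeros / rows  # stored entries per row (assumed upper triangle only)

print(f"time per BiCG iteration: {time_per_iter:.4f} s")
print(f"stored nonzeros per row: {nnz_per_row:.1f}")
```

Each BiCG iteration applies both A and its transpose, so the per-iteration time is dominated by two sparse matrix-vector products.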

***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./AMG_Solver_MPI on a linux-gnu named wmss04 with 16 processors, by cheny Wed Dec 22 12:27:54 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010

                         Max       Max/Min        Avg      Total 
Time (sec):           2.565e+02      1.00001   2.565e+02
Objects:              3.000e+01      1.00000   3.000e+01
Flops:                3.959e+10      1.13060   3.859e+10  6.174e+11
Flops/sec:            1.543e+08      1.13060   1.504e+08  2.407e+09
MPI Messages:         1.198e+04      3.99917   7.118e+03  1.139e+05
MPI Message Lengths:  1.948e+09      7.80981   1.819e+05  2.071e+10
MPI Reductions:       4.543e+03      1.00000
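
As a rough cross-check of the summary above, the aggregate flop rate is the total flop count divided by the maximum wall time. A minimal sketch, with the values copied from the table and illustrative variable names:

```python
# Values copied from the performance summary above (16 processes).
total_flops = 6.174e11   # "Flops" row, Total column
max_time_s = 2.565e02    # "Time (sec)" row, Max column
num_procs = 16

aggregate_rate = total_flops / max_time_s    # flop/s summed over all processes
per_proc_rate = aggregate_rate / num_procs   # average flop/s per process

print(f"aggregate:   {aggregate_rate:.3e} flop/s")
print(f"per process: {per_proc_rate:.3e} flop/s")
```

Both results agree with the "Flops/sec" row (2.407e+09 total, 1.504e+08 average).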

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 2.5651e+02 100.0%  6.1737e+11 100.0%  1.139e+05 100.0%  1.819e+05      100.0%  4.527e+03  99.6% 

See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
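
The legend writes the scaling factor as 10e-6, but the figures in this log are consistent with a factor of 1e-6 (flops to Mflops). A small sketch that checks this against the KSPSolve row, using the stage total from "Summary of Stages" and the KSPSolve %F and Max time; the variable names are illustrative:

```python
# Values copied from this log.
stage_total_flops = 6.1737e11   # Main Stage flops, summed over all processes
ksp_percent_flops = 100         # %F column of the KSPSolve row
ksp_max_time_s = 2.2884e02      # Time (sec) Max column of the KSPSolve row

ksp_flops = stage_total_flops * ksp_percent_flops / 100.0
mflops = 1e-6 * ksp_flops / ksp_max_time_s   # factor 1e-6, not 10e-6

print(f"KSPSolve: about {mflops:.0f} Mflop/s")
```

The result is within one Mflop/s of the 2697 reported in the Mflop/s column for KSPSolve.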
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s

--- Event Stage 0: Main Stage

MatMult             1496 1.0 1.1625e+02 1.7 1.88e+10 1.1 5.7e+04 1.8e+05 0.0e+00 38 47 50 50  0  38 47 50 50  0  2520
MatMultTranspose    1495 1.0 9.7790e+01 1.2 1.88e+10 1.1 5.7e+04 1.8e+05 0.0e+00 35 47 50 50  0  35 47 50 50  0  2994
MatAssemblyBegin       1 1.0 6.3910e-03 14.3 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 5.2797e-02 1.0 0.00e+00 0.0 1.8e+02 6.7e+04 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatView                1 1.0 3.0708e-04 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecView                1 1.0 1.1235e+01 111.3 0.00e+00 0.0 3.0e+01 2.9e+05 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
VecDot              2990 1.0 5.7054e+01 14.6 4.40e+08 1.0 0.0e+00 0.0e+00 3.0e+03  9  1  0  0 66   9  1  0  0 66   123
VecNorm             1497 1.0 5.8130e+00 3.5 2.20e+08 1.0 0.0e+00 0.0e+00 1.5e+03  2  1  0  0 33   2  1  0  0 33   607
VecCopy                4 1.0 3.3658e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              8975 1.0 2.5879e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY             4486 1.0 7.5991e+00 1.6 6.60e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0  1391
VecAYPX             2988 1.0 4.6226e+00 1.6 4.40e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0  1523
VecAssemblyBegin       6 1.0 3.9858e-02 13.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         6 1.0 6.6996e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult    2992 1.0 7.0992e+00 1.5 2.20e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   496
VecScatterBegin     2991 1.0 3.3736e+00 3.7 0.00e+00 0.0 1.1e+05 1.8e+05 0.0e+00  1  0 100 100  0   1  0 100 100  0     0
VecScatterEnd       2991 1.0 3.3633e+01 5.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0   9  0  0  0  0     0
KSPSetup               1 1.0 5.6469e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 2.2884e+02 1.0 3.96e+10 1.1 1.1e+05 1.8e+05 4.5e+03 89 100 100 100 99  89 100 100 100 99  2697
PCSetUp                1 1.0 5.0068e-06 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             2992 1.0 7.1263e+00 1.5 2.20e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   494
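
One way to read the event table is to look for events whose max/min time ratio is far from 1, since those point to processes waiting on each other. The sketch below uses a few rows copied from the table; the threshold of 3.0 is a hypothetical cut-off chosen only for illustration, and the helper name is likewise made up.

```python
# (event, max time in s, max/min time ratio, %T) copied from the table above.
events = [
    ("MatMult",          1.1625e+02,  1.7, 38),
    ("MatMultTranspose", 9.7790e+01,  1.2, 35),
    ("VecDot",           5.7054e+01, 14.6,  9),
    ("VecNorm",          5.8130e+00,  3.5,  2),
    ("VecAXPY",          7.5991e+00,  1.6,  2),
    ("VecScatterEnd",    3.3633e+01,  5.6,  9),
]

RATIO_THRESHOLD = 3.0  # hypothetical cut-off, for illustration only


def imbalanced(rows, threshold=RATIO_THRESHOLD):
    """Return events whose max/min time ratio exceeds the threshold, slowest first."""
    flagged = [row for row in rows if row[2] > threshold]
    return sorted(flagged, key=lambda row: row[1], reverse=True)


for name, t_max, ratio, pct in imbalanced(events):
    print(f"{name:18s} max {t_max:8.2f} s  ratio {ratio:5.1f}  {pct:2d}% of run time")
```

With these rows the flagged events are VecDot, VecScatterEnd and VecNorm. All three involve a reduction or a wait for messages, so a large ratio there more likely reflects time spent waiting for slower processes than the cost of the operation itself.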

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     3              3     42424600     0
                 Vec    18             18      7924896     0
         Vec Scatter     2              2         1736     0
           Index Set     4              4       247632     0
       Krylov Solver     1              1          832     0
      Preconditioner     1              1          872     0
              Viewer     1              1          544     0
Average time to get PetscTime(): 8.91685e-06
Average time for MPI_Barrier(): 0.000128984
Average time for zero size MPI_Send(): 1.8239e-05
#PETSc Option Table entries:
-ksp_type bicg
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Wed Dec 22 11:56:02 2010
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:sock --with-debugging=0 --with-batch --known-mpi-shared=1
Libraries compiled on Wed Dec 22 11:56:30 CET 2010 on wmss04 
Machine characteristics: Linux wmss04 #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux 
Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized
Using PETSc arch: linux-gnu-c-opt-ch3sock
Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O   
Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O   
Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include  
Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O 
Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O 
Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lpetsc       -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl  
