[petsc-users] Very poor speed up performance

Yongjun Chen yjxd.chen at gmail.com
Mon Dec 20 12:38:31 CST 2010


Hi Matt,

Thanks for your reply. I have just carried out a series of tests with k = 2,
4, 8, 12, and 16 cores on the first server, this time with the -log_summary
option. From 8 cores to 12 cores there is a small speedup, but from 12 cores
to 16 cores the computation time actually increases!
Please find the 5 log files attached. Thank you very much!

mpiexec -n *k* ./AMG_Solver_MPI -pc_type jacobi -ksp_type bicg -log_summary

Here I use the bicg KSP instead of gmres, because the two solvers give almost
the same speedup, as I have verified many times.
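As a side note on reading these summaries: everything below is logged in the single "Main Stage". If it ever becomes useful to report the solve separately from the assembly and file reading, one could wrap the KSPSolve call in its own logging stage. Below is only a minimal sketch, assuming the ksp, b, x and ierr variables from my code quoted at the end of this message; the stage name and variable name are arbitrary:

    PetscLogStage solve_stage;   /* hypothetical handle for the extra stage */

    /* register a named stage once, before the solve */
    ierr = PetscLogStageRegister("KSP Solve", &solve_stage); CHKERRQ(ierr);

    ierr = PetscLogStagePush(solve_stage); CHKERRQ(ierr);   /* events below are charged to this stage */
    ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);
    ierr = PetscLogStagePop(); CHKERRQ(ierr);                /* return to the Main Stage */

With this in place, -log_summary would list the solve in its own stage section instead of mixing it with everything else.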
----------------------
(1) k=2
----------------------
Process 1 of total 2 on wmss04
Process 0 of total 2 on wmss04
The dimension of Matrix A is n = 1177754
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
=========================================================
Begin the solving:
=========================================================
The current time is: Mon Dec 20 17:42:23 2010

KSP Object:
  type: bicg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-07, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:
  type: jacobi
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpisbaij, rows=1177754, cols=1177754
    total: nonzeros=49908476, allocated nonzeros=49908476
        block size is 1

norm(b-Ax)=1.25862e-06
Norm of error 1.25862e-06, Iterations 1475
=========================================================
The solver has finished successfully!
=========================================================
The solving time is 762.874 seconds.
The time accuracy is 1e-06 second.
The current time is Mon Dec 20 17:55:06 2010

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r
-fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary:
----------------------------------------------

./AMG_Solver_MPI on a linux-gnu named wmss04 with 2 processors, by cheny Mon
Dec 20 18:55:06 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010

                         Max       Max/Min        Avg      Total
Time (sec):           8.160e+02      1.00000   8.160e+02
Objects:              3.000e+01      1.00000   3.000e+01
Flops:                3.120e+11      1.04720   3.050e+11  6.100e+11
Flops/sec:            3.824e+08      1.04720   3.737e+08  7.475e+08
MPI Messages:         2.958e+03      1.00068   2.958e+03  5.915e+03
MPI Message Lengths:  9.598e+08      1.00034   3.245e+05  1.919e+09
MPI Reductions:       4.483e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N -->
2N flops
                            and VecAXPY() for complex vectors of length N
--> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 8.1603e+02 100.0%  6.0997e+11 100.0%  5.915e+03 100.0%  3.245e+05      100.0%  4.467e+03  99.6%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting
output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this
phase
      %M - percent messages in this phase     %L - percent message lengths
in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over
all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             1476 1.0 3.4220e+02 1.0 1.48e+11 1.0 3.0e+03 3.2e+05 0.0e+00 41 47 50 50  0  41 47 50 50  0   846
MatMultTranspose    1475 1.0 3.4208e+02 1.0 1.48e+11 1.0 3.0e+03 3.2e+05 0.0e+00 42 47 50 50  0  42 47 50 50  0   846
MatAssemblyBegin       1 1.0 1.5492e-0281.5 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 8.1615e-02 1.0 0.00e+00 0.0 1.0e+01 1.1e+05 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatView                1 1.0 1.5807e-04 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecView                1 1.0 1.0809e+01 2.1 0.00e+00 0.0 2.0e+00 2.4e+06 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecDot              2950 1.0 2.0457e+01 1.9 3.47e+09 1.0 0.0e+00 0.0e+00 3.0e+03  2  1  0  0 66   2  1  0  0 66   340
VecNorm             1477 1.0 1.2103e+01 1.7 1.74e+09 1.0 0.0e+00 0.0e+00 1.5e+03  1  1  0  0 33   1  1  0  0 33   287
VecCopy                4 1.0 1.0110e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              8855 1.0 6.0069e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY             4426 1.0 1.8430e+01 1.2 5.21e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0   566
VecAYPX             2948 1.0 1.3610e+01 1.2 3.47e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   510
VecAssemblyBegin       6 1.0 9.1116e-0317.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         6 1.0 1.7405e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult    2952 1.0 1.7966e+01 1.1 1.74e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   194
VecScatterBegin     2951 1.0 8.6552e-01 1.1 0.00e+00 0.0 5.9e+03 3.2e+05 0.0e+00  0  0100100  0   0  0100100  0     0
VecScatterEnd       2951 1.0 2.7126e+01 8.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
KSPSetup               1 1.0 3.9254e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 7.5170e+02 1.0 3.12e+11 1.0 5.9e+03 3.2e+05 4.4e+03 92100100100 99  92100100100 99   811
PCSetUp                1 1.0 1.9073e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             2952 1.0 1.8043e+01 1.1 1.74e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   193
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     3              3    339744648     0
                 Vec    18             18     62239872     0
         Vec Scatter     2              2         1736     0
           Index Set     4              4       974736     0
       Krylov Solver     1              1          832     0
      Preconditioner     1              1          872     0
              Viewer     1              1          544     0
========================================================================================================================
Average time to get PetscTime(): 1.21593e-06
Average time for MPI_Barrier(): 1.44005e-05
Average time for zero size MPI_Send(): 1.94311e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8
Configure run at: Tue Nov 23 15:54:45 2010
Configure options: --known-level1-dcache-size=65536
--known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2
--known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8
--known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8
--known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8
--known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4
--known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc
--with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1
--download-superlu-dist=1 --download-hypre=1 --download-trilinos=1
--download-parmetis=1 --download-mumps=1 --download-scalapack=1
--download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch
--known-mpi-shared=1
-----------------------------------------
Libraries compiled on Tue Nov 23 15:57:11 CET 2010 on wmss04
Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6
12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux
Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized
Using PETSc arch: linux-gnu-c-opt
-----------------------------------------
Using C compiler:
/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall
-Wwrite-strings -Wno-strict-aliasing -O
Using Fortran compiler:
/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall
-Wno-unused-variable -O
-----------------------------------------
Using include paths:
-I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include
-I/sun42/cheny/petsc-3.1-p5-optimized/include
-I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include
------------------------------------------
Using C linker:
/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall
-Wwrite-strings -Wno-strict-aliasing -O
Using Fortran linker:
/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall
-Wno-unused-variable -O
Using libraries:
-Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib
-L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lpetsc
-Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib
-L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lHYPRE -lmpichcxx
-lstdc++ -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord
-lparmetis -lmetis -lscalapack -lblacs -lflapack -lfblas -lnsl -laio -lrt
-L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib
-L/usr/lib64/gcc/x86_64-suse-linux/4.1.2
-L/opt/intel/Compiler/11.0/083/ipp/em64t/lib
-L/opt/intel/Compiler/11.0/083/mkl/lib/em64t
-L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib
-L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90
-lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich
-lpthread -lrt -lgcc_s -ldl
------------------------------------------


----------------------
(2) k=4
----------------------
Process 0 of total 4 on wmss04
Process 2 of total 4 on wmss04
Process 3 of total 4 on wmss04
Process 1 of total 4 on wmss04
The dimension of Matrix A is n = 1177754
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
End Assembly.
End Assembly.
=========================================================
Begin the solving:
=========================================================
The current time is: Mon Dec 20 17:33:24 2010

KSP Object:
  type: bicg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-07, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:
  type: jacobi
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpisbaij, rows=1177754, cols=1177754
    total: nonzeros=49908476, allocated nonzeros=49908476
        block size is 1

norm(b-Ax)=1.28342e-06
Norm of error 1.28342e-06, Iterations 1473
=========================================================
The solver has finished successfully!
=========================================================
The solving time is 450.583 seconds.
The time accuracy is 1e-06 second.
The current time is Mon Dec 20 17:40:55 2010

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r
-fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary:
----------------------------------------------

./AMG_Solver_MPI on a linux-gnu named wmss04 with 4 processors, by cheny Mon
Dec 20 18:40:55 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010

                         Max       Max/Min        Avg      Total
Time (sec):           4.807e+02      1.00000   4.807e+02
Objects:              3.000e+01      1.00000   3.000e+01
Flops:                1.558e+11      1.06872   1.523e+11  6.091e+11
Flops/sec:            3.241e+08      1.06872   3.168e+08  1.267e+09
MPI Messages:         5.906e+03      2.00017   4.430e+03  1.772e+04
MPI Message Lengths:  1.727e+09      2.74432   2.658e+05  4.710e+09
MPI Reductions:       4.477e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N -->
2N flops
                            and VecAXPY() for complex vectors of length N
--> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 4.8066e+02 100.0%  6.0914e+11 100.0%  1.772e+04 100.0%  2.658e+05      100.0%  4.461e+03  99.6%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting
output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this
phase
      %M - percent messages in this phase     %L - percent message lengths
in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over
all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             1474 1.0 1.9344e+02 1.1 7.40e+10 1.1 8.8e+03 2.7e+05 0.0e+00 39 47 50 50  0  39 47 50 50  0  1494
MatMultTranspose    1473 1.0 1.9283e+02 1.0 7.40e+10 1.1 8.8e+03 2.7e+05 0.0e+00 40 47 50 50  0  40 47 50 50  0  1498
MatAssemblyBegin       1 1.0 1.5624e-0263.8 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 6.3599e-02 1.0 0.00e+00 0.0 3.0e+01 9.3e+04 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatView                1 1.0 1.8096e-04 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecView                1 1.0 1.1063e+01 4.7 0.00e+00 0.0 6.0e+00 1.2e+06 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecDot              2946 1.0 2.5350e+01 2.7 1.73e+09 1.0 0.0e+00 0.0e+00 2.9e+03  3  1  0  0 66   3  1  0  0 66   274
VecNorm             1475 1.0 1.1197e+01 3.0 8.69e+08 1.0 0.0e+00 0.0e+00 1.5e+03  1  1  0  0 33   1  1  0  0 33   310
VecCopy                4 1.0 6.0010e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              8843 1.0 3.6737e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY             4420 1.0 1.4221e+01 1.4 2.60e+09 1.0 0.0e+00 0.0e+00 0.0e+00  3  2  0  0  0   3  2  0  0  0   732
VecAYPX             2944 1.0 1.1377e+01 1.1 1.73e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   610
VecAssemblyBegin       6 1.0 2.8596e-0223.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         6 1.0 2.4796e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult    2948 1.0 1.7210e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00  3  1  0  0  0   3  1  0  0  0   202
VecScatterBegin     2947 1.0 1.9806e+00 2.4 0.00e+00 0.0 1.8e+04 2.7e+05 0.0e+00  0  0100100  0   0  0100100  0     0
VecScatterEnd       2947 1.0 4.3833e+01 7.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  6  0  0  0  0   6  0  0  0  0     0
KSPSetup               1 1.0 2.1496e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 4.3931e+02 1.0 1.56e+11 1.1 1.8e+04 2.7e+05 4.4e+03 91100100100 99  91100100100 99  1386
PCSetUp                1 1.0 3.0994e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             2948 1.0 1.7256e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00  3  1  0  0  0   3  1  0  0  0   201
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     3              3    169902696     0
                 Vec    18             18     31282096     0
         Vec Scatter     2              2         1736     0
           Index Set     4              4       638616     0
       Krylov Solver     1              1          832     0
      Preconditioner     1              1          872     0
              Viewer     1              1          544     0
========================================================================================================================
Average time to get PetscTime(): 1.5974e-06
Average time for MPI_Barrier(): 3.48091e-05
Average time for zero size MPI_Send(): 1.8537e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8
Configure run at: Tue Nov 23 15:54:45 2010
Configure options: --known-level1-dcache-size=65536
--known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2
--known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8
--known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8
--known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8
--known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4
--known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc
--with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1
--download-superlu-dist=1 --download-hypre=1 --download-trilinos=1
--download-parmetis=1 --download-mumps=1 --download-scalapack=1
--download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch
--known-mpi-shared=1
-----------------------------------------



----------------------
(3) k=8
----------------------
Process 0 of total 8 on wmss04
Process 4 of total 8 on wmss04
Process 2 of total 8 on wmss04
Process 6 of total 8 on wmss04
Process 3 of total 8 on wmss04
Process 7 of total 8 on wmss04
Process 1 of total 8 on wmss04
Process 5 of total 8 on wmss04
The dimension of Matrix A is n = 1177754
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
=========================================================
Begin the solving:
=========================================================
The current time is: Mon Dec 20 18:14:59 2010

KSP Object:
  type: bicg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-07, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:
  type: jacobi
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpisbaij, rows=1177754, cols=1177754
    total: nonzeros=49908476, allocated nonzeros=49908476
        block size is 1

norm(b-Ax)=1.32502e-06
Norm of error 1.32502e-06, Iterations 1473
=========================================================
The solver has finished successfully!
=========================================================
The solving time is 311.937 seconds.
The time accuracy is 1e-06 second.
The current time is Mon Dec 20 18:20:11 2010

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r
-fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary:
----------------------------------------------

./AMG_Solver_MPI on a linux-gnu named wmss04 with 8 processors, by cheny Mon
Dec 20 19:20:11 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010

                         Max       Max/Min        Avg      Total
Time (sec):           3.330e+02      1.00000   3.330e+02
Objects:              3.000e+01      1.00000   3.000e+01
Flops:                7.792e+10      1.09702   7.614e+10  6.091e+11
Flops/sec:            2.340e+08      1.09702   2.286e+08  1.829e+09
MPI Messages:         5.906e+03      2.00017   5.169e+03  4.135e+04
MPI Message Lengths:  1.866e+09      4.61816   2.430e+05  1.005e+10
MPI Reductions:       4.477e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N -->
2N flops
                            and VecAXPY() for complex vectors of length N
--> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 3.3302e+02 100.0%  6.0914e+11 100.0%  4.135e+04 100.0%  2.430e+05      100.0%  4.461e+03  99.6%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting
output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this
phase
      %M - percent messages in this phase     %L - percent message lengths
in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over
all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             1474 1.0 1.4230e+02 1.4 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 38 47 50 50  0  38 47 50 50  0  2031
MatMultTranspose    1473 1.0 1.3627e+02 1.1 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 38 47 50 50  0  38 47 50 50  0  2120
MatAssemblyBegin       1 1.0 8.0800e-0324.5 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 5.3647e-02 1.0 0.00e+00 0.0 7.0e+01 8.5e+04 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatView                1 1.0 2.1791e-04 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecView                1 1.0 1.0902e+0112.1 0.00e+00 0.0 1.4e+01 5.9e+05 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
VecDot              2946 1.0 3.5689e+01 7.6 8.67e+08 1.0 0.0e+00 0.0e+00 2.9e+03  6  1  0  0 66   6  1  0  0 66   194
VecNorm             1475 1.0 8.1093e+00 4.0 4.34e+08 1.0 0.0e+00 0.0e+00 1.5e+03  1  1  0  0 33   1  1  0  0 33   428
VecCopy                4 1.0 5.2011e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              8843 1.0 3.0491e+00 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY             4420 1.0 9.2421e+00 1.6 1.30e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0  1127
VecAYPX             2944 1.0 6.8297e+00 1.5 8.67e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0  1015
VecAssemblyBegin       6 1.0 2.6218e-0210.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         6 1.0 3.6240e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult    2948 1.0 9.6646e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00  3  1  0  0  0   3  1  0  0  0   359
VecScatterBegin     2947 1.0 2.2599e+00 2.3 0.00e+00 0.0 4.1e+04 2.4e+05 0.0e+00  1  0100100  0   1  0100100  0     0
VecScatterEnd       2947 1.0 7.7004e+0120.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0   9  0  0  0  0     0
KSPSetup               1 1.0 1.4287e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 3.0090e+02 1.0 7.79e+10 1.1 4.1e+04 2.4e+05 4.4e+03 90100100100 99  90100100100 99  2024
PCSetUp                1 1.0 4.0531e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             2948 1.0 9.7001e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00  3  1  0  0  0   3  1  0  0  0   358
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     3              3     84944064     0
                 Vec    18             18     15741712     0
         Vec Scatter     2              2         1736     0
           Index Set     4              4       409008     0
       Krylov Solver     1              1          832     0
      Preconditioner     1              1          872     0
              Viewer     1              1          544     0
========================================================================================================================
Average time to get PetscTime(): 3.38554e-06
Average time for MPI_Barrier(): 7.40051e-05
Average time for zero size MPI_Send(): 1.88947e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8
Configure run at: Tue Nov 23 15:54:45 2010
Configure options: --known-level1-dcache-size=65536
--known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2
--known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8
--known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8
--known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8
--known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4
--known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc
--with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1
--download-superlu-dist=1 --download-hypre=1 --download-trilinos=1
--download-parmetis=1 --download-mumps=1 --download-scalapack=1
--download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch
--known-mpi-shared=1
-----------------------------------------



----------------------
(4) k=12
----------------------
Process 1 of total 12 on wmss04
Process 5 of total 12 on wmss04
Process 2 of total 12 on wmss04
Process 9 of total 12 on wmss04
Process 6 of total 12 on wmss04
Process 7 of total 12 on wmss04
Process 10 of total 12 on wmss04
Process 3 of total 12 on wmss04
Process 11 of total 12 on wmss04
Process 4 of total 12 on wmss04
Process 8 of total 12 on wmss04
Process 0 of total 12 on wmss04
The dimension of Matrix A is n = 1177754
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.End Assembly.
End Assembly.
End Assembly.

End Assembly.
End Assembly.
=========================================================
Begin the solving:
=========================================================
The current time is: Mon Dec 20 17:56:36 2010

KSP Object:
  type: bicg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-07, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:
  type: jacobi
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpisbaij, rows=1177754, cols=1177754
    total: nonzeros=49908476, allocated nonzeros=49908476
        block size is 1

norm(b-Ax)=1.28414e-06
Norm of error 1.28414e-06, Iterations 1473
=========================================================
The solver has finished successfully!
=========================================================
The solving time is 291.503 seconds.
The time accuracy is 1e-06 second.
The current time is Mon Dec 20 18:01:28 2010

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r
-fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary:
----------------------------------------------

./AMG_Solver_MPI on a linux-gnu named wmss04 with 12 processors, by cheny
Mon Dec 20 19:01:28 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010

                         Max       Max/Min        Avg      Total
Time (sec):           3.089e+02      1.00012   3.089e+02
Objects:              3.000e+01      1.00000   3.000e+01
Flops:                5.197e+10      1.11689   5.074e+10  6.089e+11
Flops/sec:            1.683e+08      1.11689   1.643e+08  1.971e+09
MPI Messages:         5.906e+03      2.00017   5.415e+03  6.498e+04
MPI Message Lengths:  1.887e+09      6.23794   2.345e+05  1.524e+10
MPI Reductions:       4.477e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N -->
2N flops
                            and VecAXPY() for complex vectors of length N
--> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 3.0887e+02 100.0%  6.0890e+11 100.0%  6.498e+04 100.0%  2.345e+05      100.0%  4.461e+03  99.6%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting
output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this
phase
      %M - percent messages in this phase     %L - percent message lengths
in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over
all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             1474 1.0 1.4069e+02 2.1 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 35 47 50 50  0  35 47 50 50  0  2054
MatMultTranspose    1473 1.0 1.3272e+02 1.8 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 34 47 50 50  0  34 47 50 50  0  2175
MatAssemblyBegin       1 1.0 6.4070e-0314.6 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 6.2698e-02 1.0 0.00e+00 0.0 1.1e+02 8.2e+04 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatView                1 1.0 2.4605e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecView                1 1.0 1.1164e+0182.6 0.00e+00 0.0 2.2e+01 3.9e+05 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
VecDot              2946 1.0 1.1499e+0234.8 5.78e+08 1.0 0.0e+00 0.0e+00 2.9e+03 13  1  0  0 66  13  1  0  0 66    60
VecNorm             1475 1.0 1.0804e+01 7.7 2.90e+08 1.0 0.0e+00 0.0e+00 1.5e+03  2  1  0  0 33   2  1  0  0 33   322
VecCopy                4 1.0 6.9451e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              8843 1.0 2.9336e+00 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY             4420 1.0 1.0803e+01 2.3 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0   964
VecAYPX             2944 1.0 6.6637e+00 2.1 5.78e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0  1041
VecAssemblyBegin       6 1.0 3.7719e-0214.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         6 1.0 5.3883e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult    2948 1.0 8.7972e+00 2.3 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   395
VecScatterBegin     2947 1.0 3.3624e+00 4.3 0.00e+00 0.0 6.5e+04 2.3e+05 0.0e+00  1  0100100  0   1  0100100  0     0
VecScatterEnd       2947 1.0 8.0508e+0119.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12  0  0  0  0  12  0  0  0  0     0
KSPSetup               1 1.0 1.1752e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 2.8016e+02 1.0 5.20e+10 1.1 6.5e+04 2.3e+05 4.4e+03 91100100100 99  91100100100 99  2173
PCSetUp                1 1.0 5.9605e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             2948 1.0 8.8313e+00 2.3 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   393
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     3              3     56593044     0
                 Vec    18             18     10534536     0
         Vec Scatter     2              2         1736     0
           Index Set     4              4       305424     0
       Krylov Solver     1              1          832     0
      Preconditioner     1              1          872     0
              Viewer     1              1          544     0
========================================================================================================================
Average time to get PetscTime(): 6.48499e-06
Average time for MPI_Barrier(): 0.000102377
Average time for zero size MPI_Send(): 2.15967e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8
Configure run at: Tue Nov 23 15:54:45 2010
Configure options: --known-level1-dcache-size=65536
--known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2
--known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8
--known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8
--known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8
--known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4
--known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc
--with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1
--download-superlu-dist=1 --download-hypre=1 --download-trilinos=1
--download-parmetis=1 --download-mumps=1 --download-scalapack=1
--download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch
--known-mpi-shared=1
-----------------------------------------


----------------------
(5) k=16
----------------------
Process 0 of total 16 on wmss04
Process 8 of total 16 on wmss04
Process 4 of total 16 on wmss04
Process 12 of total 16 on wmss04
Process 2 of total 16 on wmss04
Process 6 of total 16 on wmss04
Process 5 of total 16 on wmss04
Process 11 of total 16 on wmss04
Process 14 of total 16 on wmss04
Process 7 of total 16 on wmss04
Process Process 15 of total 16 on wmss04
3Process 13 of total 16 on wmss04
Process 10 of total 16 on wmss04
Process 9 of total 16 on wmss04
Process 1 of total 16 on wmss04
The dimension of Matrix A is n = 1177754
 of total 16 on wmss04

Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:

Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:

Begin Assembly:
Begin Assembly:
Begin Assembly:

Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.End Assembly.
End Assembly.End Assembly.End Assembly.End Assembly.
End Assembly.
End Assembly.
End Assembly.End Assembly.

End Assembly.
End Assembly.
End Assembly.
End Assembly.End Assembly.



=========================================================
Begin the solving:
=========================================================
The current time is: Mon Dec 20 18:02:28 2010

KSP Object:
  type: bicg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-07, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:
  type: jacobi
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpisbaij, rows=1177754, cols=1177754
    total: nonzeros=49908476, allocated nonzeros=49908476
        block size is 1

norm(b-Ax)=1.15892e-06
Norm of error 1.15892e-06, Iterations 1497
=========================================================
The solver has finished successfully!
=========================================================
The solving time is 337.91 seconds.
The time accuracy is 1e-06 second.
The current time is Mon Dec 20 18:08:06 2010

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r
-fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary:
----------------------------------------------

./AMG_Solver_MPI on a linux-gnu named wmss04 with 16 processors, by cheny
Mon Dec 20 19:08:06 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010

                         Max       Max/Min        Avg      Total
Time (sec):           3.534e+02      1.00001   3.534e+02
Objects:              3.000e+01      1.00000   3.000e+01
Flops:                3.964e+10      1.13060   3.864e+10  6.182e+11
Flops/sec:            1.122e+08      1.13060   1.093e+08  1.749e+09
MPI Messages:         1.200e+04      3.99917   7.127e+03  1.140e+05
MPI Message Lengths:  1.950e+09      7.80999   1.819e+05  2.074e+10
MPI Reductions:       4.549e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N -->
2N flops
                            and VecAXPY() for complex vectors of length N
--> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 3.5342e+02 100.0%  6.1820e+11 100.0%  1.140e+05 100.0%  1.819e+05      100.0%  4.533e+03  99.6%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting
output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this
phase
      %M - percent messages in this phase     %L - percent message lengths
in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over
all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             1498 1.0 1.8860e+02 1.7 1.88e+10 1.1 5.7e+04 1.8e+05 0.0e+00 40 47 50 50  0  40 47 50 50  0  1555
MatMultTranspose    1497 1.0 1.4165e+02 1.3 1.88e+10 1.1 5.7e+04 1.8e+05 0.0e+00 35 47 50 50  0  35 47 50 50  0  2069
MatAssemblyBegin       1 1.0 1.0044e-0217.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 7.3835e-02 1.0 0.00e+00 0.0 1.8e+02 6.7e+04 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatView                1 1.0 2.6107e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecView                1 1.0 1.1282e+01109.0 0.00e+00 0.0 3.0e+01 2.9e+05 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
VecDot              2994 1.0 6.7490e+0119.6 4.41e+08 1.0 0.0e+00 0.0e+00 3.0e+03 10  1  0  0 66  10  1  0  0 66   104
VecNorm             1499 1.0 1.3431e+0110.8 2.21e+08 1.0 0.0e+00 0.0e+00 1.5e+03  2  1  0  0 33   2  1  0  0 33   263
VecCopy                4 1.0 7.3178e-03 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              8987 1.0 3.1772e+00 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY             4492 1.0 1.1361e+01 3.1 6.61e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0   931
VecAYPX             2992 1.0 7.3248e+00 2.5 4.40e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0   962
VecAssemblyBegin       6 1.0 3.6338e-0212.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         6 1.0 7.2002e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult    2996 1.0 9.7892e+00 2.4 2.21e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   360
VecScatterBegin     2995 1.0 4.0570e+00 5.5 0.00e+00 0.0 1.1e+05 1.8e+05 0.0e+00  1  0100100  0   1  0100100  0     0
VecScatterEnd       2995 1.0 1.7309e+0251.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 22  0  0  0  0  22  0  0  0  0     0
KSPSetup               1 1.0 1.3058e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 3.2641e+02 1.0 3.96e+10 1.1 1.1e+05 1.8e+05 4.5e+03 92100100100 99  92100100100 99  1893
PCSetUp                1 1.0 8.1062e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             2996 1.0 9.8336e+00 2.4 2.21e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   359
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     3              3     42424600     0
                 Vec    18             18      7924896     0
         Vec Scatter     2              2         1736     0
           Index Set     4              4       247632     0
       Krylov Solver     1              1          832     0
      Preconditioner     1              1          872     0
              Viewer     1              1          544     0
========================================================================================================================
Average time to get PetscTime(): 6.10352e-06
Average time for MPI_Barrier(): 0.000129986
Average time for zero size MPI_Send(): 2.08169e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8
Configure run at: Tue Nov 23 15:54:45 2010
Configure options: --known-level1-dcache-size=65536
--known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2
--known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8
--known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8
--known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8
--known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4
--known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc
--with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1
--download-superlu-dist=1 --download-hypre=1 --download-trilinos=1
--download-parmetis=1 --download-mumps=1 --download-scalapack=1
--download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch
--known-mpi-shared=1
-----------------------------------------




On Mon, Dec 20, 2010 at 6:06 PM, Matthew Knepley <knepley at gmail.com> wrote:

> On Mon, Dec 20, 2010 at 8:46 AM, Yongjun Chen <yjxd.chen at gmail.com> wrote:
>
>>
>> Hi everyone,
>>
>>
>> I use PETSC (version 3.1-p5) to solve a linear problem Ax=b. The matrix A
>> and right hand vector b are read from files. The dimension of A is
>> 1.2Million*1.2Million. I am pretty sure the matrix A and vector b have been
>> read correctly.
>>
>> I compiled the program with optimized version (--with-debugging=0), tested
>> the speed up performance on two servers, and I have found that the
>> performance is very poor.
>>
>> For the two servers, one is 4 cpus * 4 cores per cpu, i.e., with a total
>> 16 cores. And the other one is 4 cpus * 12 cores per cpu, with a total 48
>> cores.
>>
>> On each of them, with the increasing of computing cores k from 1 to 8
>> (mpiexec -n k ./Solver_MPI -pc_type jacobi -ksp_type gmres), the speed up
>> will increase from 1 to 6, but when the computing cores k increase from 9 to
>> 16(for the first server) or 48 (for the second server), the speed up
>> decrease firstly and then remains a constant value 5.0 (for the first
>> server) or 4.5(for the second server).
>>
>
> We cannot say anything at all without -log_summary data for your runs.
>
>    Matt
>
>
>>  Actually, the program LAMMPS speed up excellently on these two servers.
>>
>> Any comments are very appreciated! Thanks!
>>
>>
>>
>>
>> --------------------------------------------------------------------------------------------------------------------------
>>
>> PS: the related codes are as following,
>>
>>
>> //firstly read A and b from files
>>
>> ...
>>
>> //then
>>
>>
>>
>>               ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
>>
>>               ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
>>
>>               ierr = VecAssemblyBegin(b); CHKERRQ(ierr);
>>
>>               ierr = VecAssemblyEnd(b); CHKERRQ(ierr);
>>
>>               ierr = MatSetOption(A,MAT_SYMMETRIC,PETSC_TRUE); CHKERRQ(ierr);
>>
>>               ierr = MatGetRowUpperTriangular(A); CHKERRQ(ierr);
>>
>>               ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
>>
>>               ierr = KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr);
>>
>>               ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
>>
>>               ierr = KSPSetTolerances(ksp,1.e-7,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr);
>>
>>               ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
>>
>>               ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
>>
>>               ierr = KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
>>
>>               ierr = KSPGetSolution(ksp, &x);CHKERRQ(ierr);
>>
>>               ierr = VecAssemblyBegin(x);CHKERRQ(ierr);
>>
>>               ierr = VecAssemblyEnd(x);CHKERRQ(ierr);
>>
>> ...
>>
>>
>>
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>



-- 
Dr.Yongjun Chen
Room 2507, Building M
Institute of Materials Science and Technology
Technical University of Hamburg-Harburg
Eißendorfer Straße 42, 21073 Hamburg, Germany.
Tel:  +49 (0)40-42878-4386
Fax: +49 (0)40-42878-4070
E-mail: yjxd.chen at gmail.com
KSPSolve               1 1.0 3.2641e+02 1.0 3.96e+10 1.1 1.1e+05 1.8e+05 4.5e+03 92100100100 99  92100100100 99  1893
PCSetUp                1 1.0 8.1062e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             2996 1.0 9.8336e+00 2.4 2.21e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   359
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     3              3     42424600     0
                 Vec    18             18      7924896     0
         Vec Scatter     2              2         1736     0
           Index Set     4              4       247632     0
       Krylov Solver     1              1          832     0
      Preconditioner     1              1          872     0
              Viewer     1              1          544     0
========================================================================================================================
Average time to get PetscTime(): 6.10352e-06
Average time for MPI_Barrier(): 0.000129986
Average time for zero size MPI_Send(): 2.08169e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Tue Nov 23 15:54:45 2010
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch --known-mpi-shared=1
-----------------------------------------
Libraries compiled on Tue Nov 23 15:57:11 CET 2010 on wmss04 
Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux 
Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized
Using PETSc arch: linux-gnu-c-opt
-----------------------------------------
Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O   
Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall -Wno-unused-variable -O   
-----------------------------------------
Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include  
------------------------------------------
Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O 
Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall -Wno-unused-variable -O 
Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lpetsc       -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lHYPRE -lmpichcxx -lstdc++ -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl  
------------------------------------------
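
For reference, here is a minimal sketch of how the solver configuration printed in the KSP/PC objects above (BiCG with point-Jacobi preconditioning, rtol=1e-07, abstol=1e-50, dtol=10000, maxits=10000) is typically set up through the PETSc 3.1 KSP interface. The assembled Mat A and Vecs b, x are assumed to already exist, error checking is omitted, and the actual AMG_Solver_MPI code may of course differ; KSPSetFromOptions() is what lets -ksp_type bicg -pc_type jacobi -log_summary take effect on the command line.

    #include "petscksp.h"

    /* Sketch only: solve A x = b with the settings shown in the logs above.
     * PETSc 3.1 signatures; later releases drop the MatStructure argument
     * of KSPSetOperators() and take KSPDestroy(&ksp). */
    PetscErrorCode SolveWithBiCGJacobi(Mat A, Vec b, Vec x)
    {
      KSP ksp;
      PC  pc;

      KSPCreate(PETSC_COMM_WORLD, &ksp);
      KSPSetOperators(ksp, A, A, DIFFERENT_NONZERO_PATTERN);
      KSPSetType(ksp, KSPBICG);                    /* type: bicg   */
      KSPGetPC(ksp, &pc);
      PCSetType(pc, PCJACOBI);                     /* type: jacobi */
      KSPSetTolerances(ksp, 1.0e-7, 1.0e-50, 1.0e4, 10000);
      KSPSetFromOptions(ksp);   /* command-line -ksp_type/-pc_type override these */
      KSPSolve(ksp, b, x);
      KSPDestroy(ksp);
      return 0;
    }
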
-------------- next part --------------
Process 1 of total 2 on wmss04
Process 0 of total 2 on wmss04
The dimension of Matrix A is n = 1177754
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
=========================================================
Begin the solving:										  
=========================================================
The current time is: Mon Dec 20 17:42:23 2010

KSP Object:
  type: bicg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-07, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:
  type: jacobi
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpisbaij, rows=1177754, cols=1177754
    total: nonzeros=49908476, allocated nonzeros=49908476
        block size is 1

norm(b-Ax)=1.25862e-06
Norm of error 1.25862e-06, Iterations 1475
=========================================================
The solver has finished successfully!			          
=========================================================
The solving time is 762.874 seconds.
The time accuracy is 1e-06 second.
The current time is Mon Dec 20 17:55:06 2010

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./AMG_Solver_MPI on a linux-gnu named wmss04 with 2 processors, by cheny Mon Dec 20 18:55:06 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010

                         Max       Max/Min        Avg      Total 
Time (sec):           8.160e+02      1.00000   8.160e+02
Objects:              3.000e+01      1.00000   3.000e+01
Flops:                3.120e+11      1.04720   3.050e+11  6.100e+11
Flops/sec:            3.824e+08      1.04720   3.737e+08  7.475e+08
MPI Messages:         2.958e+03      1.00068   2.958e+03  5.915e+03
MPI Message Lengths:  9.598e+08      1.00034   3.245e+05  1.919e+09
MPI Reductions:       4.483e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 8.1603e+02 100.0%  6.0997e+11 100.0%  5.915e+03 100.0%  3.245e+05      100.0%  4.467e+03  99.6% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             1476 1.0 3.4220e+02 1.0 1.48e+11 1.0 3.0e+03 3.2e+05 0.0e+00 41 47 50 50  0  41 47 50 50  0   846
MatMultTranspose    1475 1.0 3.4208e+02 1.0 1.48e+11 1.0 3.0e+03 3.2e+05 0.0e+00 42 47 50 50  0  42 47 50 50  0   846
MatAssemblyBegin       1 1.0 1.5492e-0281.5 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 8.1615e-02 1.0 0.00e+00 0.0 1.0e+01 1.1e+05 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatView                1 1.0 1.5807e-04 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecView                1 1.0 1.0809e+01 2.1 0.00e+00 0.0 2.0e+00 2.4e+06 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecDot              2950 1.0 2.0457e+01 1.9 3.47e+09 1.0 0.0e+00 0.0e+00 3.0e+03  2  1  0  0 66   2  1  0  0 66   340
VecNorm             1477 1.0 1.2103e+01 1.7 1.74e+09 1.0 0.0e+00 0.0e+00 1.5e+03  1  1  0  0 33   1  1  0  0 33   287
VecCopy                4 1.0 1.0110e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              8855 1.0 6.0069e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY             4426 1.0 1.8430e+01 1.2 5.21e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0   566
VecAYPX             2948 1.0 1.3610e+01 1.2 3.47e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   510
VecAssemblyBegin       6 1.0 9.1116e-0317.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         6 1.0 1.7405e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult    2952 1.0 1.7966e+01 1.1 1.74e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   194
VecScatterBegin     2951 1.0 8.6552e-01 1.1 0.00e+00 0.0 5.9e+03 3.2e+05 0.0e+00  0  0100100  0   0  0100100  0     0
VecScatterEnd       2951 1.0 2.7126e+01 8.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
KSPSetup               1 1.0 3.9254e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 7.5170e+02 1.0 3.12e+11 1.0 5.9e+03 3.2e+05 4.4e+03 92100100100 99  92100100100 99   811
PCSetUp                1 1.0 1.9073e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             2952 1.0 1.8043e+01 1.1 1.74e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   193
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     3              3    339744648     0
                 Vec    18             18     62239872     0
         Vec Scatter     2              2         1736     0
           Index Set     4              4       974736     0
       Krylov Solver     1              1          832     0
      Preconditioner     1              1          872     0
              Viewer     1              1          544     0
========================================================================================================================
Average time to get PetscTime(): 1.21593e-06
Average time for MPI_Barrier(): 1.44005e-05
Average time for zero size MPI_Send(): 1.94311e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Tue Nov 23 15:54:45 2010
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch --known-mpi-shared=1
-----------------------------------------
Libraries compiled on Tue Nov 23 15:57:11 CET 2010 on wmss04 
Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux 
Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized
Using PETSc arch: linux-gnu-c-opt
-----------------------------------------
Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O   
Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall -Wno-unused-variable -O   
-----------------------------------------
Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include  
------------------------------------------
Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O 
Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall -Wno-unused-variable -O 
Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lpetsc       -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lHYPRE -lmpichcxx -lstdc++ -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl  
------------------------------------------
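
For completeness, a rough sketch of how an MPISBAIJ matrix like the one reported above (type=mpisbaij, block size 1, n = 1177754, preallocated nonzeros) is usually created and assembled. The d_nnz/o_nnz arrays are hypothetical per-row preallocation counts that the real code would compute from its discretization; this illustrates the API only, not the actual assembly loop in AMG_Solver_MPI.

    #include "petscmat.h"

    /* Sketch: distributed symmetric block AIJ matrix (mpisbaij, bs = 1).
     * d_nnz/o_nnz are illustrative per-row counts for the diagonal and
     * off-diagonal blocks (upper-triangular part only for SBAIJ). */
    PetscErrorCode CreateSystemMatrix(PetscInt n, const PetscInt d_nnz[],
                                      const PetscInt o_nnz[], Mat *A)
    {
      MatCreate(PETSC_COMM_WORLD, A);
      MatSetSizes(*A, PETSC_DECIDE, PETSC_DECIDE, n, n);   /* n = 1177754 here */
      MatSetType(*A, MATMPISBAIJ);
      MatMPISBAIJSetPreallocation(*A, 1, 0, d_nnz, 0, o_nnz);
      /* ... MatSetValues() for the locally owned rows, upper triangle only ... */
      MatAssemblyBegin(*A, MAT_FINAL_ASSEMBLY);            /* "Begin Assembly:" */
      MatAssemblyEnd(*A, MAT_FINAL_ASSEMBLY);              /* "End Assembly."   */
      return 0;
    }
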
-------------- next part --------------
Process 0 of total 4 on wmss04
Process 2 of total 4 on wmss04
Process 3 of total 4 on wmss04
Process 1 of total 4 on wmss04
The dimension of Matrix A is n = 1177754
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
End Assembly.
End Assembly.
=========================================================
Begin the solving:										  
=========================================================
The current time is: Mon Dec 20 17:33:24 2010

KSP Object:
  type: bicg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-07, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:
  type: jacobi
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpisbaij, rows=1177754, cols=1177754
    total: nonzeros=49908476, allocated nonzeros=49908476
        block size is 1

norm(b-Ax)=1.28342e-06
Norm of error 1.28342e-06, Iterations 1473
=========================================================
The solver has finished successfully!			          
=========================================================
The solving time is 450.583 seconds.
The time accuracy is 1e-06 second.
The current time is Mon Dec 20 17:40:55 2010

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./AMG_Solver_MPI on a linux-gnu named wmss04 with 4 processors, by cheny Mon Dec 20 18:40:55 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010

                         Max       Max/Min        Avg      Total 
Time (sec):           4.807e+02      1.00000   4.807e+02
Objects:              3.000e+01      1.00000   3.000e+01
Flops:                1.558e+11      1.06872   1.523e+11  6.091e+11
Flops/sec:            3.241e+08      1.06872   3.168e+08  1.267e+09
MPI Messages:         5.906e+03      2.00017   4.430e+03  1.772e+04
MPI Message Lengths:  1.727e+09      2.74432   2.658e+05  4.710e+09
MPI Reductions:       4.477e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 4.8066e+02 100.0%  6.0914e+11 100.0%  1.772e+04 100.0%  2.658e+05      100.0%  4.461e+03  99.6% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             1474 1.0 1.9344e+02 1.1 7.40e+10 1.1 8.8e+03 2.7e+05 0.0e+00 39 47 50 50  0  39 47 50 50  0  1494
MatMultTranspose    1473 1.0 1.9283e+02 1.0 7.40e+10 1.1 8.8e+03 2.7e+05 0.0e+00 40 47 50 50  0  40 47 50 50  0  1498
MatAssemblyBegin       1 1.0 1.5624e-0263.8 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 6.3599e-02 1.0 0.00e+00 0.0 3.0e+01 9.3e+04 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatView                1 1.0 1.8096e-04 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecView                1 1.0 1.1063e+01 4.7 0.00e+00 0.0 6.0e+00 1.2e+06 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecDot              2946 1.0 2.5350e+01 2.7 1.73e+09 1.0 0.0e+00 0.0e+00 2.9e+03  3  1  0  0 66   3  1  0  0 66   274
VecNorm             1475 1.0 1.1197e+01 3.0 8.69e+08 1.0 0.0e+00 0.0e+00 1.5e+03  1  1  0  0 33   1  1  0  0 33   310
VecCopy                4 1.0 6.0010e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              8843 1.0 3.6737e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY             4420 1.0 1.4221e+01 1.4 2.60e+09 1.0 0.0e+00 0.0e+00 0.0e+00  3  2  0  0  0   3  2  0  0  0   732
VecAYPX             2944 1.0 1.1377e+01 1.1 1.73e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   610
VecAssemblyBegin       6 1.0 2.8596e-0223.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         6 1.0 2.4796e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult    2948 1.0 1.7210e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00  3  1  0  0  0   3  1  0  0  0   202
VecScatterBegin     2947 1.0 1.9806e+00 2.4 0.00e+00 0.0 1.8e+04 2.7e+05 0.0e+00  0  0100100  0   0  0100100  0     0
VecScatterEnd       2947 1.0 4.3833e+01 7.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  6  0  0  0  0   6  0  0  0  0     0
KSPSetup               1 1.0 2.1496e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 4.3931e+02 1.0 1.56e+11 1.1 1.8e+04 2.7e+05 4.4e+03 91100100100 99  91100100100 99  1386
PCSetUp                1 1.0 3.0994e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             2948 1.0 1.7256e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00  3  1  0  0  0   3  1  0  0  0   201
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     3              3    169902696     0
                 Vec    18             18     31282096     0
         Vec Scatter     2              2         1736     0
           Index Set     4              4       638616     0
       Krylov Solver     1              1          832     0
      Preconditioner     1              1          872     0
              Viewer     1              1          544     0
========================================================================================================================
Average time to get PetscTime(): 1.5974e-06
Average time for MPI_Barrier(): 3.48091e-05
Average time for zero size MPI_Send(): 1.8537e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Tue Nov 23 15:54:45 2010
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch --known-mpi-shared=1
-----------------------------------------
Libraries compiled on Tue Nov 23 15:57:11 CET 2010 on wmss04 
Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux 
Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized
Using PETSc arch: linux-gnu-c-opt
-----------------------------------------
Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O   
Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall -Wno-unused-variable -O   
-----------------------------------------
Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include  
------------------------------------------
Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O 
Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall -Wno-unused-variable -O 
Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lpetsc       -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lHYPRE -lmpichcxx -lstdc++ -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl  
------------------------------------------
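
As an aside on reading the tables above: with -pc_type jacobi each PCApply is simply a pointwise multiplication by the inverse diagonal of A, which is why the PCApply and VecPointwiseMult rows show essentially identical counts and times in every log. A rough sketch of the arithmetic it performs (not the PCJACOBI source itself):

    #include "petscmat.h"

    /* Sketch of what Jacobi preconditioning amounts to: y = D^{-1} x with
     * D = diag(A). The diagonal is built once in setup; each apply is one
     * VecPointwiseMult, matching the PCApply/VecPointwiseMult rows above. */
    PetscErrorCode JacobiSketch(Mat A, Vec x, Vec y, Vec diag)
    {
      MatGetDiagonal(A, diag);       /* setup: extract diag(A)     */
      VecReciprocal(diag);           /* setup: invert it entrywise */
      VecPointwiseMult(y, diag, x);  /* apply: y_i = x_i / A_ii    */
      return 0;
    }
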
-------------- next part --------------
Process 0 of total 8 on wmss04
Process 4 of total 8 on wmss04
Process 2 of total 8 on wmss04
Process 6 of total 8 on wmss04
Process 3 of total 8 on wmss04
Process 7 of total 8 on wmss04
Process 1 of total 8 on wmss04
Process 5 of total 8 on wmss04
The dimension of Matrix A is n = 1177754
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
=========================================================
Begin the solving:										  
=========================================================
The current time is: Mon Dec 20 18:14:59 2010

KSP Object:
  type: bicg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-07, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:
  type: jacobi
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpisbaij, rows=1177754, cols=1177754
    total: nonzeros=49908476, allocated nonzeros=49908476
        block size is 1

norm(b-Ax)=1.32502e-06
Norm of error 1.32502e-06, Iterations 1473
=========================================================
The solver has finished successfully!			          
=========================================================
The solving time is 311.937 seconds.
The time accuracy is 1e-06 second.
The current time is Mon Dec 20 18:20:11 2010

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./AMG_Solver_MPI on a linux-gnu named wmss04 with 8 processors, by cheny Mon Dec 20 19:20:11 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010

                         Max       Max/Min        Avg      Total 
Time (sec):           3.330e+02      1.00000   3.330e+02
Objects:              3.000e+01      1.00000   3.000e+01
Flops:                7.792e+10      1.09702   7.614e+10  6.091e+11
Flops/sec:            2.340e+08      1.09702   2.286e+08  1.829e+09
MPI Messages:         5.906e+03      2.00017   5.169e+03  4.135e+04
MPI Message Lengths:  1.866e+09      4.61816   2.430e+05  1.005e+10
MPI Reductions:       4.477e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 3.3302e+02 100.0%  6.0914e+11 100.0%  4.135e+04 100.0%  2.430e+05      100.0%  4.461e+03  99.6% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             1474 1.0 1.4230e+02 1.4 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 38 47 50 50  0  38 47 50 50  0  2031
MatMultTranspose    1473 1.0 1.3627e+02 1.1 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 38 47 50 50  0  38 47 50 50  0  2120
MatAssemblyBegin       1 1.0 8.0800e-0324.5 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 5.3647e-02 1.0 0.00e+00 0.0 7.0e+01 8.5e+04 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatView                1 1.0 2.1791e-04 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecView                1 1.0 1.0902e+0112.1 0.00e+00 0.0 1.4e+01 5.9e+05 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
VecDot              2946 1.0 3.5689e+01 7.6 8.67e+08 1.0 0.0e+00 0.0e+00 2.9e+03  6  1  0  0 66   6  1  0  0 66   194
VecNorm             1475 1.0 8.1093e+00 4.0 4.34e+08 1.0 0.0e+00 0.0e+00 1.5e+03  1  1  0  0 33   1  1  0  0 33   428
VecCopy                4 1.0 5.2011e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              8843 1.0 3.0491e+00 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY             4420 1.0 9.2421e+00 1.6 1.30e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0  1127
VecAYPX             2944 1.0 6.8297e+00 1.5 8.67e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0  1015
VecAssemblyBegin       6 1.0 2.6218e-0210.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         6 1.0 3.6240e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult    2948 1.0 9.6646e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00  3  1  0  0  0   3  1  0  0  0   359
VecScatterBegin     2947 1.0 2.2599e+00 2.3 0.00e+00 0.0 4.1e+04 2.4e+05 0.0e+00  1  0100100  0   1  0100100  0     0
VecScatterEnd       2947 1.0 7.7004e+0120.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0   9  0  0  0  0     0
KSPSetup               1 1.0 1.4287e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 3.0090e+02 1.0 7.79e+10 1.1 4.1e+04 2.4e+05 4.4e+03 90100100100 99  90100100100 99  2024
PCSetUp                1 1.0 4.0531e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             2948 1.0 9.7001e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00  3  1  0  0  0   3  1  0  0  0   358
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     3              3     84944064     0
                 Vec    18             18     15741712     0
         Vec Scatter     2              2         1736     0
           Index Set     4              4       409008     0
       Krylov Solver     1              1          832     0
      Preconditioner     1              1          872     0
              Viewer     1              1          544     0
========================================================================================================================
Average time to get PetscTime(): 3.38554e-06
Average time for MPI_Barrier(): 7.40051e-05
Average time for zero size MPI_Send(): 1.88947e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Tue Nov 23 15:54:45 2010
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch --known-mpi-shared=1
-----------------------------------------
Libraries compiled on Tue Nov 23 15:57:11 CET 2010 on wmss04 
Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux 
Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized
Using PETSc arch: linux-gnu-c-opt
-----------------------------------------
Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O   
Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall -Wno-unused-variable -O   
-----------------------------------------
Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include  
------------------------------------------
Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O 
Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall -Wno-unused-variable -O 
Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lpetsc       -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lHYPRE -lmpichcxx -lstdc++ -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl  
------------------------------------------
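
The "Set stages with PetscLogStagePush() and PetscLogStagePop()" note in the phase summary refers to optional user-defined logging stages; everything above is lumped into "Main Stage" because the code registers none. A minimal sketch of how the solve could be given its own stage in -log_summary (the stage name is illustrative only):

    #include "petscksp.h"

    /* Sketch: wrap the solve in its own logging stage so -log_summary
     * reports it separately from assembly. */
    PetscErrorCode SolveInOwnStage(KSP ksp, Vec b, Vec x)
    {
      PetscLogStage solve_stage;

      PetscLogStageRegister("KSP Solve", &solve_stage);
      PetscLogStagePush(solve_stage);
      KSPSolve(ksp, b, x);
      PetscLogStagePop();
      return 0;
    }
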
-------------- next part --------------
Process 1 of total 12 on wmss04
Process 5 of total 12 on wmss04
Process 2 of total 12 on wmss04
Process 9 of total 12 on wmss04
Process 6 of total 12 on wmss04
Process 7 of total 12 on wmss04
Process 10 of total 12 on wmss04
Process 3 of total 12 on wmss04
Process 11 of total 12 on wmss04
Process 4 of total 12 on wmss04
Process 8 of total 12 on wmss04
Process 0 of total 12 on wmss04
The dimension of Matrix A is n = 1177754
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
=========================================================
Begin the solving:										  
=========================================================
The current time is: Mon Dec 20 17:56:36 2010

KSP Object:
  type: bicg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-07, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:
  type: jacobi
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpisbaij, rows=1177754, cols=1177754
    total: nonzeros=49908476, allocated nonzeros=49908476
        block size is 1

norm(b-Ax)=1.28414e-06
Norm of error 1.28414e-06, Iterations 1473
=========================================================
The solver has finished successfully!			          
=========================================================
The solving time is 291.503 seconds.
The time accuracy is 1e-06 second.
The current time is Mon Dec 20 18:01:28 2010

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./AMG_Solver_MPI on a linux-gnu named wmss04 with 12 processors, by cheny Mon Dec 20 19:01:28 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010

                         Max       Max/Min        Avg      Total 
Time (sec):           3.089e+02      1.00012   3.089e+02
Objects:              3.000e+01      1.00000   3.000e+01
Flops:                5.197e+10      1.11689   5.074e+10  6.089e+11
Flops/sec:            1.683e+08      1.11689   1.643e+08  1.971e+09
MPI Messages:         5.906e+03      2.00017   5.415e+03  6.498e+04
MPI Message Lengths:  1.887e+09      6.23794   2.345e+05  1.524e+10
MPI Reductions:       4.477e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 3.0887e+02 100.0%  6.0890e+11 100.0%  6.498e+04 100.0%  2.345e+05      100.0%  4.461e+03  99.6% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             1474 1.0 1.4069e+02 2.1 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 35 47 50 50  0  35 47 50 50  0  2054
MatMultTranspose    1473 1.0 1.3272e+02 1.8 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 34 47 50 50  0  34 47 50 50  0  2175
MatAssemblyBegin       1 1.0 6.4070e-0314.6 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 6.2698e-02 1.0 0.00e+00 0.0 1.1e+02 8.2e+04 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatView                1 1.0 2.4605e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecView                1 1.0 1.1164e+0182.6 0.00e+00 0.0 2.2e+01 3.9e+05 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
VecDot              2946 1.0 1.1499e+0234.8 5.78e+08 1.0 0.0e+00 0.0e+00 2.9e+03 13  1  0  0 66  13  1  0  0 66    60
VecNorm             1475 1.0 1.0804e+01 7.7 2.90e+08 1.0 0.0e+00 0.0e+00 1.5e+03  2  1  0  0 33   2  1  0  0 33   322
VecCopy                4 1.0 6.9451e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              8843 1.0 2.9336e+00 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY             4420 1.0 1.0803e+01 2.3 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0   964
VecAYPX             2944 1.0 6.6637e+00 2.1 5.78e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0  1041
VecAssemblyBegin       6 1.0 3.7719e-0214.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         6 1.0 5.3883e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult    2948 1.0 8.7972e+00 2.3 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   395
VecScatterBegin     2947 1.0 3.3624e+00 4.3 0.00e+00 0.0 6.5e+04 2.3e+05 0.0e+00  1  0100100  0   1  0100100  0     0
VecScatterEnd       2947 1.0 8.0508e+0119.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12  0  0  0  0  12  0  0  0  0     0
KSPSetup               1 1.0 1.1752e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 2.8016e+02 1.0 5.20e+10 1.1 6.5e+04 2.3e+05 4.4e+03 91100100100 99  91100100100 99  2173
PCSetUp                1 1.0 5.9605e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             2948 1.0 8.8313e+00 2.3 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   393
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     3              3     56593044     0
                 Vec    18             18     10534536     0
         Vec Scatter     2              2         1736     0
           Index Set     4              4       305424     0
       Krylov Solver     1              1          832     0
      Preconditioner     1              1          872     0
              Viewer     1              1          544     0
========================================================================================================================
Average time to get PetscTime(): 6.48499e-06
Average time for MPI_Barrier(): 0.000102377
Average time for zero size MPI_Send(): 2.15967e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Tue Nov 23 15:54:45 2010
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch --known-mpi-shared=1
-----------------------------------------
Libraries compiled on Tue Nov 23 15:57:11 CET 2010 on wmss04 
Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux 
Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized
Using PETSc arch: linux-gnu-c-opt
-----------------------------------------
Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O   
Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall -Wno-unused-variable -O   
-----------------------------------------
Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include  
------------------------------------------
Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O 
Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall -Wno-unused-variable -O 
Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lpetsc       -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lHYPRE -lmpichcxx -lstdc++ -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl  
------------------------------------------
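
For quick reference (this summary is computed from the "The solving time is ..." lines above and is not part of the -log_summary output): taking the 2-core run as the baseline, the speedup S(p) = T(2)/T(p) works out to

  S(4)  = 762.874 / 450.583 ≈ 1.69
  S(8)  = 762.874 / 311.937 ≈ 2.45
  S(12) = 762.874 / 291.503 ≈ 2.62
  S(16) = 762.874 / 337.910 ≈ 2.26

i.e. going from 2 to 16 cores (8x the processes) buys only about a 2.3x speedup, and the 16-core run is actually slower than the 12-core run.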

