[petsc-users] KSPSetUp does not scale
Thomas Witkowski
thomas.witkowski at tu-dresden.de
Mon Nov 19 07:40:03 CST 2012
Here are the two files. In this case, maybe you can also give me some
hints on why the solver does not scale at all here. The solver runtime on
64 cores is 206 seconds; with the same problem size on 128 cores it
takes 172 seconds. The number of inner and outer solver iterations is
the same for both runs. I use CG with a Jacobi preconditioner and hypre
BoomerAMG for the inner solver.
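
As a side note on the profiling question in the quoted thread below (whether
to stop the run after KSPSetUp): here is a minimal sketch, not from my actual
code, of how the setup and solve phases could be put into separate logging
stages so that -log_summary reports them individually. The stage names and the
ksp, b, x arguments are illustrative placeholders for the application's
existing objects.

    #include <petscksp.h>

    /* Sketch only: wrap the existing KSPSetUp()/KSPSolve() calls in separate
       logging stages so that -log_summary reports each phase on its own.
       ksp, b and x stand for the application's already assembled objects. */
    static PetscErrorCode SolveWithStages(KSP ksp, Vec b, Vec x)
    {
      PetscLogStage setup_stage, solve_stage;

      PetscLogStageRegister("MyKSPSetUp", &setup_stage);
      PetscLogStageRegister("MyKSPSolve", &solve_stage);

      PetscLogStagePush(setup_stage);
      KSPSetUp(ksp);        /* PCSetUp (e.g. the BoomerAMG setup) is typically triggered here */
      PetscLogStagePop();

      PetscLogStagePush(solve_stage);
      KSPSolve(ksp, b, x);  /* the Krylov iterations then land in this stage */
      PetscLogStagePop();
      return 0;
    }

With something like this in place, the attached summaries would list the setup
and solve phases as separate stages instead of lumping everything into the
Main Stage.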
On 19.11.2012 13:41, Jed Brown wrote:
> Just have it do one or a few iterations.
>
>
> On Mon, Nov 19, 2012 at 1:36 PM, Thomas Witkowski
> <thomas.witkowski at tu-dresden.de> wrote:
>
> I can do this! Should I stop the run after KSPSetUp? Or do you
> want to see the log_summary file from the whole run?
>
> Thomas
>
> On 19.11.2012 13:33, Jed Brown wrote:
>> Always, always, always send -log_summary when asking about
>> performance.
>>
>>
>> On Mon, Nov 19, 2012 at 11:26 AM, Thomas Witkowski
>> <thomas.witkowski at tu-dresden.de> wrote:
>>
>> I have some scaling problem in KSPSetUp; maybe some of you
>> can help me to fix it. It takes 4.5 seconds on 64 cores and
>> 4.0 seconds on 128 cores. The matrix has around 11 million rows
>> and is not perfectly balanced, but the maximum number of rows
>> per core in the 128-core case is exactly half of that in
>> the 64-core case. Besides the scaling, why does
>> the setup take so long? I thought that only some objects are
>> created and no computation is going on!
>>
>> The KSPView on the corresponding solver objects is as follows:
>>
>> KSP Object:(ns_) 64 MPI processes
>> type: fgmres
>> GMRES: restart=30, using Classical (unmodified)
>> Gram-Schmidt Orthogonalization with no iterative refinement
>> GMRES: happy breakdown tolerance 1e-30
>> maximum iterations=100, initial guess is zero
>> tolerances: relative=1e-06, absolute=1e-08, divergence=10000
>> right preconditioning
>> has attached null space
>> using UNPRECONDITIONED norm type for convergence test
>> PC Object:(ns_) 64 MPI processes
>> type: fieldsplit
>> FieldSplit with Schur preconditioner, factorization FULL
>> Preconditioner for the Schur complement formed from the
>> block diagonal part of A11
>> Split info:
>> Split number 0 Defined by IS
>> Split number 1 Defined by IS
>> KSP solver for A00 block
>> KSP Object: (ns_fieldsplit_velocity_) 64 MPI
>> processes
>> type: preonly
>> maximum iterations=10000, initial guess is zero
>> tolerances: relative=1e-05, absolute=1e-50,
>> divergence=10000
>> left preconditioning
>> using DEFAULT norm type for convergence test
>> PC Object: (ns_fieldsplit_velocity_) 64 MPI
>> processes
>> type: none
>> linear system matrix = precond matrix:
>> Matrix Object: 64 MPI processes
>> type: mpiaij
>> rows=11068107, cols=11068107
>> total: nonzeros=315206535, allocated nonzeros=315206535
>> total number of mallocs used during MatSetValues
>> calls =0
>> not using I-node (on process 0) routines
>> KSP solver for S = A11 - A10 inv(A00) A01
>> KSP Object: (ns_fieldsplit_pressure_) 64 MPI
>> processes
>> type: gmres
>> GMRES: restart=30, using Classical (unmodified)
>> Gram-Schmidt Orthogonalization with no iterative refinement
>> GMRES: happy breakdown tolerance 1e-30
>> maximum iterations=10000, initial guess is zero
>> tolerances: relative=1e-05, absolute=1e-50,
>> divergence=10000
>> left preconditioning
>> using DEFAULT norm type for convergence test
>> PC Object: (ns_fieldsplit_pressure_) 64 MPI
>> processes
>> type: none
>> linear system matrix followed by preconditioner matrix:
>> Matrix Object: 64 MPI processes
>> type: schurcomplement
>> rows=469678, cols=469678
>> Schur complement A11 - A10 inv(A00) A01
>> A11
>> Matrix Object: 64 MPI processes
>> type: mpiaij
>> rows=469678, cols=469678
>> total: nonzeros=0, allocated nonzeros=0
>> total number of mallocs used during
>> MatSetValues calls =0
>> using I-node (on process 0) routines: found
>> 1304 nodes, limit used is 5
>> A10
>> Matrix Object: 64 MPI processes
>> type: mpiaij
>> rows=469678, cols=11068107
>> total: nonzeros=89122957, allocated
>> nonzeros=89122957
>> total number of mallocs used during
>> MatSetValues calls =0
>> not using I-node (on process 0) routines
>> KSP of A00
>> KSP Object: (ns_fieldsplit_velocity_)
>> 64 MPI processes
>> type: preonly
>> maximum iterations=10000, initial guess is zero
>> tolerances: relative=1e-05, absolute=1e-50,
>> divergence=10000
>> left preconditioning
>> using DEFAULT norm type for convergence test
>> PC Object: (ns_fieldsplit_velocity_)
>> 64 MPI processes
>> type: none
>> linear system matrix = precond matrix:
>> Matrix Object: 64 MPI processes
>> type: mpiaij
>> rows=11068107, cols=11068107
>> total: nonzeros=315206535, allocated
>> nonzeros=315206535
>> total number of mallocs used during
>> MatSetValues calls =0
>> not using I-node (on process 0) routines
>> A01
>> Matrix Object: 64 MPI processes
>> type: mpiaij
>> rows=11068107, cols=469678
>> total: nonzeros=88821041, allocated
>> nonzeros=88821041
>> total number of mallocs used during
>> MatSetValues calls =0
>> not using I-node (on process 0) routines
>> Matrix Object: 64 MPI processes
>> type: mpiaij
>> rows=469678, cols=469678
>> total: nonzeros=0, allocated nonzeros=0
>> total number of mallocs used during MatSetValues
>> calls =0
>> using I-node (on process 0) routines: found 1304
>> nodes, limit used is 5
>> linear system matrix = precond matrix:
>> Matrix Object: 64 MPI processes
>> type: mpiaij
>> rows=11537785, cols=11537785
>> total: nonzeros=493150533, allocated nonzeros=510309207
>> total number of mallocs used during MatSetValues calls =0
>> not using I-node (on process 0) routines
>>
>>
>>
>>
>> Thomas
>>
>>
>
>
-------------- next part --------------
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ns_diffuse on a arch-linux2-cxx-opt named jj15c52 with 64 processors, by hdr060 Mon Nov 19 14:09:02 2012
Using Petsc Development HG revision: a7fad1b276253926c2a6f6a2ded33d8595ea85e3 HG Date: Tue Oct 30 21:48:02 2012 -0500
Max Max/Min Avg Total
Time (sec): 2.646e+02 1.00001 2.646e+02
Objects: 5.000e+02 1.00000 5.000e+02
Flops: 6.177e+09 1.26397 5.525e+09 3.536e+11
Flops/sec: 2.334e+07 1.26397 2.088e+07 1.336e+09
MPI Messages: 1.673e+04 4.59221 8.234e+03 5.270e+05
MPI Message Lengths: 2.267e+08 2.72709 1.745e+04 9.197e+09
MPI Reductions: 1.337e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.6464e+02 100.0% 3.5359e+11 100.0% 5.270e+05 100.0% 1.745e+04 100.0% 1.336e+03 99.9%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 707 1.0 8.2787e+00 1.1 4.27e+09 1.3 5.1e+05 1.7e+04 0.0e+00 3 68 97 92 0 3 68 97 92 0 29017
MatConvert 2 1.0 1.6090e-01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 12 1.0 2.7379e-01 4.4 0.00e+00 0.0 4.3e+03 1.4e+05 1.6e+01 0 0 1 6 1 0 0 1 6 1 0
MatAssemblyEnd 12 1.0 3.3810e-01 1.1 0.00e+00 0.0 1.0e+04 4.0e+03 6.8e+01 0 0 2 0 5 0 0 2 0 5 0
MatGetRowIJ 4 1.0 8.8716e-0435.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecMDot 100 1.0 1.0310e+00 1.8 5.68e+08 1.2 0.0e+00 0.0e+00 1.0e+02 0 9 0 0 7 0 9 0 0 7 32455
VecTDot 300 1.0 2.8301e-02 2.6 5.07e+06 1.4 0.0e+00 0.0e+00 3.0e+02 0 0 0 0 22 0 0 0 0 22 9957
VecNorm 508 1.0 7.5554e-01 3.3 8.64e+07 1.2 0.0e+00 0.0e+00 5.1e+02 0 1 0 0 38 0 1 0 0 38 6726
VecScale 205 1.0 3.1001e-02 1.3 2.14e+07 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 40594
VecCopy 201 1.0 9.6661e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 412 1.0 1.6734e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 401 1.0 1.8162e-01 1.2 8.03e+07 1.2 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 26055
VecAYPX 201 1.0 8.2297e-02 2.1 2.06e+07 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14731
VecWAXPY 5 1.0 6.2659e-03 2.1 9.79e+05 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 9207
VecMAXPY 201 1.0 1.6730e+00 1.2 1.14e+09 1.2 0.0e+00 0.0e+00 0.0e+00 1 19 0 0 0 1 19 0 0 0 40139
VecAssemblyBegin 4 1.0 6.0473e-0217.9 0.00e+00 0.0 7.1e+02 2.6e+04 1.2e+01 0 0 0 0 1 0 0 0 0 1 0
VecAssemblyEnd 4 1.0 2.9459e-03 7.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 300 1.0 6.2079e-03 1.3 2.53e+06 1.4 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 22697
VecScatterBegin 1107 1.0 4.0446e-01 1.6 0.00e+00 0.0 5.1e+05 1.7e+04 0.0e+00 0 0 97 92 0 0 0 97 92 0 0
VecScatterEnd 1107 1.0 1.6155e+0014.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 1 1.0 1.5441e-01 1.0 5.87e+05 1.2 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 224
KSPGMRESOrthog 100 1.0 1.7406e+00 1.3 1.14e+09 1.2 0.0e+00 0.0e+00 1.0e+02 1 19 0 0 7 1 19 0 0 7 38447
KSPSetUp 5 1.0 1.4701e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 2.0766e+02 1.0 6.16e+09 1.3 5.1e+05 1.7e+04 1.1e+03 78100 97 92 83 78100 97 92 84 1698
PCSetUp 5 1.0 3.2241e+01 1.0 0.00e+00 0.0 4.6e+03 2.6e+04 1.3e+02 12 0 1 1 10 12 0 1 1 10 0
PCApply 100 1.0 1.9821e+02 1.0 7.54e+08 1.4 3.6e+05 8.4e+03 8.1e+02 75 12 69 33 61 75 12 69 33 61 210
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 29 29 271125592 0
Matrix Null Space 3 3 1756 0
Vector 415 408 390453064 0
Vector Scatter 10 10 10360 0
Index Set 26 26 425372 0
Krylov Solver 8 7 25864 0
Preconditioner 8 7 6192 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 0
Average time for MPI_Barrier(): 1.94073e-05
Average time for zero size MPI_Send(): 1.19545e-05
#PETSc Option Table entries:
-log_summary
-ns_ksp_max_it 100
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Wed Oct 31 15:22:38 2012
Configure options: --known-level1-dcache-size=32768 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=8 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --prefix=/lustre/jhome22/hdr06/hdr060/petsc/install/petsc-dev-opt --with-batch=1 --with-blacs-include=/usr/local/intel/mkl/10.2.5.035/include --with-blacs-lib="-L/usr/local/intel/mkl/10.2.5.035/lib/em64t -lmkl_blacs_intelmpi_lp64 -lmkl_core" --with-blas-lapack-lib="-L/usr/local/intel/mkl/10.2.5.035/lib/em64t -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread" --with-c++-support --with-cc=mpicc --with-clanguage=cxx --with-cxx=mpicxx --with-debugging=1 --with-fc=mpif90 --with-scalapack-include=/usr/local/intel/mkl/10.2.5.035/include --with-scalapack-lib="-L/usr/local/intel/mkl/10.2.5.035/lib/em64t -lmkl_scalapack_lp64 -lmkl_core" --with-x=0 --known-mpi-shared-libraries=0 --download-hypre --download-metis --download-mumps --download-parmetis --download-superlu --download-superlu_dist --download-umfpack --with-debugging=0 COPTFLAGS=-O3 CXXOPTFLAGS=-O3
-----------------------------------------
Libraries compiled on Wed Oct 31 15:22:38 2012 on jj28l05
Machine characteristics: Linux-2.6.32.59-0.3-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev
Using PETSc arch: arch-linux2-cxx-opt
-----------------------------------------
Using C compiler: mpicxx -wd1572 -O3 ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90 -O3 ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/arch-linux2-cxx-opt/include -I/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/include -I/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/include -I/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/arch-linux2-cxx-opt/include -I/lustre/jsoft/usr_local/parastation/mpi2-intel-5.0.26-1/include
-----------------------------------------
Using C linker: mpicxx
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/arch-linux2-cxx-opt/lib -L/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/arch-linux2-cxx-opt/lib -lpetsc -Wl,-rpath,/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/arch-linux2-cxx-opt/lib -L/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/arch-linux2-cxx-opt/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -L/usr/local/intel/mkl/10.2.5.035/lib/em64t -lmkl_scalapack_lp64 -lmkl_core -lmkl_blacs_intelmpi_lp64 -lmkl_core -lsuperlu_dist_3.1 -lparmetis -lmetis -lsuperlu_4.3 -lHYPRE -lpthread -lumfpack -lamd -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -L/lustre/jsoft/usr_local/parastation/mpi2-intel-5.0.26-1/lib -L/opt/parastation/lib64 -L/usr/local/intel/Compiler/11.1/072/lib/intel64 -L/usr/lib64/gcc/x86_64-suse-linux/4.3 -L/usr/x86_64-suse-linux/lib -lmpichf90 -Wl,-rpath,/lustre/jsoft/usr_local/parastation/mpi2-intel-5.0.26-1/lib -Wl,-rpath,/opt/parastation/lib64 -lifport -lifcore -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lpthread -lpscom -lrt -limf -lsvml -lipgo -ldecimal -lgcc_s -lirc -lirc_s -ldl
-----------------------------------------
-------------- next part --------------
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ns_diffuse on a arch-linux2-cxx-opt named jj05c86 with 128 processors, by hdr060 Mon Nov 19 14:03:54 2012
Using Petsc Development HG revision: a7fad1b276253926c2a6f6a2ded33d8595ea85e3 HG Date: Tue Oct 30 21:48:02 2012 -0500
Max Max/Min Avg Total
Time (sec): 2.031e+02 1.00001 2.031e+02
Objects: 5.000e+02 1.00000 5.000e+02
Flops: 3.252e+09 1.40504 2.761e+09 3.534e+11
Flops/sec: 1.601e+07 1.40506 1.359e+07 1.739e+09
MPI Messages: 2.435e+04 5.32757 9.772e+03 1.251e+06
MPI Message Lengths: 1.547e+08 2.72759 9.141e+03 1.143e+10
MPI Reductions: 1.374e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.0315e+02 100.0% 3.5336e+11 100.0% 1.251e+06 100.0% 9.141e+03 100.0% 1.373e+03 99.9%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 740 1.0 4.1607e+00 1.3 2.25e+09 1.5 1.2e+06 8.7e+03 0.0e+00 2 68 97 92 0 2 68 97 92 0 57719
MatConvert 2 1.0 8.6766e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 12 1.0 2.0412e-01 6.1 0.00e+00 0.0 9.5e+03 7.5e+04 1.6e+01 0 0 1 6 1 0 0 1 6 1 0
MatAssemblyEnd 12 1.0 2.3422e-01 1.1 0.00e+00 0.0 2.3e+04 2.2e+03 6.8e+01 0 0 2 0 5 0 0 2 0 5 0
MatGetRowIJ 4 1.0 1.6210e-03147.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecMDot 100 1.0 6.7637e-01 2.2 2.96e+08 1.3 0.0e+00 0.0e+00 1.0e+02 0 9 0 0 7 0 9 0 0 7 49366
VecTDot 335 1.0 4.2665e-02 2.9 3.05e+06 1.6 0.0e+00 0.0e+00 3.4e+02 0 0 0 0 24 0 0 0 0 24 7360
VecNorm 510 1.0 8.5397e-01 2.7 4.51e+07 1.3 0.0e+00 0.0e+00 5.1e+02 0 1 0 0 37 0 1 0 0 37 5940
VecScale 205 1.0 1.3045e-02 1.4 1.12e+07 1.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 96270
VecCopy 201 1.0 3.9065e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 412 1.0 7.5919e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 405 1.0 6.7209e-02 1.4 4.19e+07 1.3 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 70318
VecAYPX 234 1.0 3.8888e-02 2.6 1.10e+07 1.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 31905
VecWAXPY 5 1.0 2.6548e-03 2.7 5.10e+05 1.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 21685
VecMAXPY 201 1.0 8.4197e-01 1.3 5.93e+08 1.3 0.0e+00 0.0e+00 0.0e+00 0 19 0 0 0 0 19 0 0 0 79587
VecAssemblyBegin 4 1.0 3.6471e-02 5.6 0.00e+00 0.0 1.6e+03 1.5e+04 1.2e+01 0 0 0 0 1 0 0 0 0 1 0
VecAssemblyEnd 4 1.0 1.4598e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 302 1.0 4.0665e-03 1.6 1.37e+06 1.6 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 34811
VecScatterBegin 1140 1.0 2.3255e-01 1.8 0.00e+00 0.0 1.2e+06 8.7e+03 0.0e+00 0 0 97 92 0 0 0 97 92 0 0
VecScatterEnd 1140 1.0 1.2122e+0014.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 1 1.0 2.7853e-01 1.0 3.06e+05 1.3 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 124
KSPGMRESOrthog 100 1.0 1.0198e+00 1.5 5.91e+08 1.3 0.0e+00 0.0e+00 1.0e+02 0 19 0 0 7 0 19 0 0 7 65484
KSPSetUp 5 1.0 1.4089e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 1.7274e+02 1.0 3.24e+09 1.4 1.2e+06 8.7e+03 1.2e+03 85100 97 92 84 85100 97 92 84 2040
PCSetUp 5 1.0 3.0170e+01 1.0 0.00e+00 0.0 1.0e+04 1.2e+04 1.3e+02 15 0 1 1 9 15 0 1 1 9 0
PCApply 100 1.0 1.6804e+02 1.0 4.04e+08 1.5 8.7e+05 4.4e+03 8.5e+02 83 12 70 33 62 83 12 70 33 62 250
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 29 29 128412168 0
Matrix Null Space 3 3 1756 0
Vector 415 408 182178800 0
Vector Scatter 10 10 10360 0
Index Set 26 26 266496 0
Krylov Solver 8 7 25864 0
Preconditioner 8 7 6192 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 1.19209e-07
Average time for MPI_Barrier(): 2.90394e-05
Average time for zero size MPI_Send(): 1.24779e-05
#PETSc Option Table entries:
-log_summary
-ns_ksp_max_it 100
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Wed Oct 31 15:22:38 2012
Configure options: --known-level1-dcache-size=32768 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=8 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --prefix=/lustre/jhome22/hdr06/hdr060/petsc/install/petsc-dev-opt --with-batch=1 --with-blacs-include=/usr/local/intel/mkl/10.2.5.035/include --with-blacs-lib="-L/usr/local/intel/mkl/10.2.5.035/lib/em64t -lmkl_blacs_intelmpi_lp64 -lmkl_core" --with-blas-lapack-lib="-L/usr/local/intel/mkl/10.2.5.035/lib/em64t -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread" --with-c++-support --with-cc=mpicc --with-clanguage=cxx --with-cxx=mpicxx --with-debugging=1 --with-fc=mpif90 --with-scalapack-include=/usr/local/intel/mkl/10.2.5.035/include --with-scalapack-lib="-L/usr/local/intel/mkl/10.2.5.035/lib/em64t -lmkl_scalapack_lp64 -lmkl_core" --with-x=0 --known-mpi-shared-libraries=0 --download-hypre --download-metis --download-mumps --download-parmetis --download-superlu --download-superlu_dist --download-umfpack --with-debugging=0 COPTFLAGS=-O3 CXXOPTFLAGS=-O3
-----------------------------------------
Libraries compiled on Wed Oct 31 15:22:38 2012 on jj28l05
Machine characteristics: Linux-2.6.32.59-0.3-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev
Using PETSc arch: arch-linux2-cxx-opt
-----------------------------------------
Using C compiler: mpicxx -wd1572 -O3 ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90 -O3 ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/arch-linux2-cxx-opt/include -I/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/include -I/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/include -I/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/arch-linux2-cxx-opt/include -I/lustre/jsoft/usr_local/parastation/mpi2-intel-5.0.26-1/include
-----------------------------------------
Using C linker: mpicxx
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/arch-linux2-cxx-opt/lib -L/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/arch-linux2-cxx-opt/lib -lpetsc -Wl,-rpath,/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/arch-linux2-cxx-opt/lib -L/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/arch-linux2-cxx-opt/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -L/usr/local/intel/mkl/10.2.5.035/lib/em64t -lmkl_scalapack_lp64 -lmkl_core -lmkl_blacs_intelmpi_lp64 -lmkl_core -lsuperlu_dist_3.1 -lparmetis -lmetis -lsuperlu_4.3 -lHYPRE -lpthread -lumfpack -lamd -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -L/lustre/jsoft/usr_local/parastation/mpi2-intel-5.0.26-1/lib -L/opt/parastation/lib64 -L/usr/local/intel/Compiler/11.1/072/lib/intel64 -L/usr/lib64/gcc/x86_64-suse-linux/4.3 -L/usr/x86_64-suse-linux/lib -lmpichf90 -Wl,-rpath,/lustre/jsoft/usr_local/parastation/mpi2-intel-5.0.26-1/lib -Wl,-rpath,/opt/parastation/lib64 -lifport -lifcore -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lpthread -lpscom -lrt -limf -lsvml -lipgo -ldecimal -lgcc_s -lirc -lirc_s -ldl
-----------------------------------------