[petsc-users] KSPSetUp does not scale
Thomas Witkowski
thomas.witkowski at tu-dresden.de
Mon Nov 19 07:40:03 CST 2012
Here are the two files. In this case, maybe you can also give me some
hints on why the solver does not scale at all here. The solver runtime on
64 cores is 206 seconds; with the same problem size on 128 cores it
takes 172 seconds. The number of inner and outer solver iterations is
the same for both runs. I use CG with a Jacobi preconditioner and hypre
BoomerAMG for the inner solver.
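
As a side note on the profiling question in the quoted thread below (whether
to stop the run after KSPSetUp): here is a minimal sketch, not from my actual
code, of how the setup and solve phases could be put into separate logging
stages so that -log_summary reports them individually. The stage names and the
ksp, b, x arguments are illustrative placeholders for the application's
existing objects.

    #include <petscksp.h>

    /* Sketch only: wrap the existing KSPSetUp()/KSPSolve() calls in separate
       logging stages so that -log_summary reports each phase on its own.
       ksp, b and x stand for the application's already assembled objects. */
    static PetscErrorCode SolveWithStages(KSP ksp, Vec b, Vec x)
    {
      PetscLogStage setup_stage, solve_stage;

      PetscLogStageRegister("MyKSPSetUp", &setup_stage);
      PetscLogStageRegister("MyKSPSolve", &solve_stage);

      PetscLogStagePush(setup_stage);
      KSPSetUp(ksp);        /* PCSetUp (e.g. the BoomerAMG setup) is typically triggered here */
      PetscLogStagePop();

      PetscLogStagePush(solve_stage);
      KSPSolve(ksp, b, x);  /* the Krylov iterations then land in this stage */
      PetscLogStagePop();
      return 0;
    }

With something like this in place, the attached summaries would list the setup
and solve phases as separate stages instead of lumping everything into the
Main Stage.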
On 19.11.2012 13:41, Jed Brown wrote:
> Just have it do one or a few iterations.
>
>
> On Mon, Nov 19, 2012 at 1:36 PM, Thomas Witkowski
> <thomas.witkowski at tu-dresden.de> wrote:
>
> I can do this! Should I stop the run after KSPSetUp? Or do you
> want to see the log_summary file from the whole run?
>
> Thomas
>
> On 19.11.2012 13:33, Jed Brown wrote:
>> Always, always, always send -log_summary when asking about
>> performance.
>>
>>
>> On Mon, Nov 19, 2012 at 11:26 AM, Thomas Witkowski
>> <thomas.witkowski at tu-dresden.de> wrote:
>>
>> I have some scaling problem in KSPSetUp; maybe some of you
>> can help me to fix it. It takes 4.5 seconds on 64 cores and
>> 4.0 seconds on 128 cores. The matrix has around 11 million rows
>> and is not perfectly balanced, but the maximum number of rows
>> per core in the 128-core case is exactly half of that in
>> the 64-core case. Besides the scaling, why does
>> the setup take so long? I thought that only some objects are
>> created and no computation is going on!
>>
>> The KSPView on the corresponding solver objects is as follows:
>>
>> KSP Object:(ns_) 64 MPI processes
>> type: fgmres
>> GMRES: restart=30, using Classical (unmodified)
>> Gram-Schmidt Orthogonalization with no iterative refinement
>> GMRES: happy breakdown tolerance 1e-30
>> maximum iterations=100, initial guess is zero
>> tolerances: relative=1e-06, absolute=1e-08, divergence=10000
>> right preconditioning
>> has attached null space
>> using UNPRECONDITIONED norm type for convergence test
>> PC Object:(ns_) 64 MPI processes
>> type: fieldsplit
>> FieldSplit with Schur preconditioner, factorization FULL
>> Preconditioner for the Schur complement formed from the
>> block diagonal part of A11
>> Split info:
>> Split number 0 Defined by IS
>> Split number 1 Defined by IS
>> KSP solver for A00 block
>> KSP Object: (ns_fieldsplit_velocity_) 64 MPI
>> processes
>> type: preonly
>> maximum iterations=10000, initial guess is zero
>> tolerances: relative=1e-05, absolute=1e-50,
>> divergence=10000
>> left preconditioning
>> using DEFAULT norm type for convergence test
>> PC Object: (ns_fieldsplit_velocity_) 64 MPI
>> processes
>> type: none
>> linear system matrix = precond matrix:
>> Matrix Object: 64 MPI processes
>> type: mpiaij
>> rows=11068107, cols=11068107
>> total: nonzeros=315206535, allocated nonzeros=315206535
>> total number of mallocs used during MatSetValues
>> calls =0
>> not using I-node (on process 0) routines
>> KSP solver for S = A11 - A10 inv(A00) A01
>> KSP Object: (ns_fieldsplit_pressure_) 64 MPI
>> processes
>> type: gmres
>> GMRES: restart=30, using Classical (unmodified)
>> Gram-Schmidt Orthogonalization with no iterative refinement
>> GMRES: happy breakdown tolerance 1e-30
>> maximum iterations=10000, initial guess is zero
>> tolerances: relative=1e-05, absolute=1e-50,
>> divergence=10000
>> left preconditioning
>> using DEFAULT norm type for convergence test
>> PC Object: (ns_fieldsplit_pressure_) 64 MPI
>> processes
>> type: none
>> linear system matrix followed by preconditioner matrix:
>> Matrix Object: 64 MPI processes
>> type: schurcomplement
>> rows=469678, cols=469678
>> Schur complement A11 - A10 inv(A00) A01
>> A11
>> Matrix Object: 64 MPI processes
>> type: mpiaij
>> rows=469678, cols=469678
>> total: nonzeros=0, allocated nonzeros=0
>> total number of mallocs used during
>> MatSetValues calls =0
>> using I-node (on process 0) routines: found
>> 1304 nodes, limit used is 5
>> A10
>> Matrix Object: 64 MPI processes
>> type: mpiaij
>> rows=469678, cols=11068107
>> total: nonzeros=89122957, allocated
>> nonzeros=89122957
>> total number of mallocs used during
>> MatSetValues calls =0
>> not using I-node (on process 0) routines
>> KSP of A00
>> KSP Object: (ns_fieldsplit_velocity_)
>> 64 MPI processes
>> type: preonly
>> maximum iterations=10000, initial guess is zero
>> tolerances: relative=1e-05, absolute=1e-50,
>> divergence=10000
>> left preconditioning
>> using DEFAULT norm type for convergence test
>> PC Object: (ns_fieldsplit_velocity_)
>> 64 MPI processes
>> type: none
>> linear system matrix = precond matrix:
>> Matrix Object: 64 MPI processes
>> type: mpiaij
>> rows=11068107, cols=11068107
>> total: nonzeros=315206535, allocated
>> nonzeros=315206535
>> total number of mallocs used during
>> MatSetValues calls =0
>> not using I-node (on process 0) routines
>> A01
>> Matrix Object: 64 MPI processes
>> type: mpiaij
>> rows=11068107, cols=469678
>> total: nonzeros=88821041, allocated
>> nonzeros=88821041
>> total number of mallocs used during
>> MatSetValues calls =0
>> not using I-node (on process 0) routines
>> Matrix Object: 64 MPI processes
>> type: mpiaij
>> rows=469678, cols=469678
>> total: nonzeros=0, allocated nonzeros=0
>> total number of mallocs used during MatSetValues
>> calls =0
>> using I-node (on process 0) routines: found 1304
>> nodes, limit used is 5
>> linear system matrix = precond matrix:
>> Matrix Object: 64 MPI processes
>> type: mpiaij
>> rows=11537785, cols=11537785
>> total: nonzeros=493150533, allocated nonzeros=510309207
>> total number of mallocs used during MatSetValues calls =0
>> not using I-node (on process 0) routines
>>
>>
>>
>>
>> Thomas
>>
>>
>
>
-------------- next part --------------
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ns_diffuse on a arch-linux2-cxx-opt named jj15c52 with 64 processors, by hdr060 Mon Nov 19 14:09:02 2012
Using Petsc Development HG revision: a7fad1b276253926c2a6f6a2ded33d8595ea85e3 HG Date: Tue Oct 30 21:48:02 2012 -0500
Max Max/Min Avg Total
Time (sec): 2.646e+02 1.00001 2.646e+02
Objects: 5.000e+02 1.00000 5.000e+02
Flops: 6.177e+09 1.26397 5.525e+09 3.536e+11
Flops/sec: 2.334e+07 1.26397 2.088e+07 1.336e+09
MPI Messages: 1.673e+04 4.59221 8.234e+03 5.270e+05
MPI Message Lengths: 2.267e+08 2.72709 1.745e+04 9.197e+09
MPI Reductions: 1.337e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.6464e+02 100.0% 3.5359e+11 100.0% 5.270e+05 100.0% 1.745e+04 100.0% 1.336e+03 99.9%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 707 1.0 8.2787e+00 1.1 4.27e+09 1.3 5.1e+05 1.7e+04 0.0e+00 3 68 97 92 0 3 68 97 92 0 29017
MatConvert 2 1.0 1.6090e-01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 12 1.0 2.7379e-01 4.4 0.00e+00 0.0 4.3e+03 1.4e+05 1.6e+01 0 0 1 6 1 0 0 1 6 1 0
MatAssemblyEnd 12 1.0 3.3810e-01 1.1 0.00e+00 0.0 1.0e+04 4.0e+03 6.8e+01 0 0 2 0 5 0 0 2 0 5 0
MatGetRowIJ 4 1.0 8.8716e-0435.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecMDot 100 1.0 1.0310e+00 1.8 5.68e+08 1.2 0.0e+00 0.0e+00 1.0e+02 0 9 0 0 7 0 9 0 0 7 32455
VecTDot 300 1.0 2.8301e-02 2.6 5.07e+06 1.4 0.0e+00 0.0e+00 3.0e+02 0 0 0 0 22 0 0 0 0 22 9957
VecNorm 508 1.0 7.5554e-01 3.3 8.64e+07 1.2 0.0e+00 0.0e+00 5.1e+02 0 1 0 0 38 0 1 0 0 38 6726
VecScale 205 1.0 3.1001e-02 1.3 2.14e+07 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 40594
VecCopy 201 1.0 9.6661e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 412 1.0 1.6734e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 401 1.0 1.8162e-01 1.2 8.03e+07 1.2 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 26055
VecAYPX 201 1.0 8.2297e-02 2.1 2.06e+07 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14731
VecWAXPY 5 1.0 6.2659e-03 2.1 9.79e+05 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 9207
VecMAXPY 201 1.0 1.6730e+00 1.2 1.14e+09 1.2 0.0e+00 0.0e+00 0.0e+00 1 19 0 0 0 1 19 0 0 0 40139
VecAssemblyBegin 4 1.0 6.0473e-0217.9 0.00e+00 0.0 7.1e+02 2.6e+04 1.2e+01 0 0 0 0 1 0 0 0 0 1 0
VecAssemblyEnd 4 1.0 2.9459e-03 7.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 300 1.0 6.2079e-03 1.3 2.53e+06 1.4 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 22697
VecScatterBegin 1107 1.0 4.0446e-01 1.6 0.00e+00 0.0 5.1e+05 1.7e+04 0.0e+00 0 0 97 92 0 0 0 97 92 0 0
VecScatterEnd 1107 1.0 1.6155e+0014.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 1 1.0 1.5441e-01 1.0 5.87e+05 1.2 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 224
KSPGMRESOrthog 100 1.0 1.7406e+00 1.3 1.14e+09 1.2 0.0e+00 0.0e+00 1.0e+02 1 19 0 0 7 1 19 0 0 7 38447
KSPSetUp 5 1.0 1.4701e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 2.0766e+02 1.0 6.16e+09 1.3 5.1e+05 1.7e+04 1.1e+03 78100 97 92 83 78100 97 92 84 1698
PCSetUp 5 1.0 3.2241e+01 1.0 0.00e+00 0.0 4.6e+03 2.6e+04 1.3e+02 12 0 1 1 10 12 0 1 1 10 0
PCApply 100 1.0 1.9821e+02 1.0 7.54e+08 1.4 3.6e+05 8.4e+03 8.1e+02 75 12 69 33 61 75 12 69 33 61 210
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 29 29 271125592 0
Matrix Null Space 3 3 1756 0
Vector 415 408 390453064 0
Vector Scatter 10 10 10360 0
Index Set 26 26 425372 0
Krylov Solver 8 7 25864 0
Preconditioner 8 7 6192 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 0
Average time for MPI_Barrier(): 1.94073e-05
Average time for zero size MPI_Send(): 1.19545e-05
#PETSc Option Table entries:
-log_summary
-ns_ksp_max_it 100
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Wed Oct 31 15:22:38 2012
Configure options: --known-level1-dcache-size=32768 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=8 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --prefix=/lustre/jhome22/hdr06/hdr060/petsc/install/petsc-dev-opt --with-batch=1 --with-blacs-include=/usr/local/intel/mkl/10.2.5.035/include --with-blacs-lib="-L/usr/local/intel/mkl/10.2.5.035/lib/em64t -lmkl_blacs_intelmpi_lp64 -lmkl_core" --with-blas-lapack-lib="-L/usr/local/intel/mkl/10.2.5.035/lib/em64t -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread" --with-c++-support --with-cc=mpicc --with-clanguage=cxx --with-cxx=mpicxx --with-debugging=1 --with-fc=mpif90 --with-scalapack-include=/usr/local/intel/mkl/10.2.5.035/include --with-scalapack-lib="-L/usr/local/intel/mkl/10.2.5.035/lib/em64t -lmkl_scalapack_lp64 -lmkl_core" --with-x=0 --known-mpi-shared-libraries=0 --download-hypre --download-metis --download-mumps --download-parmetis --download-superlu --download-superlu_dist --download-umfpack --with-debugging=0 COPTFLAGS=-O3 CXXOPTFLAGS=-O3
-----------------------------------------
Libraries compiled on Wed Oct 31 15:22:38 2012 on jj28l05
Machine characteristics: Linux-2.6.32.59-0.3-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev
Using PETSc arch: arch-linux2-cxx-opt
-----------------------------------------
Using C compiler: mpicxx -wd1572 -O3 ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90 -O3 ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/arch-linux2-cxx-opt/include -I/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/include -I/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/include -I/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/arch-linux2-cxx-opt/include -I/lustre/jsoft/usr_local/parastation/mpi2-intel-5.0.26-1/include
-----------------------------------------
Using C linker: mpicxx
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/arch-linux2-cxx-opt/lib -L/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/arch-linux2-cxx-opt/lib -lpetsc -Wl,-rpath,/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/arch-linux2-cxx-opt/lib -L/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/arch-linux2-cxx-opt/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -L/usr/local/intel/mkl/10.2.5.035/lib/em64t -lmkl_scalapack_lp64 -lmkl_core -lmkl_blacs_intelmpi_lp64 -lmkl_core -lsuperlu_dist_3.1 -lparmetis -lmetis -lsuperlu_4.3 -lHYPRE -lpthread -lumfpack -lamd -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -L/lustre/jsoft/usr_local/parastation/mpi2-intel-5.0.26-1/lib -L/opt/parastation/lib64 -L/usr/local/intel/Compiler/11.1/072/lib/intel64 -L/usr/lib64/gcc/x86_64-suse-linux/4.3 -L/usr/x86_64-suse-linux/lib -lmpichf90 -Wl,-rpath,/lustre/jsoft/usr_local/parastation/mpi2-intel-5.0.26-1/lib -Wl,-rpath,/opt/parastation/lib64 -lifport -lifcore -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lpthread -lpscom -lrt -limf -lsvml -lipgo -ldecimal -lgcc_s -lirc -lirc_s -ldl
-----------------------------------------
-------------- next part --------------
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ns_diffuse on a arch-linux2-cxx-opt named jj05c86 with 128 processors, by hdr060 Mon Nov 19 14:03:54 2012
Using Petsc Development HG revision: a7fad1b276253926c2a6f6a2ded33d8595ea85e3 HG Date: Tue Oct 30 21:48:02 2012 -0500
Max Max/Min Avg Total
Time (sec): 2.031e+02 1.00001 2.031e+02
Objects: 5.000e+02 1.00000 5.000e+02
Flops: 3.252e+09 1.40504 2.761e+09 3.534e+11
Flops/sec: 1.601e+07 1.40506 1.359e+07 1.739e+09
MPI Messages: 2.435e+04 5.32757 9.772e+03 1.251e+06
MPI Message Lengths: 1.547e+08 2.72759 9.141e+03 1.143e+10
MPI Reductions: 1.374e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.0315e+02 100.0% 3.5336e+11 100.0% 1.251e+06 100.0% 9.141e+03 100.0% 1.373e+03 99.9%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 740 1.0 4.1607e+00 1.3 2.25e+09 1.5 1.2e+06 8.7e+03 0.0e+00 2 68 97 92 0 2 68 97 92 0 57719
MatConvert 2 1.0 8.6766e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 12 1.0 2.0412e-01 6.1 0.00e+00 0.0 9.5e+03 7.5e+04 1.6e+01 0 0 1 6 1 0 0 1 6 1 0
MatAssemblyEnd 12 1.0 2.3422e-01 1.1 0.00e+00 0.0 2.3e+04 2.2e+03 6.8e+01 0 0 2 0 5 0 0 2 0 5 0
MatGetRowIJ 4 1.0 1.6210e-03147.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecMDot 100 1.0 6.7637e-01 2.2 2.96e+08 1.3 0.0e+00 0.0e+00 1.0e+02 0 9 0 0 7 0 9 0 0 7 49366
VecTDot 335 1.0 4.2665e-02 2.9 3.05e+06 1.6 0.0e+00 0.0e+00 3.4e+02 0 0 0 0 24 0 0 0 0 24 7360
VecNorm 510 1.0 8.5397e-01 2.7 4.51e+07 1.3 0.0e+00 0.0e+00 5.1e+02 0 1 0 0 37 0 1 0 0 37 5940
VecScale 205 1.0 1.3045e-02 1.4 1.12e+07 1.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 96270
VecCopy 201 1.0 3.9065e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 412 1.0 7.5919e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 405 1.0 6.7209e-02 1.4 4.19e+07 1.3 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 70318
VecAYPX 234 1.0 3.8888e-02 2.6 1.10e+07 1.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 31905
VecWAXPY 5 1.0 2.6548e-03 2.7 5.10e+05 1.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 21685
VecMAXPY 201 1.0 8.4197e-01 1.3 5.93e+08 1.3 0.0e+00 0.0e+00 0.0e+00 0 19 0 0 0 0 19 0 0 0 79587
VecAssemblyBegin 4 1.0 3.6471e-02 5.6 0.00e+00 0.0 1.6e+03 1.5e+04 1.2e+01 0 0 0 0 1 0 0 0 0 1 0
VecAssemblyEnd 4 1.0 1.4598e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 302 1.0 4.0665e-03 1.6 1.37e+06 1.6 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 34811
VecScatterBegin 1140 1.0 2.3255e-01 1.8 0.00e+00 0.0 1.2e+06 8.7e+03 0.0e+00 0 0 97 92 0 0 0 97 92 0 0
VecScatterEnd 1140 1.0 1.2122e+0014.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 1 1.0 2.7853e-01 1.0 3.06e+05 1.3 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 124
KSPGMRESOrthog 100 1.0 1.0198e+00 1.5 5.91e+08 1.3 0.0e+00 0.0e+00 1.0e+02 0 19 0 0 7 0 19 0 0 7 65484
KSPSetUp 5 1.0 1.4089e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 1.7274e+02 1.0 3.24e+09 1.4 1.2e+06 8.7e+03 1.2e+03 85100 97 92 84 85100 97 92 84 2040
PCSetUp 5 1.0 3.0170e+01 1.0 0.00e+00 0.0 1.0e+04 1.2e+04 1.3e+02 15 0 1 1 9 15 0 1 1 9 0
PCApply 100 1.0 1.6804e+02 1.0 4.04e+08 1.5 8.7e+05 4.4e+03 8.5e+02 83 12 70 33 62 83 12 70 33 62 250
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 29 29 128412168 0
Matrix Null Space 3 3 1756 0
Vector 415 408 182178800 0
Vector Scatter 10 10 10360 0
Index Set 26 26 266496 0
Krylov Solver 8 7 25864 0
Preconditioner 8 7 6192 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 1.19209e-07
Average time for MPI_Barrier(): 2.90394e-05
Average time for zero size MPI_Send(): 1.24779e-05
#PETSc Option Table entries:
-log_summary
-ns_ksp_max_it 100
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Wed Oct 31 15:22:38 2012
Configure options: --known-level1-dcache-size=32768 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=8 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --prefix=/lustre/jhome22/hdr06/hdr060/petsc/install/petsc-dev-opt --with-batch=1 --with-blacs-include=/usr/local/intel/mkl/10.2.5.035/include --with-blacs-lib="-L/usr/local/intel/mkl/10.2.5.035/lib/em64t -lmkl_blacs_intelmpi_lp64 -lmkl_core" --with-blas-lapack-lib="-L/usr/local/intel/mkl/10.2.5.035/lib/em64t -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread" --with-c++-support --with-cc=mpicc --with-clanguage=cxx --with-cxx=mpicxx --with-debugging=1 --with-fc=mpif90 --with-scalapack-include=/usr/local/intel/mkl/10.2.5.035/include --with-scalapack-lib="-L/usr/local/intel/mkl/10.2.5.035/lib/em64t -lmkl_scalapack_lp64 -lmkl_core" --with-x=0 --known-mpi-shared-libraries=0 --download-hypre --download-metis --download-mumps --download-parmetis --download-superlu --download-superlu_dist --download-umfpack --with-debugging=0 COPTFLAGS=-O3 CXXOPTFLAGS=-O3
-----------------------------------------
Libraries compiled on Wed Oct 31 15:22:38 2012 on jj28l05
Machine characteristics: Linux-2.6.32.59-0.3-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev
Using PETSc arch: arch-linux2-cxx-opt
-----------------------------------------
Using C compiler: mpicxx -wd1572 -O3 ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90 -O3 ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/arch-linux2-cxx-opt/include -I/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/include -I/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/include -I/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/arch-linux2-cxx-opt/include -I/lustre/jsoft/usr_local/parastation/mpi2-intel-5.0.26-1/include
-----------------------------------------
Using C linker: mpicxx
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/arch-linux2-cxx-opt/lib -L/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/arch-linux2-cxx-opt/lib -lpetsc -Wl,-rpath,/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/arch-linux2-cxx-opt/lib -L/lustre/jhome22/hdr06/hdr060/petsc/build/petsc-dev/arch-linux2-cxx-opt/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -L/usr/local/intel/mkl/10.2.5.035/lib/em64t -lmkl_scalapack_lp64 -lmkl_core -lmkl_blacs_intelmpi_lp64 -lmkl_core -lsuperlu_dist_3.1 -lparmetis -lmetis -lsuperlu_4.3 -lHYPRE -lpthread -lumfpack -lamd -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -L/lustre/jsoft/usr_local/parastation/mpi2-intel-5.0.26-1/lib -L/opt/parastation/lib64 -L/usr/local/intel/Compiler/11.1/072/lib/intel64 -L/usr/lib64/gcc/x86_64-suse-linux/4.3 -L/usr/x86_64-suse-linux/lib -lmpichf90 -Wl,-rpath,/lustre/jsoft/usr_local/parastation/mpi2-intel-5.0.26-1/lib -Wl,-rpath,/opt/parastation/lib64 -lifport -lifcore -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lpthread -lpscom -lrt -limf -lsvml -lipgo -ldecimal -lgcc_s -lirc -lirc_s -ldl
-----------------------------------------