[petsc-users] GAMG speed
Michele Rosso
mrosso at uci.edu
Tue Aug 13 21:57:36 CDT 2013
Hi Jed,
I attached the output for both runs you suggested. At the beginning
of each file I included the options I used.
On a side note, I tried to run with a grid of 256^3 (exactly as before)
but with fewer levels, i.e. 3 instead of 4 or 5.
My system stops the run because of an out-of-memory condition. It is
really odd since I have not changed anything except -pc_mg_levels. I
cannot send you any output since there is none. Do you have any guess
where the problem comes from?
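One plausible explanation, assuming the grid is coarsened by a factor of two per
level as in the attached -ksp_view output (this is only a guess consistent with
the logs below, not something confirmed in this exchange): with fewer levels the
coarsest grid is much larger, and the coarse problem is solved with a redundant
LU, i.e. every rank factors the entire coarse matrix.

    256^3, 5 levels -> coarsest grid 16^3 =   4,096 unknowns
    256^3, 4 levels -> coarsest grid 32^3 =  32,768 unknowns
    256^3, 3 levels -> coarsest grid 64^3 = 262,144 unknowns

The attached runs already show an LU fill ratio of about 8.7 on an 8^3 = 512
coarse problem, and direct-factorization fill grows much faster than linearly
for 3D problems, so a 64^3 coarse LU replicated on all 8 processes could
plausibly exhaust memory.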
Thanks,
Michele
On 08/13/2013 07:23 PM, Jed Brown wrote:
> Michele Rosso <mrosso at uci.edu> writes:
>> The matrix arises from discretization of the Poisson equation in
>> incompressible flow calculations.
> Can you try the two runs below and send -log_summary?
>
> -log_summary -ksp_monitor -ksp_view -ksp_converged_reason -pc_type mg -pc_mg_galerkin -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1
>
>
> -log_summary -ksp_monitor -ksp_view -ksp_converged_reason -pc_type mg -pc_mg_galerkin -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -pc_mg_type full
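For reference, a minimal sketch (not part of the original thread) of how these
options reach the solver: the application only needs to call KSPSetFromOptions(),
and everything shown in the attached -ksp_view output (CG with an attached null
space, PCMG with Galerkin coarse operators, Richardson/SOR smoothers) is then
selected from the command line. The routine name solve_poisson and the
already-assembled A, b, x are placeholders; the KSPSetOperators() signature below
is the PETSc 3.4 one (3.5 and later drop the MatStructure argument).

    /* Hypothetical driver fragment: options such as
     *   -pc_type mg -pc_mg_galerkin -pc_mg_levels 5
     *   -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1
     * are picked up by KSPSetFromOptions() at run time. */
    #include <petscksp.h>

    PetscErrorCode solve_poisson(MPI_Comm comm, Mat A, Vec b, Vec x)
    {
      KSP            ksp;
      MatNullSpace   nsp;
      PetscErrorCode ierr;

      ierr = KSPCreate(comm, &ksp);CHKERRQ(ierr);
      /* PETSc 3.4 signature; 3.5+ drop the MatStructure argument */
      ierr = KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN);CHKERRQ(ierr);
      ierr = KSPSetType(ksp, KSPCG);CHKERRQ(ierr);          /* "type: cg" in -ksp_view */
      ierr = KSPSetInitialGuessNonzero(ksp, PETSC_TRUE);CHKERRQ(ierr);
      /* constant null space of the all-Neumann pressure Poisson problem
         ("has attached null space" in -ksp_view); KSPSetNullSpace was the
         PETSc 3.4 interface, later replaced by MatSetNullSpace */
      ierr = MatNullSpaceCreate(comm, PETSC_TRUE, 0, NULL, &nsp);CHKERRQ(ierr);
      ierr = KSPSetNullSpace(ksp, nsp);CHKERRQ(ierr);
      ierr = MatNullSpaceDestroy(&nsp);CHKERRQ(ierr);
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);          /* -pc_type mg, -pc_mg_levels, ... */
      ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
      ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
      return 0;
    }

The same driver runs unchanged for the second case; only -pc_mg_type full is
added on the command line.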
-------------- next part --------------
-log_summary -ksp_monitor -ksp_view -ksp_converged_reason -pc_type mg
-pc_mg_galerkin -pc_mg_levels 5 -mg_levels_ksp_type richardson
-mg_levels_ksp_max_it 1
0 KSP Residual norm 3.653965664551e-05
1 KSP Residual norm 1.910638846094e-06
2 KSP Residual norm 8.690440116045e-08
3 KSP Residual norm 3.732213639394e-09
4 KSP Residual norm 1.964855338020e-10
Linear solve converged due to CONVERGED_RTOL iterations 4
KSP Object: 8 MPI processes
type: cg
maximum iterations=10000
tolerances: relative=0.0001, absolute=1e-50, divergence=10000
left preconditioning
has attached null space
using nonzero initial guess
using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=5 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 8 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 8 MPI processes
type: redundant
Redundant preconditioner: First (color=0) of 8 PCs follows
KSP Object: (mg_coarse_redundant_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_redundant_) 1 MPI processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 2.22045e-14
using diagonal shift on blocks to prevent zero pivot
matrix ordering: nd
factor fill ratio given 5, needed 8.69546
Factored matrix follows:
Matrix Object: 1 MPI processes
type: seqaij
rows=512, cols=512
package used to perform factorization: petsc
total: nonzeros=120206, allocated nonzeros=120206
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Matrix Object: 1 MPI processes
type: seqaij
rows=512, cols=512
total: nonzeros=13824, allocated nonzeros=13824
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Matrix Object: 8 MPI processes
type: mpiaij
rows=512, cols=512
total: nonzeros=13824, allocated nonzeros=13824
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 32 nodes, limit used is 5
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 8 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 8 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Matrix Object: 8 MPI processes
type: mpiaij
rows=4096, cols=4096
total: nonzeros=110592, allocated nonzeros=110592
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 8 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 8 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Matrix Object: 8 MPI processes
type: mpiaij
rows=32768, cols=32768
total: nonzeros=884736, allocated nonzeros=884736
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 8 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 8 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Matrix Object: 8 MPI processes
type: mpiaij
rows=262144, cols=262144
total: nonzeros=7077888, allocated nonzeros=7077888
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 8 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 8 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Matrix Object: 8 MPI processes
type: mpiaij
rows=2097152, cols=2097152
total: nonzeros=14680064, allocated nonzeros=14680064
total number of mallocs used during MatSetValues calls =0
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Matrix Object: 8 MPI processes
type: mpiaij
rows=2097152, cols=2097152
total: nonzeros=14680064, allocated nonzeros=14680064
total number of mallocs used during MatSetValues calls =0
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./hit on a arch-cray-xt5-pkgs-opt named nid13790 with 8 processors, by Unknown Tue Aug 13 22:37:31 2013
Using Petsc Release Version 3.4.2, Jul, 02, 2013
Max Max/Min Avg Total
Time (sec): 4.048e+00 1.00012 4.048e+00
Objects: 2.490e+02 1.00000 2.490e+02
Flops: 2.663e+08 1.00000 2.663e+08 2.130e+09
Flops/sec: 6.579e+07 1.00012 6.579e+07 5.263e+08
MPI Messages: 6.820e+02 1.00000 6.820e+02 5.456e+03
MPI Message Lengths: 8.245e+06 1.00000 1.209e+04 6.596e+07
MPI Reductions: 4.580e+02 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 4.0480e+00 100.0% 2.1305e+09 100.0% 5.456e+03 100.0% 1.209e+04 100.0% 4.570e+02 99.8%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecTDot 12 1.0 2.9428e-02 1.2 6.29e+06 1.0 0.0e+00 0.0e+00 1.2e+01 1 2 0 0 3 1 2 0 0 3 1710
VecNorm 9 1.0 1.0796e-02 1.2 4.72e+06 1.0 0.0e+00 0.0e+00 9.0e+00 0 2 0 0 2 0 2 0 0 2 3497
VecScale 24 1.0 2.4652e-04 1.1 1.99e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6442
VecCopy 3 1.0 5.0740e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 116 1.0 1.4349e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 12 1.0 2.8027e-02 1.0 6.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 1796
VecAYPX 29 1.0 3.0655e-02 1.4 4.16e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 1085
VecScatterBegin 123 1.0 3.5391e-02 1.1 0.00e+00 0.0 3.5e+03 1.2e+04 0.0e+00 1 0 65 66 0 1 0 65 66 0 0
VecScatterEnd 123 1.0 2.5395e-02 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 31 1.0 2.3556e-01 1.0 5.62e+07 1.0 1.0e+03 2.3e+04 0.0e+00 6 21 19 36 0 6 21 19 36 0 1908
MatMultAdd 24 1.0 5.9044e-02 1.0 1.21e+07 1.0 5.8e+02 2.8e+03 0.0e+00 1 5 11 2 0 1 5 11 2 0 1644
MatMultTranspose 28 1.0 7.4601e-02 1.1 1.42e+07 1.0 6.7e+02 2.8e+03 0.0e+00 2 5 12 3 0 2 5 12 3 0 1518
MatSolve 6 1.0 3.8311e-03 1.0 1.44e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 3006
MatSOR 48 1.0 5.8050e-01 1.0 1.01e+08 1.0 8.6e+02 1.5e+04 4.8e+01 14 38 16 19 10 14 38 16 19 11 1390
MatLUFactorSym 1 1.0 3.0620e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 1 0 0 0 0 1 0
MatLUFactorNum 1 1.0 2.4665e-02 1.0 1.95e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 7 0 0 0 1 7 0 0 0 6329
MatAssemblyBegin 20 1.0 2.4351e-02 6.7 0.00e+00 0.0 0.0e+00 0.0e+00 2.2e+01 0 0 0 0 5 0 0 0 0 5 0
MatAssemblyEnd 20 1.0 1.3176e-01 1.0 0.00e+00 0.0 5.6e+02 2.1e+03 7.2e+01 3 0 10 2 16 3 0 10 2 16 0
MatGetRowIJ 1 1.0 1.1516e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 4.1008e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 16 1.3 1.0209e-03 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 3 0 0 0 0 3 0
MatPtAP 4 1.0 6.4001e-01 1.0 4.06e+07 1.0 1.1e+03 1.7e+04 1.0e+02 16 15 21 30 22 16 15 21 30 22 507
MatPtAPSymbolic 4 1.0 3.7003e-01 1.0 0.00e+00 0.0 7.2e+02 2.0e+04 6.0e+01 9 0 13 22 13 9 0 13 22 13 0
MatPtAPNumeric 4 1.0 2.7004e-01 1.0 4.06e+07 1.0 4.2e+02 1.2e+04 4.0e+01 7 15 8 8 9 7 15 8 8 9 1202
MatGetRedundant 1 1.0 7.9393e-04 1.0 0.00e+00 0.0 1.7e+02 7.1e+03 4.0e+00 0 0 3 2 1 0 0 3 2 1 0
MatGetLocalMat 4 1.0 3.9521e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 1 0 0 0 2 1 0 0 0 2 0
MatGetBrAoCol 4 1.0 1.7719e-02 1.0 0.00e+00 0.0 4.3e+02 2.7e+04 8.0e+00 0 0 8 18 2 0 0 8 18 2 0
MatGetSymTrans 8 1.0 1.3007e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 7 1.0 1.3097e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01 0 0 0 0 5 0 0 0 0 5 0
KSPSolve 2 1.0 1.0450e+00 1.0 2.04e+08 1.0 3.4e+03 1.2e+04 7.5e+01 26 77 62 60 16 26 77 62 60 16 1563
PCSetUp 1 1.0 8.6248e-01 1.0 6.21e+07 1.0 1.9e+03 1.1e+04 3.2e+02 21 23 35 32 69 21 23 35 32 69 576
PCApply 6 1.0 8.4384e-01 1.0 1.61e+08 1.0 3.2e+03 9.0e+03 4.8e+01 21 60 59 44 10 21 60 59 44 11 1523
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Container 1 1 564 0
Vector 99 99 47537368 0
Vector Scatter 21 21 22092 0
Matrix 37 37 75834272 0
Matrix Null Space 1 1 596 0
Distributed Mesh 5 5 2740736 0
Bipartite Graph 10 10 7920 0
Index Set 50 50 1546832 0
IS L to G Mapping 5 5 1361108 0
Krylov Solver 7 7 8616 0
DMKSP interface 3 3 1944 0
Preconditioner 7 7 6672 0
Viewer 3 2 1456 0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 2.43187e-06
Average time for zero size MPI_Send(): 2.38419e-06
#PETSc Option Table entries:
-ksp_converged_reason
-ksp_monitor
-ksp_view
-log_summary
-mg_levels_ksp_max_it 1
-mg_levels_ksp_type richardson
-options_left
-pc_mg_galerkin
-pc_mg_levels 5
-pc_type mg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Wed Jul 31 22:48:06 2013
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=0 --known-mpi-c-double-complex=0 --with-cc=cc --with-cxx=CC --with-fc=ftn --with-clib-autodetect=0 --with-cxxlib-autodetect=0 --with-fortranlib-autodetect=0 --with-debugging=0 --COPTFLAGS="-fastsse -Mipa=fast -mp" --CXXOPTFLAGS="-fastsse -Mipa=fast -mp" --FOPTFLAGS="-fastsse -Mipa=fast -mp" --with-blas-lapack-lib="-L/opt/acml/4.4.0/pgi64/lib -lacml -lacml_mv" --with-shared-libraries=0 --with-x=0 --with-batch --known-mpi-shared-libraries=0 PETSC_ARCH=arch-cray-xt5-pkgs-opt
-----------------------------------------
Libraries compiled on Wed Jul 31 22:48:06 2013 on krakenpf1
Machine characteristics: Linux-2.6.27.48-0.12.1_1.0301.5943-cray_ss_s-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /nics/c/home/mrosso/LIBS/petsc-3.4.2
Using PETSc arch: arch-cray-xt5-pkgs-opt
-----------------------------------------
Using C compiler: cc -fastsse -Mipa=fast -mp ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ftn -fastsse -Mipa=fast -mp ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/nics/c/home/mrosso/LIBS/petsc-3.4.2/arch-cray-xt5-pkgs-opt/include -I/nics/c/home/mrosso/LIBS/petsc-3.4.2/include -I/nics/c/home/mrosso/LIBS/petsc-3.4.2/include -I/nics/c/home/mrosso/LIBS/petsc-3.4.2/arch-cray-xt5-pkgs-opt/include -I/opt/cray/portals/2.2.0-1.0301.26633.6.9.ss/include -I/opt/cray/pmi/2.1.4-1.0000.8596.15.1.ss/include -I/opt/cray/mpt/5.3.5/xt/seastar/mpich2-pgi/109/include -I/opt/acml/4.4.0/pgi64/include -I/opt/xt-libsci/11.0.04/pgi/109/istanbul/include -I/opt/fftw/3.3.0.0/x86_64/include -I/usr/include/alps
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/nics/c/home/mrosso/LIBS/petsc-3.4.2/arch-cray-xt5-pkgs-opt/lib -L/nics/c/home/mrosso/LIBS/petsc-3.4.2/arch-cray-xt5-pkgs-opt/lib -lpetsc -L/opt/acml/4.4.0/pgi64/lib -lacml -lacml_mv -lpthread -ldl
-----------------------------------------
#PETSc Option Table entries:
-ksp_converged_reason
-ksp_monitor
-ksp_view
-log_summary
-mg_levels_ksp_max_it 1
-mg_levels_ksp_type richardson
-options_left
-pc_mg_galerkin
-pc_mg_levels 5
-pc_type mg
#End of PETSc Option Table entries
There are no unused options.
-------------- next part --------------
-log_summary -ksp_monitor -ksp_view -ksp_converged_reason -pc_type mg
-pc_mg_galerkin -pc_mg_levels 5 -mg_levels_ksp_type richardson
-mg_levels_ksp_max_it 1 -pc_mg_type full
0 KSP Residual norm 3.654533581988e-05
1 KSP Residual norm 8.730776244351e-07
2 KSP Residual norm 3.474626061661e-08
3 KSP Residual norm 1.813665557493e-09
Linear solve converged due to CONVERGED_RTOL iterations 3
KSP Object: 8 MPI processes
type: cg
maximum iterations=10000
tolerances: relative=0.0001, absolute=1e-50, divergence=10000
left preconditioning
has attached null space
using nonzero initial guess
using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
type: mg
MG: type is FULL, levels=5 cycles=v
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 8 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 8 MPI processes
type: redundant
Redundant preconditioner: First (color=0) of 8 PCs follows
KSP Object: (mg_coarse_redundant_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_redundant_) 1 MPI processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 2.22045e-14
using diagonal shift on blocks to prevent zero pivot
matrix ordering: nd
factor fill ratio given 5, needed 8.69546
Factored matrix follows:
Matrix Object: 1 MPI processes
type: seqaij
rows=512, cols=512
package used to perform factorization: petsc
total: nonzeros=120206, allocated nonzeros=120206
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Matrix Object: 1 MPI processes
type: seqaij
rows=512, cols=512
total: nonzeros=13824, allocated nonzeros=13824
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Matrix Object: 8 MPI processes
type: mpiaij
rows=512, cols=512
total: nonzeros=13824, allocated nonzeros=13824
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 32 nodes, limit used is 5
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 8 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 8 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Matrix Object: 8 MPI processes
type: mpiaij
rows=4096, cols=4096
total: nonzeros=110592, allocated nonzeros=110592
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 8 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 8 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Matrix Object: 8 MPI processes
type: mpiaij
rows=32768, cols=32768
total: nonzeros=884736, allocated nonzeros=884736
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 8 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 8 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Matrix Object: 8 MPI processes
type: mpiaij
rows=262144, cols=262144
total: nonzeros=7077888, allocated nonzeros=7077888
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 8 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 8 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Matrix Object: 8 MPI processes
type: mpiaij
rows=2097152, cols=2097152
total: nonzeros=14680064, allocated nonzeros=14680064
total number of mallocs used during MatSetValues calls =0
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Matrix Object: 8 MPI processes
type: mpiaij
rows=2097152, cols=2097152
total: nonzeros=14680064, allocated nonzeros=14680064
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./hit on a arch-cray-xt5-pkgs-opt named nid14615 with 8 processors, by Unknown Tue Aug 13 22:44:16 2013
Using Petsc Release Version 3.4.2, Jul, 02, 2013
Max Max/Min Avg Total
Time (sec): 4.261e+00 1.00012 4.261e+00
Objects: 2.950e+02 1.00000 2.950e+02
Flops: 3.322e+08 1.00000 3.322e+08 2.658e+09
Flops/sec: 7.797e+07 1.00012 7.796e+07 6.237e+08
MPI Messages: 1.442e+03 1.00000 1.442e+03 1.154e+04
MPI Message Lengths: 1.018e+07 1.00000 7.057e+03 8.141e+07
MPI Reductions: 5.460e+02 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 4.2609e+00 100.0% 2.6575e+09 100.0% 1.154e+04 100.0% 7.057e+03 100.0% 5.450e+02 99.8%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecTDot 10 1.0 2.4743e-02 1.1 5.24e+06 1.0 0.0e+00 0.0e+00 1.0e+01 1 2 0 0 2 1 2 0 0 2 1695
VecNorm 8 1.0 9.9294e-03 1.3 4.19e+06 1.0 0.0e+00 0.0e+00 8.0e+00 0 1 0 0 1 0 1 0 0 1 3379
VecScale 70 1.0 4.9663e-04 1.1 3.86e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6222
VecCopy 3 1.0 5.0108e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 271 1.0 1.0437e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 10 1.0 2.3400e-02 1.0 5.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 1792
VecAYPX 54 1.0 2.5038e-02 1.5 3.55e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1133
VecScatterBegin 324 1.0 4.1335e-02 1.1 0.00e+00 0.0 9.6e+03 6.1e+03 0.0e+00 1 0 83 72 0 1 0 83 72 0 0
VecScatterEnd 324 1.0 4.4111e-02 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
MatMult 76 1.0 2.8557e-01 1.1 6.73e+07 1.0 2.5e+03 9.8e+03 0.0e+00 6 20 22 31 0 6 20 22 31 0 1884
MatMultAdd 50 1.0 5.5734e-02 1.0 1.15e+07 1.0 1.2e+03 1.5e+03 0.0e+00 1 3 10 2 0 1 3 10 2 0 1657
MatMultTranspose 74 1.0 1.2116e-01 1.2 2.37e+07 1.0 1.8e+03 1.9e+03 0.0e+00 3 7 15 4 0 3 7 15 4 0 1563
MatSolve 25 1.0 1.3877e-02 1.0 6.00e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 3458
MatSOR 100 1.0 7.1429e-01 1.1 1.45e+08 1.0 2.6e+03 9.4e+03 1.4e+02 16 44 23 30 26 16 44 23 30 26 1628
MatLUFactorSym 1 1.0 3.0639e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 1 0 0 0 0 1 0
MatLUFactorNum 1 1.0 2.4523e-02 1.0 1.95e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 6 0 0 0 1 6 0 0 0 6366
MatAssemblyBegin 20 1.0 3.1168e-02 6.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.2e+01 0 0 0 0 4 0 0 0 0 4 0
MatAssemblyEnd 20 1.0 1.3784e-01 1.1 0.00e+00 0.0 5.6e+02 2.1e+03 7.2e+01 3 0 5 1 13 3 0 5 1 13 0
MatGetRowIJ 1 1.0 1.1015e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 4.0793e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 16 1.3 1.0140e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 2 0 0 0 0 2 0
MatPtAP 4 1.0 6.4115e-01 1.0 4.06e+07 1.0 1.1e+03 1.7e+04 1.0e+02 15 12 10 24 18 15 12 10 24 18 506
MatPtAPSymbolic 4 1.0 3.7106e-01 1.0 0.00e+00 0.0 7.2e+02 2.0e+04 6.0e+01 9 0 6 18 11 9 0 6 18 11 0
MatPtAPNumeric 4 1.0 2.7011e-01 1.0 4.06e+07 1.0 4.2e+02 1.2e+04 4.0e+01 6 12 4 6 7 6 12 4 6 7 1202
MatGetRedundant 1 1.0 8.1611e-04 1.0 0.00e+00 0.0 1.7e+02 7.1e+03 4.0e+00 0 0 1 1 1 0 0 1 1 1 0
MatGetLocalMat 4 1.0 3.9911e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 1 0 0 0 1 1 0 0 0 1 0
MatGetBrAoCol 4 1.0 1.7765e-02 1.0 0.00e+00 0.0 4.3e+02 2.7e+04 8.0e+00 0 0 4 14 1 0 0 4 14 1 0
MatGetSymTrans 8 1.0 1.3194e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 7 1.0 1.4666e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01 0 0 0 0 4 0 0 0 0 4 0
KSPSolve 2 1.0 1.2287e+00 1.0 2.70e+08 1.0 9.5e+03 5.8e+03 1.6e+02 29 81 82 68 30 29 81 82 68 30 1758
PCSetUp 1 1.0 8.6414e-01 1.0 6.21e+07 1.0 1.9e+03 1.1e+04 3.2e+02 20 19 17 26 58 20 19 17 26 58 575
PCApply 5 1.0 1.0571e+00 1.0 2.33e+08 1.0 9.3e+03 4.9e+03 1.4e+02 24 70 81 56 26 24 70 81 56 26 1764
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Container 1 1 564 0
Vector 145 145 58892872 0
Vector Scatter 21 21 22092 0
Matrix 37 37 75834272 0
Matrix Null Space 1 1 596 0
Distributed Mesh 5 5 2740736 0
Bipartite Graph 10 10 7920 0
Index Set 50 50 1546832 0
IS L to G Mapping 5 5 1361108 0
Krylov Solver 7 7 8616 0
DMKSP interface 3 3 1944 0
Preconditioner 7 7 6672 0
Viewer 3 2 1456 0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 6.58035e-06
Average time for zero size MPI_Send(): 4.02331e-06
#PETSc Option Table entries:
-ksp_converged_reason
-ksp_monitor
-ksp_view
-log_summary
-mg_levels_ksp_max_it 1
-mg_levels_ksp_type richardson
-options_left
-pc_mg_galerkin
-pc_mg_levels 5
-pc_mg_type full
-pc_type mg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Wed Jul 31 22:48:06 2013
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=0 --known-mpi-c-double-complex=0 --with-cc=cc --with-cxx=CC --with-fc=ftn --with-clib-autodetect=0 --with-cxxlib-autodetect=0 --with-fortranlib-autodetect=0 --with-debugging=0 --COPTFLAGS="-fastsse -Mipa=fast -mp" --CXXOPTFLAGS="-fastsse -Mipa=fast -mp" --FOPTFLAGS="-fastsse -Mipa=fast -mp" --with-blas-lapack-lib="-L/opt/acml/4.4.0/pgi64/lib -lacml -lacml_mv" --with-shared-libraries=0 --with-x=0 --with-batch --known-mpi-shared-libraries=0 PETSC_ARCH=arch-cray-xt5-pkgs-opt
-----------------------------------------
Libraries compiled on Wed Jul 31 22:48:06 2013 on krakenpf1
Machine characteristics: Linux-2.6.27.48-0.12.1_1.0301.5943-cray_ss_s-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /nics/c/home/mrosso/LIBS/petsc-3.4.2
Using PETSc arch: arch-cray-xt5-pkgs-opt
-----------------------------------------
Using C compiler: cc -fastsse -Mipa=fast -mp ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ftn -fastsse -Mipa=fast -mp ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/nics/c/home/mrosso/LIBS/petsc-3.4.2/arch-cray-xt5-pkgs-opt/include -I/nics/c/home/mrosso/LIBS/petsc-3.4.2/include -I/nics/c/home/mrosso/LIBS/petsc-3.4.2/include -I/nics/c/home/mrosso/LIBS/petsc-3.4.2/arch-cray-xt5-pkgs-opt/include -I/opt/cray/portals/2.2.0-1.0301.26633.6.9.ss/include -I/opt/cray/pmi/2.1.4-1.0000.8596.15.1.ss/include -I/opt/cray/mpt/5.3.5/xt/seastar/mpich2-pgi/109/include -I/opt/acml/4.4.0/pgi64/include -I/opt/xt-libsci/11.0.04/pgi/109/istanbul/include -I/opt/fftw/3.3.0.0/x86_64/include -I/usr/include/alps
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/nics/c/home/mrosso/LIBS/petsc-3.4.2/arch-cray-xt5-pkgs-opt/lib -L/nics/c/home/mrosso/LIBS/petsc-3.4.2/arch-cray-xt5-pkgs-opt/lib -lpetsc -L/opt/acml/4.4.0/pgi64/lib -lacml -lacml_mv -lpthread -ldl
-----------------------------------------
#PETSc Option Table entries:
-ksp_converged_reason
-ksp_monitor
-ksp_view
-log_summary
-mg_levels_ksp_max_it 1
-mg_levels_ksp_type richardson
-options_left
-pc_mg_galerkin
-pc_mg_levels 5
-pc_mg_type full
-pc_type mg
#End of PETSc Option Table entries
There are no unused options.
Application 6640063 resources: utime ~45s, stime ~2s