[petsc-users] How to speed up geometric multigrid
Michele Rosso
mrosso at uci.edu
Wed Oct 2 13:10:32 CDT 2013
Thank you all for your contributions.
So far the fastest solution is still the initial one proposed by Jed in
an earlier round:
-ksp_atol 1e-9 -ksp_monitor_true_residual -ksp_view -log_summary
-mg_coarse_pc_factor_mat_solver_package superlu_dist
-mg_coarse_pc_type lu -mg_levels_ksp_max_it 3
-mg_levels_ksp_type richardson -options_left -pc_mg_galerkin
-pc_mg_levels 5 -pc_mg_log -pc_type mg
where, as Barry suggested, I used -mg_levels_ksp_max_it 3 instead of
-mg_levels_ksp_max_it 1.
I have attached the diagnostics for this case. Any further ideas?
Thank you,
Michele
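To make the moving parts of these options concrete, here is a minimal pure-Python sketch of the same kind of cycle on a hypothetical 1D Poisson model problem (-u'' = 1 on (0, 1), 127 interior points), not the 3D operator from the run. It uses a 5-level V-cycle; 3 symmetric Gauss-Seidel sweeps before and after the coarse correction, which is what Richardson with damping factor 1 around SOR with omega = 1 reduces to on a single process; and a direct tridiagonal solve standing in for the SuperLU_DIST coarse LU. The cycle is applied as a plain stationary iteration, i.e. multigrid without a Krylov accelerator. For this model problem, with full-weighting restriction R = P^T/2 and linear interpolation P, the Galerkin operator R A P equals the operator rediscretized with spacing 2h, so the sketch rediscretizes instead of forming products. All function names are illustrative, not PETSc API.

```python
import math


def apply_laplacian(u, h):
    """Return A u for A = (1/h^2) tridiag(-1, 2, -1) with zero Dirichlet boundaries."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / (h * h))
    return out


def residual(u, f, h):
    return [fi - ai for fi, ai in zip(f, apply_laplacian(u, h))]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def smooth(u, f, h, sweeps):
    """Symmetric Gauss-Seidel (SSOR, omega = 1) in place: forward then backward sweep."""
    n = len(u)
    order = list(range(n)) + list(range(n - 1, -1, -1))
    for _ in range(sweeps):
        for i in order:
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            u[i] = 0.5 * (h * h * f[i] + left + right)


def restrict(r):
    """Full weighting; coarse point i coincides with fine point 2i + 1."""
    nc = (len(r) - 1) // 2
    return [0.25 * r[2 * i] + 0.5 * r[2 * i + 1] + 0.25 * r[2 * i + 2] for i in range(nc)]


def interpolate_add(u, ec):
    """u += P ec, with P the linear interpolation."""
    nc = len(ec)
    for i in range(nc):
        u[2 * i + 1] += ec[i]
    for i in range(nc + 1):
        left = ec[i - 1] if i > 0 else 0.0
        right = ec[i] if i < nc else 0.0
        u[2 * i] += 0.5 * (left + right)


def coarse_solve(f, h):
    """Direct tridiagonal (Thomas) solve, standing in for the LU coarse solver."""
    n = len(f)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = -0.5, 0.5 * h * h * f[0]
    for i in range(1, n):
        m = 2.0 + cp[i - 1]
        cp[i] = -1.0 / m
        dp[i] = (h * h * f[i] + dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def v_cycle(u, f, h, level, sweeps=3):
    """One V-cycle, updating u in place; level 0 is the coarsest grid."""
    if level == 0:
        u[:] = coarse_solve(f, h)
        return
    smooth(u, f, h, sweeps)                      # pre-smoothing
    rc = restrict(residual(u, f, h))
    ec = [0.0] * len(rc)                         # zero initial guess for the correction
    v_cycle(ec, rc, 2.0 * h, level - 1, sweeps)  # coarse operator rediscretized with 2h
    interpolate_add(u, ec)
    smooth(u, f, h, sweeps)                      # post-smoothing


def solve(n_fine=127, levels=5, cycles=10):
    """Stationary multigrid for -u'' = 1 on (0, 1); returns u and the residual norms."""
    h = 1.0 / (n_fine + 1)
    f = [1.0] * n_fine
    u = [0.0] * n_fine
    history = [norm(residual(u, f, h))]
    for _ in range(cycles):
        v_cycle(u, f, h, levels - 1)
        history.append(norm(residual(u, f, h)))
    return u, history
```

Calling `u, history = solve()` returns the residual norm after each cycle. The exact discrete solution is x(1 - x)/2 at the grid points, because the second difference is exact for quadratics, which makes the result easy to check.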
On 10/01/2013 11:44 PM, Barry Smith wrote:
> On Oct 2, 2013, at 12:28 AM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
>
>> "Mark F. Adams" <mfadams at lbl.gov> writes:
>>> run3.txt uses:
>>>
>>> -ksp_type richardson
>>>
>>> This is bad and I doubt anyone recommended it intentionally.
> Hell, this is normal multigrid without a Krylov accelerator. Under normal circumstances with geometric multigrid this should be fine, and is often the best choice.
>
>> I would have expected FGMRES, but Barry likes Krylov smoothers and
>> Richardson is one of a few methods that can tolerate nonlinear
>> preconditioners.
>>
>>> You also have, in this file,
>>>
>>> -mg_levels_ksp_type gmres
>>>
>>> did you or the recommenders mean
>>>
>>> -mg_levels_ksp_type richardson ???
>>>
>>> You are using gmres here, which forces you to use fgmres in the outer solver. That is a safe thing to use. If you apply your BCs symmetrically with a low-order discretization, then
>>>
>>> -ksp_type cg
>>> -mg_levels_ksp_type richardson
>>> -mg_levels_pc_type sor
>>>
>>> is what I'd recommend.
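Whether the outer method can be CG, as Mark suggests, or must be FGMRES (or Richardson, per Jed), comes down to two properties of the smoother inside the V-cycle: linearity in the right-hand side and, for CG, symmetry. The pure-Python check below uses a hypothetical 3x3 SPD matrix and compares forward Gauss-Seidel, symmetric Gauss-Seidel (a forward then a backward sweep, which is what the local_symmetric SOR type shown in the attached -ksp_view output amounts to on a single process), and two steepest-descent steps. Steepest descent is swapped in as the simplest Krylov-type smoother. It is not GMRES, but like GMRES it picks step lengths from inner products of the residual, so the map b -> x is nonlinear. All names are illustrative.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


def gs_sweep(A, x, b, order):
    """One Gauss-Seidel pass, in place, visiting unknowns in the given order."""
    for i in order:
        s = sum(A[i][j] * x[j] for j in range(len(x)) if j != i)
        x[i] = (b[i] - s) / A[i][i]


def forward_gs(A, b):
    x = [0.0] * len(b)
    gs_sweep(A, x, b, range(len(b)))
    return x


def symmetric_gs(A, b):
    """Forward then backward sweep (SSOR with omega = 1)."""
    x = [0.0] * len(b)
    gs_sweep(A, x, b, range(len(b)))
    gs_sweep(A, x, b, range(len(b) - 1, -1, -1))
    return x


def steepest_descent(A, b, steps=2):
    """Krylov-type smoother: the step lengths depend on b."""
    x = [0.0] * len(b)
    for _ in range(steps):
        r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
        alpha = dot(r, r) / dot(r, matvec(A, r))
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
    return x


def additivity_gap(smoother, A, b1, b2):
    """max |S(b1 + b2) - S(b1) - S(b2)|: zero, up to roundoff, for a linear S."""
    s12 = smoother(A, [p + q for p, q in zip(b1, b2)])
    s1, s2 = smoother(A, b1), smoother(A, b2)
    return max(abs(s - p - q) for s, p, q in zip(s12, s1, s2))


def operator_matrix(smoother, A):
    """Matrix of a linear smoother: column j is S(e_j)."""
    n = len(A)
    cols = [smoother(A, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def is_symmetric(M, tol=1e-12):
    n = len(M)
    return all(abs(M[i][j] - M[j][i]) <= tol for i in range(n) for j in range(n))


# Small SPD model matrix (1D Laplacian stencil), purely illustrative
A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
b1, b2 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
```

Both Gauss-Seidel variants are linear, but only the symmetric one yields a symmetric operator, while steepest descent fails additivity (the gap is 0.25 for this b1, b2). For the whole V-cycle to be a symmetric preconditioner, the post-smoother must also be the adjoint of the pre-smoother and restriction the transpose of interpolation (up to scaling). Using the same symmetric smoother in both places satisfies the first condition.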
>> I thought that was tried in an earlier round.
>>
>> I don't understand why SOR preconditioning in the Krylov smoother is so
>> drastically more expensive than BJacobi/ILU and why SOR is called so
>> many more times even though the number of outer iterations
>>
>> bjacobi: PCApply 322 1.0 4.1021e+01 1.0 6.44e+09 1.0 3.0e+07 1.6e+03 4.5e+04 74 86 98 88 92 28160064317351226 20106
>> bjacobi: KSPSolve 46 1.0 4.6268e+01 1.0 7.52e+09 1.0 3.0e+07 1.8e+03 4.8e+04 83100100 99 99 31670065158291309 20800
>>
>> sor: PCApply 1132 1.0 1.5532e+02 1.0 2.30e+10 1.0 1.0e+08 1.6e+03 1.6e+05 69 88 99 88 93 21871774317301274 18987
>> sor: KSPSolve 201 1.0 1.7101e+02 1.0 2.63e+10 1.0 1.1e+08 1.8e+03 1.7e+05 75100100 99 98 24081775248221352 19652
>
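Normalizing Jed's numbers by the event counts shows that one SOR-smoothed PCApply costs only about 8% more than a BJacobi/ILU one, and an SOR KSPSolve is in fact cheaper on average. The larger totals come from the larger call counts. This takes the two excerpts at face value, since the full logs of those runs are not reproduced here. A sketch of the arithmetic (the dictionary layout and helper name are illustrative):

```python
# (count, max time in seconds) from the two -log_summary excerpts quoted above
EVENTS = {
    "bjacobi": {"PCApply": (322, 4.1021e01), "KSPSolve": (46, 4.6268e01)},
    "sor": {"PCApply": (1132, 1.5532e02), "KSPSolve": (201, 1.7101e02)},
}


def seconds_per_call(event):
    """Average time per call for a (count, total seconds) pair."""
    count, seconds = event
    return seconds / count


for name, ev in EVENTS.items():
    print(
        f"{name:8s}"
        f" {seconds_per_call(ev['PCApply']):.4f} s/PCApply"
        f" {seconds_per_call(ev['KSPSolve']):.3f} s/KSPSolve"
        f" {ev['PCApply'][0] / ev['KSPSolve'][0]:.2f} PCApply per KSPSolve"
    )
```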
-------------- next part --------------
OPTIONS USED:
-ksp_atol 1e-9
-ksp_monitor_true_residual
-ksp_view
-log_summary
-mg_coarse_pc_factor_mat_solver_package superlu_dist
-mg_coarse_pc_type lu
-mg_levels_ksp_max_it 3
-mg_levels_ksp_type richardson
-options_left
-pc_mg_galerkin
-pc_mg_levels 5
-pc_mg_log
-pc_type mg
0 KSP unpreconditioned resid norm 1.062110078078e-06 true resid norm 1.062110078078e-06 ||r(i)||/||b|| 1.000000000000e+00
1 KSP unpreconditioned resid norm 2.236451946298e-07 true resid norm 2.236451946298e-07 ||r(i)||/||b|| 2.105668698998e-01
2 KSP unpreconditioned resid norm 1.220833343347e-07 true resid norm 1.220833343347e-07 ||r(i)||/||b|| 1.149441445426e-01
3 KSP unpreconditioned resid norm 8.872102504003e-08 true resid norm 8.872102504003e-08 ||r(i)||/||b|| 8.353279652576e-02
4 KSP unpreconditioned resid norm 6.619711859327e-08 true resid norm 6.619711859327e-08 ||r(i)||/||b|| 6.232604318479e-02
5 KSP unpreconditioned resid norm 5.271840072756e-08 true resid norm 5.271840072756e-08 ||r(i)||/||b|| 4.963553384502e-02
6 KSP unpreconditioned resid norm 4.229219228929e-08 true resid norm 4.229219228929e-08 ||r(i)||/||b|| 3.981902927221e-02
7 KSP unpreconditioned resid norm 3.634611900984e-08 true resid norm 3.634611900984e-08 ||r(i)||/||b|| 3.422067049361e-02
8 KSP unpreconditioned resid norm 3.131796519117e-08 true resid norm 3.131796519117e-08 ||r(i)||/||b|| 2.948655307729e-02
9 KSP unpreconditioned resid norm 2.653911712772e-08 true resid norm 2.653911712772e-08 ||r(i)||/||b|| 2.498716251309e-02
10 KSP unpreconditioned resid norm 2.451248019105e-08 true resid norm 2.451248019105e-08 ||r(i)||/||b|| 2.307903926061e-02
11 KSP unpreconditioned resid norm 2.373835128699e-08 true resid norm 2.373835128699e-08 ||r(i)||/||b|| 2.235017987020e-02
12 KSP unpreconditioned resid norm 2.476463969748e-08 true resid norm 2.476463969748e-08 ||r(i)||/||b|| 2.331645298226e-02
13 KSP unpreconditioned resid norm 2.689648556959e-08 true resid norm 2.689648556959e-08 ||r(i)||/||b|| 2.532363276156e-02
14 KSP unpreconditioned resid norm 3.098937516975e-08 true resid norm 3.098937516975e-08 ||r(i)||/||b|| 2.917717834468e-02
15 KSP unpreconditioned resid norm 3.555038182753e-08 true resid norm 3.555038182753e-08 ||r(i)||/||b|| 3.347146643395e-02
16 KSP unpreconditioned resid norm 4.112024511716e-08 true resid norm 4.112024511716e-08 ||r(i)||/||b|| 3.871561523226e-02
17 KSP unpreconditioned resid norm 4.629240103388e-08 true resid norm 4.629240103388e-08 ||r(i)||/||b|| 4.358531379127e-02
18 KSP unpreconditioned resid norm 4.985610207288e-08 true resid norm 4.985610207288e-08 ||r(i)||/||b|| 4.694061670436e-02
19 KSP unpreconditioned resid norm 5.046376291690e-08 true resid norm 5.046376291690e-08 ||r(i)||/||b|| 4.751274275472e-02
20 KSP unpreconditioned resid norm 5.025357388082e-08 true resid norm 5.025357388082e-08 ||r(i)||/||b|| 4.731484515407e-02
21 KSP unpreconditioned resid norm 4.733061043311e-08 true resid norm 4.733061043311e-08 ||r(i)||/||b|| 4.456281077642e-02
22 KSP unpreconditioned resid norm 4.482409805557e-08 true resid norm 4.482409805557e-08 ||r(i)||/||b|| 4.220287424134e-02
23 KSP unpreconditioned resid norm 4.070552710576e-08 true resid norm 4.070552710576e-08 ||r(i)||/||b|| 3.832514910266e-02
24 KSP unpreconditioned resid norm 3.746139586173e-08 true resid norm 3.746139586173e-08 ||r(i)||/||b|| 3.527072818058e-02
25 KSP unpreconditioned resid norm 3.416470090249e-08 true resid norm 3.416470090249e-08 ||r(i)||/||b|| 3.216681736447e-02
26 KSP unpreconditioned resid norm 3.162747159737e-08 true resid norm 3.162747159737e-08 ||r(i)||/||b|| 2.977796016644e-02
27 KSP unpreconditioned resid norm 2.886965691540e-08 true resid norm 2.886965691540e-08 ||r(i)||/||b|| 2.718141698425e-02
28 KSP unpreconditioned resid norm 2.669294602696e-08 true resid norm 2.669294602696e-08 ||r(i)||/||b|| 2.513199580525e-02
29 KSP unpreconditioned resid norm 2.477496636609e-08 true resid norm 2.477496636609e-08 ||r(i)||/||b|| 2.332617576789e-02
30 KSP unpreconditioned resid norm 2.254756345946e-08 true resid norm 2.254756345946e-08 ||r(i)||/||b|| 2.122902693878e-02
31 KSP unpreconditioned resid norm 2.100745862543e-08 true resid norm 2.100745862543e-08 ||r(i)||/||b|| 1.977898436239e-02
32 KSP unpreconditioned resid norm 2.082372673705e-08 true resid norm 2.082372673705e-08 ||r(i)||/||b|| 1.960599674823e-02
33 KSP unpreconditioned resid norm 2.058561394284e-08 true resid norm 2.058561394284e-08 ||r(i)||/||b|| 1.938180831510e-02
34 KSP unpreconditioned resid norm 2.071527481693e-08 true resid norm 2.071527481693e-08 ||r(i)||/||b|| 1.950388688000e-02
35 KSP unpreconditioned resid norm 2.100892944872e-08 true resid norm 2.100892944872e-08 ||r(i)||/||b|| 1.978036917487e-02
36 KSP unpreconditioned resid norm 2.220101872142e-08 true resid norm 2.220101872142e-08 ||r(i)||/||b|| 2.090274744554e-02
37 KSP unpreconditioned resid norm 2.324772438230e-08 true resid norm 2.324772438230e-08 ||r(i)||/||b|| 2.188824384792e-02
38 KSP unpreconditioned resid norm 2.452302256995e-08 true resid norm 2.452302256995e-08 ||r(i)||/||b|| 2.308896514224e-02
39 KSP unpreconditioned resid norm 2.502647686575e-08 true resid norm 2.502647686575e-08 ||r(i)||/||b|| 2.356297843539e-02
40 KSP unpreconditioned resid norm 2.531223073672e-08 true resid norm 2.531223073672e-08 ||r(i)||/||b|| 2.383202199016e-02
41 KSP unpreconditioned resid norm 2.499727165695e-08 true resid norm 2.499727165695e-08 ||r(i)||/||b|| 2.353548108892e-02
42 KSP unpreconditioned resid norm 2.462083389942e-08 true resid norm 2.462083389942e-08 ||r(i)||/||b|| 2.318105666033e-02
43 KSP unpreconditioned resid norm 2.360189108305e-08 true resid norm 2.360189108305e-08 ||r(i)||/||b|| 2.222169958670e-02
44 KSP unpreconditioned resid norm 2.252988454814e-08 true resid norm 2.252988454814e-08 ||r(i)||/||b|| 2.121238185492e-02
45 KSP unpreconditioned resid norm 2.188564712770e-08 true resid norm 2.188564712770e-08 ||r(i)||/||b|| 2.060581815334e-02
46 KSP unpreconditioned resid norm 2.002949813700e-08 true resid norm 2.002949813700e-08 ||r(i)||/||b|| 1.885821305193e-02
47 KSP unpreconditioned resid norm 1.822159592332e-08 true resid norm 1.822159592332e-08 ||r(i)||/||b|| 1.715603335231e-02
48 KSP unpreconditioned resid norm 1.731437653543e-08 true resid norm 1.731437653543e-08 ||r(i)||/||b|| 1.630186634399e-02
49 KSP unpreconditioned resid norm 1.582438316044e-08 true resid norm 1.582438316044e-08 ||r(i)||/||b|| 1.489900480850e-02
50 KSP unpreconditioned resid norm 1.470070282545e-08 true resid norm 1.470070282545e-08 ||r(i)||/||b|| 1.384103505736e-02
51 KSP unpreconditioned resid norm 1.317055921275e-08 true resid norm 1.317055921275e-08 ||r(i)||/||b|| 1.240037118995e-02
52 KSP unpreconditioned resid norm 1.200360805809e-08 true resid norm 1.200360805809e-08 ||r(i)||/||b|| 1.130166101033e-02
53 KSP unpreconditioned resid norm 1.035246990182e-08 true resid norm 1.035246990182e-08 ||r(i)||/||b|| 9.747078118834e-03
54 KSP unpreconditioned resid norm 9.012810502968e-09 true resid norm 9.012810502968e-09 ||r(i)||/||b|| 8.485759328525e-03
55 KSP unpreconditioned resid norm 8.556164955549e-09 true resid norm 8.556164955549e-09 ||r(i)||/||b|| 8.055817501548e-03
56 KSP unpreconditioned resid norm 7.776893147540e-09 true resid norm 7.776893147540e-09 ||r(i)||/||b|| 7.322115953947e-03
57 KSP unpreconditioned resid norm 6.867595067138e-09 true resid norm 6.867595067138e-09 ||r(i)||/||b|| 6.465991810912e-03
58 KSP unpreconditioned resid norm 6.256223035332e-09 true resid norm 6.256223035332e-09 ||r(i)||/||b|| 5.890371595621e-03
59 KSP unpreconditioned resid norm 5.775805121780e-09 true resid norm 5.775805121780e-09 ||r(i)||/||b|| 5.438047563048e-03
60 KSP unpreconditioned resid norm 5.028152348022e-09 true resid norm 5.028152348022e-09 ||r(i)||/||b|| 4.734116031666e-03
61 KSP unpreconditioned resid norm 4.491271029703e-09 true resid norm 4.491271029703e-09 ||r(i)||/||b|| 4.228630461573e-03
62 KSP unpreconditioned resid norm 4.194174911407e-09 true resid norm 4.194174911407e-09 ||r(i)||/||b|| 3.948907931462e-03
63 KSP unpreconditioned resid norm 3.900672763613e-09 true resid norm 3.900672763613e-09 ||r(i)||/||b|| 3.672569203630e-03
64 KSP unpreconditioned resid norm 3.725382861224e-09 true resid norm 3.725382861224e-09 ||r(i)||/||b|| 3.507529904967e-03
65 KSP unpreconditioned resid norm 3.470705216044e-09 true resid norm 3.470705216044e-09 ||r(i)||/||b|| 3.267745300304e-03
66 KSP unpreconditioned resid norm 3.190845546802e-09 true resid norm 3.190845546802e-09 ||r(i)||/||b|| 3.004251266100e-03
67 KSP unpreconditioned resid norm 2.936936118052e-09 true resid norm 2.936936118052e-09 ||r(i)||/||b|| 2.765189954103e-03
68 KSP unpreconditioned resid norm 2.807750828309e-09 true resid norm 2.807750828309e-09 ||r(i)||/||b|| 2.643559162334e-03
69 KSP unpreconditioned resid norm 2.630235180177e-09 true resid norm 2.630235180177e-09 ||r(i)||/||b|| 2.476424275098e-03
70 KSP unpreconditioned resid norm 2.423253188367e-09 true resid norm 2.423253188367e-09 ||r(i)||/||b|| 2.281546177166e-03
71 KSP unpreconditioned resid norm 2.312671011482e-09 true resid norm 2.312671011482e-09 ||r(i)||/||b|| 2.177430625334e-03
72 KSP unpreconditioned resid norm 2.135449041972e-09 true resid norm 2.135449041972e-09 ||r(i)||/||b|| 2.010572242980e-03
73 KSP unpreconditioned resid norm 2.002324106483e-09 true resid norm 2.002324106483e-09 ||r(i)||/||b|| 1.885232188086e-03
74 KSP unpreconditioned resid norm 1.778111616174e-09 true resid norm 1.778111616174e-09 ||r(i)||/||b|| 1.674131196827e-03
75 KSP unpreconditioned resid norm 1.653921088947e-09 true resid norm 1.653921088947e-09 ||r(i)||/||b|| 1.557203083827e-03
76 KSP unpreconditioned resid norm 1.536016641258e-09 true resid norm 1.536016641258e-09 ||r(i)||/||b|| 1.446193452978e-03
77 KSP unpreconditioned resid norm 1.456376200968e-09 true resid norm 1.456376200968e-09 ||r(i)||/||b|| 1.371210226725e-03
78 KSP unpreconditioned resid norm 1.301938916885e-09 true resid norm 1.301938916885e-09 ||r(i)||/||b|| 1.225804126858e-03
79 KSP unpreconditioned resid norm 1.256867113940e-09 true resid norm 1.256867113940e-09 ||r(i)||/||b|| 1.183368033015e-03
80 KSP unpreconditioned resid norm 1.084746612787e-09 true resid norm 1.084746612787e-09 ||r(i)||/||b|| 1.021312795328e-03
81 KSP unpreconditioned resid norm 1.026849960395e-09 true resid norm 1.026849960395e-09 ||r(i)||/||b|| 9.668018236432e-04
82 KSP unpreconditioned resid norm 9.283375662057e-10 true resid norm 9.283375662057e-10 ||r(i)||/||b|| 8.740502376984e-04
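The monitor stops at iteration 82 because the absolute tolerance is met first. With ||b|| of about 1.06e-6 and rtol = 1e-5 (from the -ksp_view output), the relative target rtol*||b|| of about 1.06e-11 lies far below -ksp_atol 1e-9, so the effective target is 1e-9. Iteration 81 (1.03e-9) still misses it and iteration 82 (9.28e-10) meets it. The sketch below reproduces this with a simplified form of the default test, converged once the residual norm is <= max(rtol*||r0||, atol), omitting the divergence and NaN checks. It also computes the geometric-mean reduction per iteration, about 0.92, which is slow for a multigrid-preconditioned solve. Only a few residual values are copied from the log; helper names are hypothetical.

```python
R0 = 1.062110078078e-06  # iteration 0 residual norm (= ||b|| with a zero initial guess)
TAIL = {                 # last monitored iterations
    80: 1.084746612787e-09,
    81: 1.026849960395e-09,
    82: 9.283375662057e-10,
}
RTOL, ATOL = 1e-05, 1e-09  # "tolerances:" line of the -ksp_view output


def stop_threshold(r0, rtol, atol):
    """Simplified default test: converged once ||r_k|| <= max(rtol * ||r_0||, atol)."""
    return max(rtol * r0, atol)


def first_converged(history, threshold):
    """Smallest iteration number whose residual norm meets the threshold, else None."""
    hits = [k for k, r in sorted(history.items()) if r <= threshold]
    return hits[0] if hits else None


def mean_reduction(r0, rk, k):
    """Geometric-mean residual reduction per iteration over k iterations."""
    return (rk / r0) ** (1.0 / k)


threshold = stop_threshold(R0, RTOL, ATOL)
print(f"effective target: {threshold:.1e}")
print(f"first converged iteration: {first_converged(TAIL, threshold)}")
print(f"mean reduction per iteration: {mean_reduction(R0, TAIL[82], 82):.3f}")
```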
KSP Object: 128 MPI processes
type: cg
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-09, divergence=10000
left preconditioning
has attached null space
using UNPRECONDITIONED norm type for convergence test
PC Object: 128 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=5 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 128 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 128 MPI processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: natural
factor fill ratio given 0, needed 0
Factored matrix follows:
Matrix Object: 128 MPI processes
type: mpiaij
rows=1024, cols=1024
package used to perform factorization: superlu_dist
total: nonzeros=0, allocated nonzeros=0
total number of mallocs used during MatSetValues calls =0
SuperLU_DIST run parameters:
Process grid nprow 16 x npcol 8
Equilibrate matrix TRUE
Matrix input mode 1
Replace tiny pivots TRUE
Use iterative refinement FALSE
Processors in row 16 col partition 8
Row permutation LargeDiag
Column permutation METIS_AT_PLUS_A
Parallel symbolic factorization FALSE
Repeated factorization SamePattern_SameRowPerm
linear system matrix = precond matrix:
Matrix Object: 128 MPI processes
type: mpiaij
rows=1024, cols=1024
total: nonzeros=27648, allocated nonzeros=27648
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 128 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=3
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 128 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Matrix Object: 128 MPI processes
type: mpiaij
rows=8192, cols=8192
total: nonzeros=221184, allocated nonzeros=221184
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 16 nodes, limit used is 5
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 128 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=3
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 128 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Matrix Object: 128 MPI processes
type: mpiaij
rows=65536, cols=65536
total: nonzeros=1769472, allocated nonzeros=1769472
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 128 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=3
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 128 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Matrix Object: 128 MPI processes
type: mpiaij
rows=524288, cols=524288
total: nonzeros=14155776, allocated nonzeros=14155776
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 128 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=3
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 128 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Matrix Object: 128 MPI processes
type: mpiaij
rows=4194304, cols=4194304
total: nonzeros=29360128, allocated nonzeros=29360128
total number of mallocs used during MatSetValues calls =0
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Matrix Object: 128 MPI processes
type: mpiaij
rows=4194304, cols=4194304
total: nonzeros=29360128, allocated nonzeros=29360128
total number of mallocs used during MatSetValues calls =0
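The -ksp_view output also shows how much work the Galerkin coarse levels add. Every coarse operator has 27 nonzeros per row, against 7 on the fine grid. This is consistent with a Galerkin product of a 7-point operator with trilinear interpolation, though the log does not name the interpolation. As a result, level 3, with one eighth of the fine rows, still holds about 48% of the fine-grid nonzeros, and the operator complexity (total nonzeros over fine-grid nonzeros) is about 1.55. A sketch of the bookkeeping, with rows and nonzeros copied from the log and helper names chosen for illustration:

```python
# (rows, nonzeros) per multigrid level, coarsest first, from the -ksp_view output above
LEVELS = [
    (1024, 27648),        # coarse level (LU)
    (8192, 221184),       # level 1
    (65536, 1769472),     # level 2
    (524288, 14155776),   # level 3
    (4194304, 29360128),  # level 4 (fine grid)
]


def nonzeros_per_row(levels):
    return [nnz / rows for rows, nnz in levels]


def operator_complexity(levels):
    """Sum of nonzeros on all levels divided by the fine-grid nonzeros."""
    return sum(nnz for _, nnz in levels) / levels[-1][1]


print(nonzeros_per_row(LEVELS))
print(f"operator complexity: {operator_complexity(LEVELS):.3f}")
print(f"level 3 nonzeros / fine nonzeros: {LEVELS[3][1] / LEVELS[4][1]:.2f}")
```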
---------------------------------------- SUMMARY ----------------------------------------
Setup time = 0.0118 min
Initialization time = 0.0001 min
Processing time = 39.5579 min
Post-processing time = 0.0028 min
Total simulation time = 39.5726 min
Processing time per time step = 2.8391 sec
Total number of time steps = 836
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./hit on a interlagos-64idx-pgi-opt named nid20962 with 128 processors, by Unknown Wed Oct 2 12:24:10 2013
Using Petsc Release Version 3.4.2, Jul, 02, 2013
Max Max/Min Avg Total
Time (sec): 2.407e+03 1.00001 2.407e+03
Objects: 3.145e+05 1.00000 3.145e+05
Flops: 3.135e+11 1.00000 3.135e+11 4.012e+13
Flops/sec: 1.303e+08 1.00001 1.303e+08 1.667e+10
MPI Messages: 6.225e+06 1.00000 6.225e+06 7.968e+08
MPI Message Lengths: 2.183e+10 1.00000 3.506e+03 2.794e+12
MPI Reductions: 7.943e+05 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 6.2536e+02 26.0% 5.4104e+12 13.5% 3.273e+07 4.1% 6.724e+02 19.2% 2.969e+05 37.4%
1: MG Apply: 1.7813e+03 74.0% 3.4714e+13 86.5% 7.641e+08 95.9% 2.834e+03 80.8% 4.975e+05 62.6%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecTDot 62182 1.0 2.3234e+01 1.6 4.08e+09 1.0 0.0e+00 0.0e+00 6.2e+04 1 1 0 0 8 3 10 0 0 21 22451
VecNorm 64690 1.0 1.9899e+01 2.4 4.24e+09 1.0 0.0e+00 0.0e+00 6.5e+04 1 1 0 0 8 2 10 0 0 22 27271
VecCopy 33599 1.0 3.2168e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 2530 1.0 4.4997e-01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 62182 1.0 9.0091e+00 1.2 4.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 1 10 0 0 0 57899
VecAYPX 62182 1.0 9.1499e+00 1.2 3.03e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 1 7 0 0 0 42373
VecScatterBegin 64694 1.0 8.3787e+00 1.2 0.00e+00 0.0 3.3e+07 1.6e+04 0.0e+00 0 0 4 19 0 1 0100100 0 0
VecScatterEnd 64694 1.0 3.8446e+01 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 4 0 0 0 0 0
MatMult 63018 1.0 2.2957e+02 1.1 2.68e+10 1.0 3.2e+07 1.6e+04 0.0e+00 9 9 4 19 0 35 64 99 99 0 14968
MatMultTranspose 4 1.0 2.2471e-03 1.1 2.53e+05 1.0 1.5e+03 9.9e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 14396
MatLUFactorSym 1 1.0 3.5787e-04 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 1.0 1.3686e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 853 1.0 2.7544e+00 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.7e+03 0 0 0 0 0 0 0 0 0 1 0
MatAssemblyEnd 853 1.0 3.0681e+00 1.2 0.00e+00 0.0 1.2e+04 1.1e+03 7.2e+01 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 1 1.0 4.0531e-06 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 2.1935e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 5852 1.0 1.6725e+00 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 5.9e+03 0 0 0 0 1 0 0 0 0 2 0
MatPtAP 4 1.0 2.2126e-01 1.0 5.11e+06 1.0 2.5e+04 6.0e+03 1.0e+02 0 0 0 0 0 0 0 0 0 0 2953
MatPtAPSymbolic 4 1.0 1.5837e-01 1.1 0.00e+00 0.0 1.5e+04 7.8e+03 6.0e+01 0 0 0 0 0 0 0 0 0 0 0
MatPtAPNumeric 4 1.0 7.1259e-02 1.1 5.11e+06 1.0 9.7e+03 3.1e+03 4.0e+01 0 0 0 0 0 0 0 0 0 0 9170
MatGetLocalMat 4 1.0 2.6774e-02 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 4 1.0 3.5020e-02 3.5 0.00e+00 0.0 1.1e+04 8.4e+03 8.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSymTrans 8 1.0 9.7649e-03 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 6 1.0 9.6231e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 2.2e+01 0 0 0 0 0 0 0 0 0 0 0
Warning -- total time of even greater than time of entire stage -- something is wrong with the timer
KSPSolve 836 1.0 2.1100e+03 1.0 3.13e+11 1.0 8.0e+08 3.5e+03 7.8e+05 88100100100 99 3377422433520264 19016
PCSetUp 1 1.0 4.3869e-01 1.0 5.36e+06 1.0 3.4e+04 4.6e+03 3.0e+02 0 0 0 0 0 0 0 0 0 0 1563
Warning -- total time of even greater than time of entire stage -- something is wrong with the timer
PCApply 31091 1.0 1.7960e+03 1.0 2.71e+11 1.0 7.6e+08 3.0e+03 5.0e+05 74 87 96 81 63 2856422335421168 19329
MGSetup Level 0 1 1.0 1.3880e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 0 0 0 0 0 0 0
MGSetup Level 1 1 1.0 2.2409e-03 12.2 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0
MGSetup Level 2 1 1.0 3.0804e-04 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0
MGSetup Level 3 1 1.0 1.5497e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0
MGSetup Level 4 1 1.0 1.6789e-03 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 1: MG Apply
VecScale 621820 1.0 3.4077e+00 1.3 1.76e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 66146
VecCopy 31091 1.0 4.3931e-02 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 435274 1.0 6.2440e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAYPX 124364 1.0 5.2623e+00 1.5 1.16e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 28314
VecScatterBegin 994912 1.0 9.1571e+01 1.6 0.00e+00 0.0 7.6e+08 3.0e+03 0.0e+00 3 0 96 81 0 4 0100100 0 0
VecScatterEnd 994912 1.0 1.9157e+02 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 7 0 0 0 0 0
MatMult 124364 1.0 1.3612e+02 1.2 2.09e+10 1.0 1.1e+08 3.2e+03 0.0e+00 5 7 14 13 0 7 8 15 16 0 19694
MatMultAdd 124364 1.0 5.9485e+01 1.1 7.86e+09 1.0 4.8e+07 9.9e+02 0.0e+00 2 3 6 2 0 3 3 6 2 0 16907
MatMultTranspose 124364 1.0 5.8217e+01 1.4 7.86e+09 1.0 4.8e+07 9.9e+02 0.0e+00 2 3 6 2 0 3 3 6 2 0 17276
MatSolve 31091 1.0 5.5108e+01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 3 0 0 0 0 0
MatSOR 248728 1.0 1.4900e+03 1.0 2.33e+11 1.0 5.6e+08 3.2e+03 5.0e+05 61 74 70 65 63 83 86 73 80100 20050
KSPSolve 279819 1.0 1.5449e+03 1.0 2.33e+11 1.0 5.6e+08 3.2e+03 5.0e+05 64 74 70 65 63 86 86 73 80100 19337
PCApply 31091 1.0 5.5162e+01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 3 0 0 0 0 0
MGSmooth Level 0 31091 1.0 5.5668e+01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 3 0 0 0 0 0
MGSmooth Level 1 62182 1.0 2.0214e+01 1.2 8.54e+08 1.0 1.6e+08 1.9e+02 1.2e+05 1 0 20 1 16 1 0 21 1 25 5405
MGResid Level 1 31091 1.0 1.6500e+00 1.1 1.07e+08 1.0 3.2e+07 1.9e+02 0.0e+00 0 0 4 0 0 0 0 4 0 0 8336
MGInterp Level 1 62182 1.0 5.2735e+00 5.1 2.69e+07 1.0 2.4e+07 6.4e+01 0.0e+00 0 0 3 0 0 0 0 3 0 0 652
MGSmooth Level 2 62182 1.0 5.2998e+01 1.1 7.98e+09 1.0 1.6e+08 6.4e+02 1.2e+05 2 3 20 4 16 3 3 21 5 25 19271
MGResid Level 2 31091 1.0 4.2289e+00 1.2 8.60e+08 1.0 3.2e+07 6.4e+02 0.0e+00 0 0 4 1 0 0 0 4 1 0 26018
MGInterp Level 2 62182 1.0 3.7472e+00 1.6 2.15e+08 1.0 2.4e+07 2.1e+02 0.0e+00 0 0 3 0 0 0 0 3 0 0 7341
MGSmooth Level 3 62182 1.0 3.1296e+02 1.1 6.94e+10 1.0 1.6e+08 2.3e+03 1.2e+05 12 22 20 13 16 16 26 21 16 25 28390
MGResid Level 3 31091 1.0 2.4739e+01 1.1 6.88e+09 1.0 3.2e+07 2.3e+03 0.0e+00 1 2 4 3 0 1 3 4 3 0 35580
MGInterp Level 3 62182 1.0 1.3544e+01 1.2 1.72e+09 1.0 2.4e+07 7.7e+02 0.0e+00 0 1 3 1 0 1 1 3 1 0 16248
MGSmooth Level 4 62182 1.0 1.1547e+03 1.1 1.55e+11 1.0 8.0e+07 1.6e+04 1.2e+05 47 49 10 47 16 63 57 10 58 25 17197
MGResid Level 4 31091 1.0 1.1227e+02 1.2 1.43e+10 1.0 1.6e+07 1.6e+04 0.0e+00 4 5 2 9 0 6 5 2 12 0 16262
MGInterp Level 4 62182 1.0 9.7205e+01 1.2 1.38e+10 1.0 2.4e+07 2.9e+03 0.0e+00 4 4 3 2 0 5 5 3 3 0 18111
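Dividing the event counts above by the 836 time steps gives a rough per-step picture. The sketch below uses values copied from this log (variable names are hypothetical) and assumes that one PCApply is one V-cycle, as "Cycles per PCApply=1" in the -ksp_view output indicates.

```python
# Values copied from the -log_summary table and the SUMMARY block above
N_SOLVES = 836           # KSPSolve count, one solve per time step
KSP_SECONDS = 2.1100e03  # KSPSolve max time
PC_APPLIES = 31091       # PCApply count; one V-cycle each ("Cycles per PCApply=1")
PC_SECONDS = 1.7960e03   # PCApply max time
MATSOR_CALLS = 248728    # MatSOR count in the "MG Apply" stage
SMOOTHED_LEVELS = 4      # levels 1-4; level 0 uses the LU coarse solve
STEP_SECONDS = 2.8391    # "Processing time per time step"

vcycles_per_solve = PC_APPLIES / N_SOLVES
seconds_per_vcycle = PC_SECONDS / PC_APPLIES
solver_share_of_step = (KSP_SECONDS / N_SOLVES) / STEP_SECONDS
matsor_calls_per_vcycle = MATSOR_CALLS / PC_APPLIES

print(f"{vcycles_per_solve:.1f} V-cycles per solve, {seconds_per_vcycle * 1e3:.1f} ms per V-cycle")
print(f"linear solve share of a time step: {solver_share_of_step:.0%}")
print(f"MatSOR calls per V-cycle: {matsor_calls_per_vcycle:.0f} (= 2 x {SMOOTHED_LEVELS} levels)")
```

This gives about 37 V-cycles per solve at roughly 0.058 s each, with the linear solve taking about 89% of the 2.84 s reported per time step. The 8 MatSOR calls per V-cycle match one pre- and one post-smoothing call on each of the 4 non-coarse levels.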
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 65592 65592 17415948192 0
Vector Scatter 19 19 22572 0
Matrix 38 38 14004608 0
Matrix Null Space 1 1 652 0
Distributed Mesh 5 5 830792 0
Bipartite Graph 10 10 8560 0
Index Set 47 47 534480 0
IS L to G Mapping 5 5 405756 0
Krylov Solver 7 7 9536 0
DMKSP interface 3 3 2088 0
Preconditioner 7 7 7352 0
Viewer 1 0 0 0
--- Event Stage 1: MG Apply
Vector 248728 248728 19044605504 0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 7.16209e-05
Average time for zero size MPI_Send(): 1.87568e-06
#PETSc Option Table entries:
-ksp_atol 1e-9
-ksp_monitor_true_residual
-ksp_view
-log_summary
-mg_coarse_pc_factor_mat_solver_package superlu_dist
-mg_coarse_pc_type lu
-mg_levels_ksp_max_it 3
-mg_levels_ksp_type richardson
-options_left
-pc_mg_galerkin
-pc_mg_levels 5
-pc_mg_log
-pc_type mg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure run at: Wed Aug 28 23:25:43 2013
Configure options: --known-level1-dcache-size=16384 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=0 --known-mpi-c-double-complex=0 --with-batch="1 " --known-mpi-shared="0 " --known-memcmp-ok --with-blas-lapack-lib="-L/opt/acml/5.3.0/pgi64/lib -lacml" --COPTFLAGS="-O3 -fastsse" --FOPTFLAGS="-O3 -fastsse" --CXXOPTFLAGS="-O3 -fastsse" --with-x="0 " --with-debugging="0 " --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 " --with-fortranlib-autodetect="0 " --with-shared-libraries=0 --with-dynamic-loading=0 --with-mpi-compilers="1 " --known-mpi-shared-libraries=0 --with-64-bit-indices --download-blacs="1 " --download-scalapack="1 " --download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 " --with-cc=cc --with-cxx=CC --with-fc=ftn PETSC_ARCH=interlagos-64idx-pgi-opt
-----------------------------------------
Libraries compiled on Wed Aug 28 23:25:43 2013 on h2ologin3
Machine characteristics: Linux-2.6.32.59-0.7-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /mnt/a/u/sciteam/mrosso/LIBS/petsc-3.4.2
Using PETSc arch: interlagos-64idx-pgi-opt
-----------------------------------------
Using C compiler: cc -O3 -fastsse ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ftn -O3 -fastsse ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.4.2/interlagos-64idx-pgi-opt/include -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.4.2/include -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.4.2/include -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.4.2/interlagos-64idx-pgi-opt/include -I/opt/cray/udreg/2.3.2-1.0402.7311.2.1.gem/include -I/opt/cray/ugni/5.0-1.0402.7128.7.6.gem/include -I/opt/cray/pmi/4.0.1-1.0000.9421.73.3.gem/include -I/opt/cray/dmapp/4.0.1-1.0402.7439.5.1.gem/include -I/opt/cray/gni-headers/2.1-1.0402.7082.6.2.gem/include -I/opt/cray/xpmem/0.1-2.0402.44035.2.1.gem/include -I/opt/cray/rca/1.0.0-2.0402.42153.2.106.gem/include -I/opt/cray-hss-devel/7.0.0/include -I/opt/cray/krca/1.0.0-2.0402.42157.2.94.gem/include -I/opt/cray/mpt/6.0.1/gni/mpich2-pgi/121/include -I/opt/acml/5.3.0/pgi64_fma4/include -I/opt/cray/libsci/12.1.01/pgi/121/interlagos/include -I/opt/fftw/3.3.0.3/interlagos/include -I/usr/include/alps -I/opt/pgi/13.6.0/linux86-64/13.6/include -I/opt/cray/xe-sysroot/4.2.24/usr/include
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.4.2/interlagos-64idx-pgi-opt/lib -L/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.4.2/interlagos-64idx-pgi-opt/lib -lpetsc -Wl,-rpath,/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.4.2/interlagos-64idx-pgi-opt/lib -L/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.4.2/interlagos-64idx-pgi-opt/lib -lsuperlu_dist_3.3 -L/opt/acml/5.3.0/pgi64/lib -lacml -lpthread -lparmetis -lmetis -ldl
-----------------------------------------
#PETSc Option Table entries:
-ksp_atol 1e-9
-ksp_monitor_true_residual
-ksp_view
-log_summary
-mg_coarse_pc_factor_mat_solver_package superlu_dist
-mg_coarse_pc_type lu
-mg_levels_ksp_max_it 3
-mg_levels_ksp_type richardson
-options_left
-pc_mg_galerkin
-pc_mg_levels 5
-pc_mg_log
-pc_type mg
#End of PETSc Option Table entries
There are no unused options.