[petsc-users] Performance of Fieldsplit PC
Bernardo Rocha
bernardomartinsrocha at gmail.com
Tue Nov 7 09:55:19 CST 2017
Thanks for the reply.
> 1) This is block-Jacobi, why not use PCBJACOBI? Is it because you want to
> select rows?
>
I'm only using it to understand the performance behavior of PCFieldSplit,
since I'm having the same issue in a larger and more complex problem.
> 2) We cannot tell anything without knowing how many iterates were used
>
> -ksp_monitor_true_residual -ksp_converged_reason
> -pc_fieldsplit_[0,1]_ksp_monitor_true_residual
>
> 3) We cannot say anything about performance without seeing the log for
> both runs
> -log_view
>
I'm sending you the log files with the recommended command-line
arguments for the three cases:
1 - scalar case
2 - PCFieldSplit (as we were initially running it)
3 - PCFieldSplit with Preonly/Jacobi in each block, as suggested by Patrick.
As Patrick pointed out, with Preonly/Jacobi the behavior is closer to what
I expected.
Please note that the log was taken for 100 calls to KSPSolve; I have
just simplified the output shown here.
As you can see, the timing with PCFieldSplit is larger for case 3,
and for case 2 it is nearly 2x the scalar case, as I expected (I don't
know if this idea makes sense).
So for case 2, is the large timing due to the inner/outer solves?
Does the "machinery" behind PCFieldSplit for a block preconditioner
result in some performance overhead (neglecting the efficiency of the
PC itself)?
And what would be the proper way of creating this block preconditioner?
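To make that last question concrete, here is a minimal sketch of the
kind of setup I have in mind (this is not the actual poisson code; the
function name and the index sets isX/isY are just illustrative names
for the two splits):

/* Minimal sketch: an additive PCFIELDSPLIT with two splits defined by
 * index sets, matching the options used in cases 2 and 3. */
#include <petscksp.h>

PetscErrorCode SetupBlockPC(KSP ksp, Mat A, IS isX, IS isY)
{
  PC             pc;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCFIELDSPLIT);CHKERRQ(ierr);
  ierr = PCFieldSplitSetType(pc, PC_COMPOSITE_ADDITIVE);CHKERRQ(ierr);
  ierr = PCFieldSplitSetIS(pc, "X", isX);CHKERRQ(ierr);  /* options prefix fieldsplit_X_ */
  ierr = PCFieldSplitSetIS(pc, "Y", isY);CHKERRQ(ierr);  /* options prefix fieldsplit_Y_ */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);           /* picks up the -fieldsplit_*_ options */
  PetscFunctionReturn(0);
}

With this setup the per-split options from case 3
(-fieldsplit_X_ksp_type preonly -fieldsplit_X_pc_type jacobi, and the
same for Y) select the Preonly/Jacobi inner solves.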
Best regards,
Bernardo
-------------- next part --------------
0 KSP preconditioned resid norm 9.909609586673e+01 true resid norm 6.621816260761e+10 ||r(i)||/||b|| 1.000000000000e+00
1 KSP preconditioned resid norm 1.273902381670e+01 true resid norm 1.094099204442e+09 ||r(i)||/||b|| 1.652264516800e-02
2 KSP preconditioned resid norm 6.460292016408e+00 true resid norm 2.813270803973e+08 ||r(i)||/||b|| 4.248488168788e-03
3 KSP preconditioned resid norm 4.267523086682e+00 true resid norm 1.227432629606e+08 ||r(i)||/||b|| 1.853619280982e-03
4 KSP preconditioned resid norm 2.956794930503e+00 true resid norm 5.892889289089e+07 ||r(i)||/||b|| 8.899203869502e-04
5 KSP preconditioned resid norm 1.553867540080e+00 true resid norm 1.629422769996e+07 ||r(i)||/||b|| 2.460688587286e-04
6 KSP preconditioned resid norm 5.863411243068e-01 true resid norm 2.339364215736e+06 ||r(i)||/||b|| 3.532813541805e-05
7 KSP preconditioned resid norm 2.949598244316e-01 true resid norm 6.042774335918e+05 ||r(i)||/||b|| 9.125554225548e-06
8 KSP preconditioned resid norm 1.810861505194e-01 true resid norm 2.303685149906e+05 ||r(i)||/||b|| 3.478932454766e-06
9 KSP preconditioned resid norm 1.063228930690e-01 true resid norm 7.738363702513e+04 ||r(i)||/||b|| 1.168616493993e-06
10 KSP preconditioned resid norm 5.539338985670e-02 true resid norm 1.753917253023e+04 ||r(i)||/||b|| 2.648695137339e-07
11 KSP preconditioned resid norm 2.897182710946e-02 true resid norm 2.729986729636e+03 ||r(i)||/||b|| 4.122715916799e-08
12 KSP preconditioned resid norm 1.695869301131e-02 true resid norm 4.766670897136e+02 ||r(i)||/||b|| 7.198434250408e-09
13 KSP preconditioned resid norm 9.226255542270e-03 true resid norm 2.547340821913e+02 ||r(i)||/||b|| 3.846891429181e-09
14 KSP preconditioned resid norm 4.999664022085e-03 true resid norm 2.173100659726e+02 ||r(i)||/||b|| 3.281729021392e-09
15 KSP preconditioned resid norm 2.822889124856e-03 true resid norm 1.227719812726e+02 ||r(i)||/||b|| 1.854052973353e-09
16 KSP preconditioned resid norm 1.612223553945e-03 true resid norm 7.400379602761e+01 ||r(i)||/||b|| 1.117575497619e-09
17 KSP preconditioned resid norm 8.796911180671e-04 true resid norm 7.103508796600e+01 ||r(i)||/||b|| 1.072743265121e-09
18 KSP preconditioned resid norm 4.819064182097e-04 true resid norm 7.368434157044e+01 ||r(i)||/||b|| 1.112751225175e-09
19 KSP preconditioned resid norm 2.758550257952e-04 true resid norm 8.624440250822e+01 ||r(i)||/||b|| 1.302428202656e-09
20 KSP preconditioned resid norm 1.527969474414e-04 true resid norm 9.706581744817e+01 ||r(i)||/||b|| 1.465848849104e-09
21 KSP preconditioned resid norm 8.423993997128e-05 true resid norm 1.055048065519e+02 ||r(i)||/||b|| 1.593291060900e-09
22 KSP preconditioned resid norm 4.630697601290e-05 true resid norm 1.164992856073e+02 ||r(i)||/||b|| 1.759325251860e-09
23 KSP preconditioned resid norm 2.582157010506e-05 true resid norm 1.251285547265e+02 ||r(i)||/||b|| 1.889640995748e-09
24 KSP preconditioned resid norm 1.439945755021e-05 true resid norm 1.468650092478e+02 ||r(i)||/||b|| 2.217896170241e-09
25 KSP preconditioned resid norm 7.649560254453e-06 true resid norm 1.686932293751e+02 ||r(i)||/||b|| 2.547537151926e-09
26 KSP preconditioned resid norm 4.116629854365e-06 true resid norm 1.862186959283e+02 ||r(i)||/||b|| 2.812199683519e-09
27 KSP preconditioned resid norm 2.293702349842e-06 true resid norm 2.098309185181e+02 ||r(i)||/||b|| 3.168781951283e-09
28 KSP preconditioned resid norm 1.302840266880e-06 true resid norm 2.200686132505e+02 ||r(i)||/||b|| 3.323387490447e-09
29 KSP preconditioned resid norm 7.703427859997e-07 true resid norm 1.626698090949e+02 ||r(i)||/||b|| 2.456573886818e-09
Linear solve converged due to CONVERGED_RTOL iterations 29
KSP Object: 1 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-08, absolute=1e-50, divergence=10000
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
type: jacobi
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=9953, cols=9953
total: nonzeros=132617, allocated nonzeros=298590
total number of mallocs used during MatSetValues calls =0
not using I-node routines
Number of iterations: 29
Residual norm: 7.70343e-07
Total time: 2.67914
Writing data file
Done
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./poisson on a arch-linux2-c-debug named localhost.localdomain with 1 processor, by joventino Tue Nov 7 13:00:48 2017
Using Petsc Release Version 3.5.4, May, 23, 2015
Max Max/Min Avg Total
Time (sec): 3.104e+00 1.00000 3.104e+00
Objects: 6.042e+03 1.00000 6.042e+03
Flops: 4.430e+09 1.00000 4.430e+09 4.430e+09
Flops/sec: 1.427e+09 1.00000 1.427e+09 1.427e+09
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 0.000e+00 0.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 3.1044e+00 100.0% 4.4303e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 5900 1.0 1.1548e+00 1.0 1.51e+09 1.0 0.0e+00 0.0e+00 0.0e+00 37 34 0 0 0 37 34 0 0 0 1304
MatAssemblyBegin 1 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 1 1.0 9.4891e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 100 1.0 4.6666e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecMDot 2900 1.0 3.2593e-01 1.0 8.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 10 20 0 0 0 10 20 0 0 0 2657
VecNorm 6001 1.0 5.5606e-02 1.0 1.19e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 3 0 0 0 2 3 0 0 0 2148
VecScale 3000 1.0 1.6580e-02 1.0 2.99e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 1801
VecCopy 3100 1.0 3.1643e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecSet 9141 1.0 5.4358e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
VecAXPY 3000 1.0 2.2286e-02 1.0 5.97e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 2680
VecAYPX 3000 1.0 4.8099e-02 1.0 2.99e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 621
VecMAXPY 5900 1.0 6.5193e-01 1.0 1.79e+09 1.0 0.0e+00 0.0e+00 0.0e+00 21 40 0 0 0 21 40 0 0 0 2745
VecPointwiseMult 3000 1.0 5.8018e-02 1.0 2.99e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 515
VecNormalize 3000 1.0 4.7506e-02 1.0 8.96e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1886
KSPGMRESOrthog 2900 1.0 6.4069e-01 1.0 1.73e+09 1.0 0.0e+00 0.0e+00 0.0e+00 21 39 0 0 0 21 39 0 0 0 2703
KSPSetUp 100 1.0 2.6917e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 100 1.0 2.6619e+00 1.0 4.43e+09 1.0 0.0e+00 0.0e+00 0.0e+00 86100 0 0 0 86100 0 0 0 1664
PCSetUp 1 1.0 7.1526e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 3000 1.0 6.0910e-02 1.0 2.99e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 490
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 1 0 0 0
Vector 6038 6036 489688608 0
Krylov Solver 1 1 18616 0
Preconditioner 1 1 856 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 4.76837e-08
#PETSc Option Table entries:
-ksp_converged_reason
-ksp_monitor_true_residual
-ksp_rtol 1e-8
-ksp_type gmres
-ksp_view
-log_view
-m /home/joventino/Downloads/russa.xml
-pc_type jacobi
#End of PETSc Option Table entries
-------------- next part --------------
0 KSP preconditioned resid norm 9.515173597913e+01 true resid norm 9.364662363510e+10 ||r(i)||/||b|| 1.000000000000e+00
1 KSP preconditioned resid norm 3.204643328070e-06 true resid norm 2.672963301097e+00 ||r(i)||/||b|| 2.854308246619e-11
2 KSP preconditioned resid norm 2.804824521054e-13 true resid norm 2.770364057767e-04 ||r(i)||/||b|| 2.958317075650e-15
Linear solve converged due to CONVERGED_RTOL iterations 2
KSP Object: 1 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-08, absolute=1e-50, divergence=10000
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
type: fieldsplit
FieldSplit with ADDITIVE composition: total splits = 2
Solver info for each split is in the following KSP objects:
Split number 0 Defined by IS
KSP Object: (fieldsplit_X_) 1 MPI processes
type: cg
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-08, absolute=1e-50, divergence=10000
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: (fieldsplit_X_) 1 MPI processes
type: jacobi
linear system matrix = precond matrix:
Mat Object: (fieldsplit_X_) 1 MPI processes
type: seqaij
rows=9953, cols=9953
total: nonzeros=132617, allocated nonzeros=132617
total number of mallocs used during MatSetValues calls =0
not using I-node routines
Split number 1 Defined by IS
KSP Object: (fieldsplit_Y_) 1 MPI processes
type: cg
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-08, absolute=1e-50, divergence=10000
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: (fieldsplit_Y_) 1 MPI processes
type: jacobi
linear system matrix = precond matrix:
Mat Object: (fieldsplit_Y_) 1 MPI processes
type: seqaij
rows=9953, cols=9953
total: nonzeros=132617, allocated nonzeros=132617
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=19906, cols=19906
total: nonzeros=265234, allocated nonzeros=1.19436e+06
total number of mallocs used during MatSetValues calls =0
not using I-node routines
Number of iterations: 2
Residual norm: 2.80482e-13
Total time: 5.1269
Writing data file
Done
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./poisson on a arch-linux2-c-debug named localhost.localdomain with 1 processor, by joventino Tue Nov 7 13:01:56 2017
Using Petsc Release Version 3.5.4, May, 23, 2015
Max Max/Min Avg Total
Time (sec): 5.609e+00 1.00000 5.609e+00
Objects: 6.370e+02 1.00000 6.370e+02
Flops: 7.589e+09 1.00000 7.589e+09 7.589e+09
Flops/sec: 1.353e+09 1.00000 1.353e+09 1.353e+09
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 0.000e+00 0.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 5.6092e+00 100.0% 7.5885e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 19300 1.0 3.5085e+00 1.0 5.05e+09 1.0 0.0e+00 0.0e+00 0.0e+00 63 67 0 0 0 63 67 0 0 0 1441
MatAssemblyBegin 3 1.0 1.4305e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 3 1.0 3.1419e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrice 2 1.0 2.2733e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 300 1.0 1.8467e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecMDot 200 1.0 7.8726e-03 1.0 1.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1517
VecTDot 37600 1.0 3.4746e-01 1.0 7.48e+08 1.0 0.0e+00 0.0e+00 0.0e+00 6 10 0 0 0 6 10 0 0 0 2154
VecNorm 20100 1.0 1.8720e-01 1.0 4.14e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 5 0 0 0 3 5 0 0 0 2212
VecScale 300 1.0 2.8970e-03 1.0 5.97e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2061
VecCopy 1600 1.0 2.0661e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 1925 1.0 2.7724e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 37900 1.0 3.0721e-01 1.0 7.60e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 10 0 0 0 5 10 0 0 0 2475
VecAYPX 18500 1.0 2.6606e-01 1.0 3.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 5 0 0 0 5 5 0 0 0 1384
VecMAXPY 500 1.0 1.6370e-02 1.0 3.18e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1946
VecPointwiseMult 19400 1.0 2.5479e-01 1.0 1.93e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 3 0 0 0 5 3 0 0 0 758
VecScatterBegin 1200 1.0 2.5261e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 300 1.0 9.1097e-03 1.0 1.79e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1967
KSPGMRESOrthog 200 1.0 1.4125e-02 1.0 2.39e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1691
KSPSetUp 102 1.0 3.4118e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 100 1.0 5.0754e+00 1.0 7.59e+09 1.0 0.0e+00 0.0e+00 0.0e+00 90100 0 0 0 90100 0 0 0 1495
PCSetUp 3 1.0 2.5232e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 300 1.0 4.7249e+00 1.0 7.24e+09 1.0 0.0e+00 0.0e+00 0.0e+00 84 95 0 0 0 84 95 0 0 0 1532
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 3 2 3506688 0
Vector 621 619 98550000 0
Vector Scatter 2 2 1288 0
Krylov Solver 3 3 21064 0
Preconditioner 3 3 2720 0
Viewer 1 0 0 0
Index Set 4 2 1568 0
========================================================================================================================
Average time to get PetscTime(): 4.76837e-08
#PETSc Option Table entries:
-fieldsplit_X_ksp_rtol 1e-8
-fieldsplit_X_ksp_type cg
-fieldsplit_X_pc_type jacobi
-fieldsplit_Y_ksp_rtol 1e-8
-fieldsplit_Y_ksp_type cg
-fieldsplit_Y_pc_type jacobi
-ksp_converged_reason
-ksp_monitor_true_residual
-ksp_view
-log_view
-m /home/joventino/Downloads/russa.xml
-pc_fieldsplit_[0,1]_ksp_monitor_true_residual
-pc_fieldsplit_type additive
-pc_type fieldsplit
#End of PETSc Option Table entries
-------------- next part --------------
0 KSP preconditioned resid norm 1.401430427530e+02 true resid norm 9.364662363510e+10 ||r(i)||/||b|| 1.000000000000e+00
1 KSP preconditioned resid norm 1.801570025298e+01 true resid norm 1.547289933503e+09 ||r(i)||/||b|| 1.652264516799e-02
2 KSP preconditioned resid norm 9.136232586494e+00 true resid norm 3.978565725596e+08 ||r(i)||/||b|| 4.248488168777e-03
3 KSP preconditioned resid norm 6.035189026926e+00 true resid norm 1.735851871679e+08 ||r(i)||/||b|| 1.853619280971e-03
4 KSP preconditioned resid norm 4.181539491874e+00 true resid norm 8.333803954090e+07 ||r(i)||/||b|| 8.899203869392e-04
5 KSP preconditioned resid norm 2.197500549312e+00 true resid norm 2.304351780071e+07 ||r(i)||/||b|| 2.460688587183e-04
6 KSP preconditioned resid norm 8.292115701718e-01 true resid norm 3.308360600270e+06 ||r(i)||/||b|| 3.532813540786e-05
7 KSP preconditioned resid norm 4.171361840663e-01 true resid norm 8.545773410120e+05 ||r(i)||/||b|| 9.125554214767e-06
8 KSP preconditioned resid norm 2.560944900224e-01 true resid norm 3.257902772942e+05 ||r(i)||/||b|| 3.478932444630e-06
9 KSP preconditioned resid norm 1.503632773689e-01 true resid norm 1.094369879940e+05 ||r(i)||/||b|| 1.168616483392e-06
10 KSP preconditioned resid norm 7.833808320116e-02 true resid norm 2.480413464487e+04 ||r(i)||/||b|| 2.648695028400e-07
11 KSP preconditioned resid norm 4.097235082492e-02 true resid norm 3.860783217718e+03 ||r(i)||/||b|| 4.122714805780e-08
12 KSP preconditioned resid norm 2.398321365671e-02 true resid norm 6.741080801947e+02 ||r(i)||/||b|| 7.198423755473e-09
13 KSP preconditioned resid norm 1.304789571780e-02 true resid norm 3.602473867603e+02 ||r(i)||/||b|| 3.846880675208e-09
14 KSP preconditioned resid norm 7.070592667344e-03 true resid norm 3.073218548774e+02 ||r(i)||/||b|| 3.281718474708e-09
15 KSP preconditioned resid norm 3.992168085451e-03 true resid norm 1.736249159923e+02 ||r(i)||/||b|| 1.854043522902e-09
16 KSP preconditioned resid norm 2.280028415568e-03 true resid norm 1.046566169144e+02 ||r(i)||/||b|| 1.117569570071e-09
17 KSP preconditioned resid norm 1.244071109869e-03 true resid norm 1.004589356606e+02 ||r(i)||/||b|| 1.072744876014e-09
18 KSP preconditioned resid norm 6.815185924237e-04 true resid norm 1.042054174147e+02 ||r(i)||/||b|| 1.112751462570e-09
19 KSP preconditioned resid norm 3.901179187272e-04 true resid norm 1.219681423290e+02 ||r(i)||/||b|| 1.302429682935e-09
20 KSP preconditioned resid norm 2.160875153605e-04 true resid norm 1.372716915795e+02 ||r(i)||/||b|| 1.465847739630e-09
21 KSP preconditioned resid norm 1.191332656017e-04 true resid norm 1.492063885795e+02 ||r(i)||/||b|| 1.593291704364e-09
22 KSP preconditioned resid norm 6.548795351426e-05 true resid norm 1.647547652308e+02 ||r(i)||/||b|| 1.759324136157e-09
23 KSP preconditioned resid norm 3.651721464750e-05 true resid norm 1.769585591874e+02 ||r(i)||/||b|| 1.889641637021e-09
24 KSP preconditioned resid norm 2.036390815968e-05 true resid norm 2.076983894694e+02 ||r(i)||/||b|| 2.217895118981e-09
25 KSP preconditioned resid norm 1.081811185654e-05 true resid norm 2.385683116238e+02 ||r(i)||/||b|| 2.547537779401e-09
26 KSP preconditioned resid norm 5.821793768980e-06 true resid norm 2.633529192793e+02 ||r(i)||/||b|| 2.812198764426e-09
27 KSP preconditioned resid norm 3.243784970505e-06 true resid norm 2.967457868685e+02 ||r(i)||/||b|| 3.168782550290e-09
28 KSP preconditioned resid norm 1.842494373206e-06 true resid norm 3.112239423017e+02 ||r(i)||/||b|| 3.323386687324e-09
29 KSP preconditioned resid norm 1.089429216198e-06 true resid norm 2.300499177590e+02 ||r(i)||/||b|| 2.456574608129e-09
Linear solve converged due to CONVERGED_RTOL iterations 29
KSP Object: 1 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-08, absolute=1e-50, divergence=10000
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
type: fieldsplit
FieldSplit with ADDITIVE composition: total splits = 2
Solver info for each split is in the following KSP objects:
Split number 0 Defined by IS
KSP Object: (fieldsplit_X_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-08, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (fieldsplit_X_) 1 MPI processes
type: jacobi
linear system matrix = precond matrix:
Mat Object: (fieldsplit_X_) 1 MPI processes
type: seqaij
rows=9953, cols=9953
total: nonzeros=132617, allocated nonzeros=132617
total number of mallocs used during MatSetValues calls =0
not using I-node routines
Split number 1 Defined by IS
KSP Object: (fieldsplit_Y_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-08, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (fieldsplit_Y_) 1 MPI processes
type: jacobi
linear system matrix = precond matrix:
Mat Object: (fieldsplit_Y_) 1 MPI processes
type: seqaij
rows=9953, cols=9953
total: nonzeros=132617, allocated nonzeros=132617
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=19906, cols=19906
total: nonzeros=265234, allocated nonzeros=1.19436e+06
total number of mallocs used during MatSetValues calls =0
not using I-node routines
Number of iterations: 29
Residual norm: 1.08943e-06
Total time: 6.58937
0
0
Writing data file
Done
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./poisson on a arch-linux2-c-debug named localhost.localdomain with 1 processor, by joventino Tue Nov 7 13:04:06 2017
Using Petsc Release Version 3.5.4, May, 23, 2015
Max Max/Min Avg Total
Time (sec): 7.098e+00 1.00000 7.098e+00
Objects: 6.060e+03 1.00000 6.060e+03
Flops: 8.865e+09 1.00000 8.865e+09 8.865e+09
Flops/sec: 1.249e+09 1.00000 1.249e+09 1.249e+09
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 0.000e+00 0.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 7.0984e+00 100.0% 8.8646e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 5900 1.0 2.8474e+00 1.0 3.01e+09 1.0 0.0e+00 0.0e+00 0.0e+00 40 34 0 0 0 40 34 0 0 0 1058
MatAssemblyBegin 3 1.0 1.6689e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 3 1.0 2.9294e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrice 2 1.0 2.2478e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 300 1.0 1.9589e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecMDot 2900 1.0 7.7983e-01 1.0 1.73e+09 1.0 0.0e+00 0.0e+00 0.0e+00 11 20 0 0 0 11 20 0 0 0 2221
VecNorm 6100 1.0 1.1267e-01 1.0 2.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 3 0 0 0 2 3 0 0 0 2155
VecScale 3000 1.0 3.1897e-02 1.0 5.97e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1872
VecCopy 3100 1.0 7.6415e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecSet 18148 1.0 2.6946e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0
VecAXPY 3000 1.0 5.7204e-02 1.0 1.19e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 2088
VecAYPX 3000 1.0 1.0830e-01 1.0 5.97e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 551
VecMAXPY 5900 1.0 1.5695e+00 1.0 3.58e+09 1.0 0.0e+00 0.0e+00 0.0e+00 22 40 0 0 0 22 40 0 0 0 2280
VecPointwiseMult 6000 1.0 1.0145e-01 1.0 5.97e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 589
VecScatterBegin 12000 1.0 1.8266e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0
VecNormalize 3000 1.0 9.1894e-02 1.0 1.79e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 1950
KSPGMRESOrthog 2900 1.0 1.5169e+00 1.0 3.46e+09 1.0 0.0e+00 0.0e+00 0.0e+00 21 39 0 0 0 21 39 0 0 0 2283
KSPSetUp 102 1.0 2.6441e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 100 1.0 6.5351e+00 1.0 8.86e+09 1.0 0.0e+00 0.0e+00 0.0e+00 92100 0 0 0 92100 0 0 0 1356
PCSetUp 3 1.0 2.5163e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 3000 1.0 4.7949e-01 1.0 5.97e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 1 0 0 0 7 1 0 0 0 125
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 3 2 3506688 0
Vector 6044 6042 970785840 0
Vector Scatter 2 2 1288 0
Krylov Solver 3 3 20936 0
Preconditioner 3 3 2720 0
Viewer 1 0 0 0
Index Set 4 2 1568 0
========================================================================================================================
Average time to get PetscTime(): 4.76837e-08
#PETSc Option Table entries:
-fieldsplit_X_ksp_rtol 1e-8
-fieldsplit_X_ksp_type preonly
-fieldsplit_X_pc_type jacobi
-fieldsplit_Y_ksp_rtol 1e-8
-fieldsplit_Y_ksp_type preonly
-fieldsplit_Y_pc_type jacobi
-ksp_converged_reason
-ksp_monitor_true_residual
-ksp_view
-log_view
-m /home/joventino/Downloads/russa.xml
-pc_fieldsplit_[0,1]_ksp_monitor_true_residual
-pc_fieldsplit_type additive
-pc_type fieldsplit
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-suitesparse --download-hypre --download-mpich --with-debugging=0
-----------------------------------------
Libraries compiled on Fri Aug 4 16:15:03 2017 on localhost.localdomain
Machine characteristics: Linux-4.11.9-200.fc25.x86_64-x86_64-with-fedora-25-Twenty_Five
Using PETSc directory: /home/joventino/source/petsc-3.5.4
Using PETSc arch: arch-linux2-c-debug
-----------------------------------------
Using C compiler: /home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/bin/mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/include -I/home/joventino/source/petsc-3.5.4/include -I/home/joventino/source/petsc-3.5.4/include -I/home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/include
-----------------------------------------
Using C linker: /home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/bin/mpicc
Using Fortran linker: /home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/bin/mpif90
Using libraries: -Wl,-rpath,/home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/lib -L/home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/lib -lpetsc -Wl,-rpath,/home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/lib -L/home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/lib -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lHYPRE -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/6.3.1 -L/usr/lib/gcc/x86_64-redhat-linux/6.3.1 -lmpichcxx -lstdc++ -lflapack -lfblas -lpthread -lm -lmpichf90 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpichcxx -lstdc++ -Wl,-rpath,/home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/lib -L/home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/6.3.1 -L/usr/lib/gcc/x86_64-redhat-linux/6.3.1 -ldl -Wl,-rpath,/home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/lib -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl
-----------------------------------------