[petsc-users] Using PCFIELDSPLIT with -pc_fieldsplit_type schur
Dave May
dave.mayhem23 at gmail.com
Wed Jan 11 15:49:59 CST 2017
It looks like the Schur solve is requiring a huge number of iterations to
converge (based on the MatMult counts in the log).
This is killing the performance.
Are you sure that A11 is a good approximation to S? You might consider
trying the selfp option
http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCFieldSplitSetSchurPre.html#PCFieldSplitSetSchurPre
Note that the best approximation to S is likely both problem- and
discretisation-dependent, so if selfp is also terrible, you might want to
consider coding up your own approximation to S for your specific system.
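For example, selfp can be selected from the command line:

  -pc_fieldsplit_schur_precondition selfp

or, equivalently, in code (a sketch; "pc" is assumed to be your fieldsplit PC object):

  /* selfp: precondition the Schur solve with Sp = A11 - A10 inv(diag(A00)) A01 */
  PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_SELFP, NULL);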
Thanks,
Dave
On Wed, 11 Jan 2017 at 22:34, David Knezevic <david.knezevic at akselos.com>
wrote:
I have a definite 2x2 block system and I figured it'd be good to apply the
PCFIELDSPLIT functionality with a Schur complement, as described in Section
4.5 of the manual.
The A00 block of my matrix is very small so I figured I'd specify a direct
solver (i.e. MUMPS) for that block.
So I did the following (sketched in code below):
- PCFieldSplitSetIS to specify the indices of the two splits
- PCFieldSplitGetSubKSP to get the two KSP objects, and to set the solver
and PC types for each (MUMPS for A00, ILU+CG for A11)
- I set -pc_fieldsplit_schur_fact_type full
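Roughly, the code looks like this (a minimal sketch of the setup above, with
error checking omitted; A is my assembled system matrix, and is_rb/is_fe are
placeholder names for the two index sets):

  KSP      ksp, *subksp;
  PC       pc, pc0, pc1;
  PetscInt nsplits;

  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, A);
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCFIELDSPLIT);
  PCFieldSplitSetType(pc, PC_COMPOSITE_SCHUR);
  PCFieldSplitSetSchurFactType(pc, PC_FIELDSPLIT_SCHUR_FACT_FULL); /* same as -pc_fieldsplit_schur_fact_type full */
  PCFieldSplitSetIS(pc, "RB_split", is_rb);   /* the small A00 block */
  PCFieldSplitSetIS(pc, "FE_split", is_fe);   /* the large A11 block */
  PCSetUp(pc);  /* must be called before PCFieldSplitGetSubKSP */

  PCFieldSplitGetSubKSP(pc, &nsplits, &subksp);

  /* A00 block: direct solve via MUMPS */
  KSPSetType(subksp[0], KSPPREONLY);
  KSPGetPC(subksp[0], &pc0);
  PCSetType(pc0, PCCHOLESKY);
  PCFactorSetMatSolverPackage(pc0, MATSOLVERMUMPS);

  /* Schur (A11) block: CG with ILU */
  KSPSetType(subksp[1], KSPCG);
  KSPGetPC(subksp[1], &pc1);
  PCSetType(pc1, PCILU);

  PetscFree(subksp);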
Below I have pasted the output of "-ksp_view -ksp_monitor -log_view" for a
test case. It seems to converge well, but I'm concerned about the speed
(about 90 seconds, vs. about 1 second if I use a direct solver for the
entire system). I just wanted to check whether I'm setting this up in a sensible way.
Many thanks,
David
-----------------------------------------------------------------------------------
0 KSP Residual norm 5.405774214400e+04
1 KSP Residual norm 1.849649014371e+02
2 KSP Residual norm 7.462775074989e-02
3 KSP Residual norm 2.680497175260e-04
KSP Object: 1 MPI processes
type: cg
maximum iterations=1000
tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
type: fieldsplit
FieldSplit with Schur preconditioner, factorization FULL
Preconditioner for the Schur complement formed from A11
Split info:
Split number 0 Defined by IS
Split number 1 Defined by IS
KSP solver for A00 block
KSP Object: (fieldsplit_RB_split_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (fieldsplit_RB_split_) 1 MPI processes
type: cholesky
Cholesky: out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: natural
factor fill ratio given 0., needed 0.
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=324, cols=324
package used to perform factorization: mumps
total: nonzeros=3042, allocated nonzeros=3042
total number of mallocs used during MatSetValues calls =0
MUMPS run parameters:
SYM (matrix type): 2
PAR (host participation): 1
ICNTL(1) (output for error): 6
ICNTL(2) (output of diagnostic msg): 0
ICNTL(3) (output for global info): 0
ICNTL(4) (level of printing): 0
ICNTL(5) (input mat struct): 0
ICNTL(6) (matrix prescaling): 7
ICNTL(7) (sequential matrix ordering): 7
ICNTL(8) (scaling strategy): 77
ICNTL(10) (max num of refinements): 0
ICNTL(11) (error analysis): 0
ICNTL(12) (efficiency control): 0
ICNTL(13) (efficiency control): 0
ICNTL(14) (percentage of estimated workspace increase): 20
ICNTL(18) (input mat struct): 0
ICNTL(19) (Schur complement info): 0
ICNTL(20) (rhs sparse pattern): 0
ICNTL(21) (solution struct): 0
ICNTL(22) (in-core/out-of-core facility): 0
ICNTL(23) (max size of memory that can be allocated locally): 0
ICNTL(24) (detection of null pivot rows): 0
ICNTL(25) (computation of a null space basis): 0
ICNTL(26) (Schur options for rhs or solution): 0
ICNTL(27) (experimental parameter): -24
ICNTL(28) (use parallel or sequential ordering): 1
ICNTL(29) (parallel ordering): 0
ICNTL(30) (user-specified set of entries in inv(A)): 0
ICNTL(31) (factors are discarded in the solve phase): 0
ICNTL(33) (compute determinant): 0
CNTL(1) (relative pivoting threshold): 0.01
CNTL(2) (stopping criterion of refinement): 1.49012e-08
CNTL(3) (absolute pivoting threshold): 0.
CNTL(4) (value of static pivoting): -1.
CNTL(5) (fixation for null pivots): 0.
RINFO(1) (local estimated flops for the elimination after analysis):
[0] 29394.
RINFO(2) (local estimated flops for the assembly after factorization):
[0] 1092.
RINFO(3) (local estimated flops for the elimination after factorization):
[0] 29394.
INFO(15) (estimated size (in MB) of MUMPS internal data for running numerical factorization):
[0] 1
INFO(16) (size (in MB) of MUMPS internal data used during numerical factorization):
[0] 1
INFO(23) (num of pivots eliminated on this processor after factorization):
[0] 324
RINFOG(1) (global estimated flops for the elimination after analysis): 29394.
RINFOG(2) (global estimated flops for the assembly after factorization): 1092.
RINFOG(3) (global estimated flops for the elimination after factorization): 29394.
(RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
INFOG(3) (estimated real workspace for factors on all processors after analysis): 3888
INFOG(4) (estimated integer workspace for factors on all processors after analysis): 2067
INFOG(5) (estimated maximum front size in the complete tree): 12
INFOG(6) (number of nodes in the complete tree): 53
INFOG(7) (ordering option effectively used after analysis): 2
INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 3888
INFOG(10) (total integer space to store the matrix factors after factorization): 2067
INFOG(11) (order of largest frontal matrix after factorization): 12
INFOG(12) (number of off-diagonal pivots): 0
INFOG(13) (number of delayed pivots after factorization): 0
INFOG(14) (number of memory compresses after factorization): 0
INFOG(15) (number of steps of iterative refinement after solution): 0
INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 1
INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 1
INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 1
INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 1
INFOG(20) (estimated number of entries in the factors): 3042
INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 1
INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 1
INFOG(23) (after analysis: value of ICNTL(6) effectively used): 5
INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
INFOG(28) (after factorization: number of null pivots encountered): 0
INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 3042
INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 0, 0
INFOG(32) (after analysis: type of analysis done): 1
INFOG(33) (value used for ICNTL(8)): -2
INFOG(34) (exponent of the determinant if determinant is requested): 0
linear system matrix = precond matrix:
Mat Object: (fieldsplit_RB_split_) 1 MPI processes
type: seqaij
rows=324, cols=324
total: nonzeros=5760, allocated nonzeros=5760
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 108 nodes, limit used is 5
KSP solver for S = A11 - A10 inv(A00) A01
KSP Object: (fieldsplit_FE_split_) 1 MPI processes
type: cg
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: (fieldsplit_FE_split_) 1 MPI processes
type: bjacobi
block Jacobi: number of blocks = 1
Local solve is same for all blocks, in the following KSP and PC
objects:
KSP Object: (fieldsplit_FE_split_sub_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (fieldsplit_FE_split_sub_) 1 MPI processes
type: ilu
ILU: out-of-place factorization
0 levels of fill
tolerance for zero pivot 2.22045e-14
matrix ordering: natural
factor fill ratio given 1., needed 1.
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=28476, cols=28476
package used to perform factorization: petsc
total: nonzeros=1017054, allocated nonzeros=1017054
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 9492 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: (fieldsplit_FE_split_) 1 MPI processes
type: seqaij
rows=28476, cols=28476
total: nonzeros=1017054, allocated nonzeros=1017054
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 9492 nodes, limit used is 5
linear system matrix followed by preconditioner matrix:
Mat Object: (fieldsplit_FE_split_) 1 MPI processes
type: schurcomplement
rows=28476, cols=28476
Schur complement A11 - A10 inv(A00) A01
A11
Mat Object: (fieldsplit_FE_split_) 1 MPI processes
type: seqaij
rows=28476, cols=28476
total: nonzeros=1017054, allocated nonzeros=1017054
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 9492 nodes, limit used is 5
A10
Mat Object: 1 MPI processes
type: seqaij
rows=28476, cols=324
total: nonzeros=936, allocated nonzeros=936
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 5717 nodes, limit used is 5
KSP of A00
KSP Object: (fieldsplit_RB_split_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (fieldsplit_RB_split_) 1 MPI processes
type: cholesky
Cholesky: out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: natural
factor fill ratio given 0., needed 0.
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=324, cols=324
package used to perform factorization: mumps
total: nonzeros=3042, allocated nonzeros=3042
total number of mallocs used during MatSetValues calls =0
MUMPS run parameters:
SYM (matrix type): 2
PAR (host participation): 1
ICNTL(1) (output for error): 6
ICNTL(2) (output of diagnostic msg): 0
ICNTL(3) (output for global info): 0
ICNTL(4) (level of printing): 0
ICNTL(5) (input mat struct): 0
ICNTL(6) (matrix prescaling): 7
ICNTL(7) (sequential matrix ordering): 7
ICNTL(8) (scaling strategy): 77
ICNTL(10) (max num of refinements): 0
ICNTL(11) (error analysis): 0
ICNTL(12) (efficiency control): 0
ICNTL(13) (efficiency control): 0
ICNTL(14) (percentage of estimated workspace increase): 20
ICNTL(18) (input mat struct): 0
ICNTL(19) (Schur complement info): 0
ICNTL(20) (rhs sparse pattern): 0
ICNTL(21) (solution struct): 0
ICNTL(22) (in-core/out-of-core facility): 0
ICNTL(23) (max size of memory that can be allocated locally): 0
ICNTL(24) (detection of null pivot rows): 0
ICNTL(25) (computation of a null space basis): 0
ICNTL(26) (Schur options for rhs or solution): 0
ICNTL(27) (experimental parameter): -24
ICNTL(28) (use parallel or sequential ordering): 1
ICNTL(29) (parallel ordering): 0
ICNTL(30) (user-specified set of entries in inv(A)): 0
ICNTL(31) (factors are discarded in the solve phase): 0
ICNTL(33) (compute determinant): 0
CNTL(1) (relative pivoting threshold): 0.01
CNTL(2) (stopping criterion of refinement): 1.49012e-08
CNTL(3) (absolute pivoting threshold): 0.
CNTL(4) (value of static pivoting): -1.
CNTL(5) (fixation for null pivots): 0.
RINFO(1) (local estimated flops for the elimination after analysis):
[0] 29394.
RINFO(2) (local estimated flops for the assembly after factorization):
[0] 1092.
RINFO(3) (local estimated flops for the elimination after factorization):
[0] 29394.
INFO(15) (estimated size (in MB) of MUMPS internal data for running numerical factorization):
[0] 1
INFO(16) (size (in MB) of MUMPS internal data used during numerical factorization):
[0] 1
INFO(23) (num of pivots eliminated on this processor after factorization):
[0] 324
RINFOG(1) (global estimated flops for the elimination after analysis): 29394.
RINFOG(2) (global estimated flops for the assembly after factorization): 1092.
RINFOG(3) (global estimated flops for the elimination after factorization): 29394.
(RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
INFOG(3) (estimated real workspace for factors on all processors after analysis): 3888
INFOG(4) (estimated integer workspace for factors on all processors after analysis): 2067
INFOG(5) (estimated maximum front size in the complete tree): 12
INFOG(6) (number of nodes in the complete tree): 53
INFOG(7) (ordering option effectively used after analysis): 2
INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 3888
INFOG(10) (total integer space to store the matrix factors after factorization): 2067
INFOG(11) (order of largest frontal matrix after factorization): 12
INFOG(12) (number of off-diagonal pivots): 0
INFOG(13) (number of delayed pivots after factorization): 0
INFOG(14) (number of memory compresses after factorization): 0
INFOG(15) (number of steps of iterative refinement after solution): 0
INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 1
INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 1
INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 1
INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 1
INFOG(20) (estimated number of entries in the factors): 3042
INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 1
INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 1
INFOG(23) (after analysis: value of ICNTL(6) effectively used): 5
INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
INFOG(28) (after factorization: number of null pivots encountered): 0
INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 3042
INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 0, 0
INFOG(32) (after analysis: type of analysis done): 1
INFOG(33) (value used for ICNTL(8)): -2
INFOG(34) (exponent of the determinant if determinant is requested): 0
linear system matrix = precond matrix:
Mat Object: (fieldsplit_RB_split_) 1 MPI processes
type: seqaij
rows=324, cols=324
total: nonzeros=5760, allocated nonzeros=5760
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 108 nodes, limit used is 5
A01
Mat Object: 1 MPI processes
type: seqaij
rows=324, cols=28476
total: nonzeros=936, allocated nonzeros=936
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 67 nodes, limit used is 5
Mat Object: (fieldsplit_FE_split_) 1 MPI processes
type: seqaij
rows=28476, cols=28476
total: nonzeros=1017054, allocated nonzeros=1017054
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 9492 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: () 1 MPI processes
type: seqaij
rows=28800, cols=28800
total: nonzeros=1024686, allocated nonzeros=1024794
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 9600 nodes, limit used is 5
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/home/dknez/akselos-dev/scrbe/build/bin/fe_solver-opt_real on a arch-linux2-c-opt named david-Lenovo with 1 processor, by dknez Wed Jan 11 16:16:47 2017
Using Petsc Release Version 3.7.3, unknown
                         Max       Max/Min     Avg        Total
Time (sec):           9.179e+01   1.00000   9.179e+01
Objects:              1.990e+02   1.00000   1.990e+02
Flops:                1.634e+11   1.00000   1.634e+11  1.634e+11
Flops/sec:            1.780e+09   1.00000   1.780e+09  1.780e+09
MPI Messages:         0.000e+00   0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00   0.00000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00   0.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                          e.g., VecAXPY() for real vectors of length N --> 2N flops
                          and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 9.1787e+01 100.0%  1.6336e+11 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase            %F - percent flops in this phase
%M - percent messages in this phase        %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                            --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecDot 42 1.0 2.4080e-05 1.0 8.53e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 354
VecTDot 74012 1.0 1.2440e+00 1.0 4.22e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 3388
VecNorm 37020 1.0 8.3580e-01 1.0 2.11e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 2523
VecScale 37008 1.0 3.5800e-01 1.0 1.05e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 2944
VecCopy 37034 1.0 2.5754e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 74137 1.0 3.0537e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 74029 1.0 1.7233e+00 1.0 4.22e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 3 0 0 0 2 3 0 0 0 2446
VecAYPX 37001 1.0 1.2214e+00 1.0 2.11e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 1725
VecAssemblyBegin 68 1.0 2.0432e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 68 1.0 2.5988e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 48 1.0 4.6921e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 37017 1.0 4.1269e+01 1.0 7.65e+10 1.0 0.0e+00 0.0e+00 0.0e+00 45 47 0 0 0 45 47 0 0 0 1853
MatMultAdd 37015 1.0 3.3638e+01 1.0 7.53e+10 1.0 0.0e+00 0.0e+00 0.0e+00 37 46 0 0 0 37 46 0 0 0 2238
MatSolve 74021 1.0 4.6602e+01 1.0 7.42e+10 1.0 0.0e+00 0.0e+00 0.0e+00 51 45 0 0 0 51 45 0 0 0 1593
MatLUFactorNum 1 1.0 1.7209e-02 1.0 2.44e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1420
MatCholFctrSym 1 1.0 8.8310e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCholFctrNum 1 1.0 3.6907e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatILUFactorSym 1 1.0 3.7372e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 29 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 29 1.0 9.9473e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRow 58026 1.0 2.8155e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrice 6 1.0 1.5399e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 2 1.0 3.0112e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 6 1.0 2.9490e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 7 1.0 3.4356e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 4 1.0 9.4891e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 8.8793e+01 1.0 1.63e+11 1.0 0.0e+00 0.0e+00 0.0e+00 97 100 0 0 0 97 100 0 0 0 1840
PCSetUp 4 1.0 3.8375e-02 1.0 2.44e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 637
PCSetUpOnBlocks 5 1.0 2.1250e-02 1.0 2.44e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1150
PCApply 5 1.0 8.8789e+01 1.0 1.63e+11 1.0 0.0e+00 0.0e+00 0.0e+00 97 100 0 0 0 97 100 0 0 0 1840
KSPSolve_FS_0 5 1.0 7.5364e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve_FS_Schu 5 1.0 8.8785e+01 1.0 1.63e+11 1.0 0.0e+00 0.0e+00 0.0e+00 97 100 0 0 0 97 100 0 0 0 1840
KSPSolve_FS_Low 5 1.0 2.1019e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 91 91 9693912 0.
Vector Scatter 24 24 15936 0.
Index Set 51 51 537888 0.
IS L to G Mapping 3 3 240408 0.
Matrix 13 13 64097868 0.
Krylov Solver 6 6 7888 0.
Preconditioner 6 6 6288 0.
Viewer 1 0 0 0.
Distributed Mesh 1 1 4624 0.
Star Forest Bipartite Graph 2 2 1616 0.
Discrete System 1 1 872 0.
========================================================================================================================
Average time to get PetscTime(): 0.
#PETSc Option Table entries:
-ksp_monitor
-ksp_view
-log_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-shared-libraries=1 --with-debugging=0
--download-suitesparse --download-blacs --download-ptscotch=yes
--with-blas-lapack-dir=/opt/intel/system_studio_2015.2.050/mkl
--CXXFLAGS=-Wl,--no-as-needed --download-scalapack --download-mumps
--download-metis
--prefix=/home/dknez/software/libmesh_install/opt_real/petsc
--download-hypre --download-ml
-----------------------------------------
Libraries compiled on Wed Sep 21 17:38:52 2016 on david-Lenovo
Machine characteristics:
Linux-4.4.0-38-generic-x86_64-with-Ubuntu-16.04-xenial
Using PETSc directory: /home/dknez/software/petsc-src
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing
-Wno-unknown-pragmas -fvisibility=hidden -g -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0
-Wno-unused-dummy-argument -g -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths:
-I/home/dknez/software/petsc-src/arch-linux2-c-opt/include
-I/home/dknez/software/petsc-src/include
-I/home/dknez/software/petsc-src/include
-I/home/dknez/software/petsc-src/arch-linux2-c-opt/include
-I/home/dknez/software/libmesh_install/opt_real/petsc/include
-I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent
-I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/include
-I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries:
-Wl,-rpath,/home/dknez/software/petsc-src/arch-linux2-c-opt/lib
-L/home/dknez/software/petsc-src/arch-linux2-c-opt/lib -lpetsc
-Wl,-rpath,/home/dknez/software/libmesh_install/opt_real/petsc/lib
-L/home/dknez/software/libmesh_install/opt_real/petsc/lib -lcmumps -ldmumps
-lsmumps -lzmumps -lmumps_common -lpord -lmetis -lHYPRE
-Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib
-Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5
-L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu
-L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu
-L/lib/x86_64-linux-gnu -lmpi_cxx -lstdc++ -lscalapack -lml -lmpi_cxx
-lstdc++ -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd
-lsuitesparseconfig
-Wl,-rpath,/opt/intel/system_studio_2015.2.050/mkl/lib/intel64
-L/opt/intel/system_studio_2015.2.050/mkl/lib/intel64 -lmkl_intel_lp64
-lmkl_sequential -lmkl_core -lpthread -lm -lhwloc -lptesmumps -lptscotch
-lptscotcherr -lscotch -lscotcherr -lX11 -lm -lmpi_usempif08
-lmpi_usempi_ignore_tkr -lmpi_mpifh -lgfortran -lm -lgfortran -lm
-lquadmath -lm -lmpi_cxx -lstdc++ -lrt -lm -lpthread -lz
-Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib
-Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5
-L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu
-L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu
-L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu
-L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/usr/lib/openmpi/lib -lmpi
-lgcc_s -lpthread -ldl
-----------------------------------------