[petsc-users] Using PCFIELDSPLIT with -pc_fieldsplit_type schur
David Knezevic
david.knezevic at akselos.com
Wed Jan 11 18:32:30 CST 2017
On Wed, Jan 11, 2017 at 5:52 PM, Dave May <dave.mayhem23 at gmail.com> wrote:
> so I gather that I'll have to look into a user-defined approximation to S.
>>
>
> Where does the 2x2 block system come from?
> Maybe someone on the list knows the right approximation to use for S.
>
The model is 3D linear elasticity with a finite element discretization. I
applied substructuring to part of the system to "condense" it, which yields the
small A00 block. The A11 block is standard 3D elasticity; no substructuring was
applied there. Constraints couple the degrees of freedom on the interface
between the substructured and non-substructured regions.
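In case it's useful for the discussion: my understanding is that a user-defined
approximation to S would be supplied via PCFieldSplitSetSchurPre, roughly as in
the sketch below (S_hat is a placeholder for whatever approximation I assemble
myself, pc is the fieldsplit PC, and I haven't actually tried this yet):

  Mat S_hat;   /* user-assembled approximation to the Schur complement (placeholder) */
  /* ... assemble S_hat, e.g. starting from A11 plus some cheap correction
         for the coupling to the condensed block ... */
  ierr = PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_USER, S_hat);CHKERRQ(ierr);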
If anyone has suggestions for a good way to precondition this type of
system, I'd be most appreciative!
Thanks,
David
> -----------------------------------------
>>
>> 0 KSP Residual norm 5.405528187695e+04
>> 1 KSP Residual norm 2.187814910803e+02
>> 2 KSP Residual norm 1.019051577515e-01
>> 3 KSP Residual norm 4.370464012859e-04
>> KSP Object: 1 MPI processes
>> type: cg
>> maximum iterations=1000
>> tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
>> left preconditioning
>> using nonzero initial guess
>> using PRECONDITIONED norm type for convergence test
>> PC Object: 1 MPI processes
>> type: fieldsplit
>> FieldSplit with Schur preconditioner, factorization FULL
>> Preconditioner for the Schur complement formed from Sp, an assembled
>> approximation to S, which uses (lumped, if requested) A00's diagonal's
>> inverse
>> Split info:
>> Split number 0 Defined by IS
>> Split number 1 Defined by IS
>> KSP solver for A00 block
>> KSP Object: (fieldsplit_RB_split_) 1 MPI processes
>> type: preonly
>> maximum iterations=10000, initial guess is zero
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> left preconditioning
>> using NONE norm type for convergence test
>> PC Object: (fieldsplit_RB_split_) 1 MPI processes
>> type: cholesky
>> Cholesky: out-of-place factorization
>> tolerance for zero pivot 2.22045e-14
>> matrix ordering: natural
>> factor fill ratio given 0., needed 0.
>> Factored matrix follows:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=324, cols=324
>> package used to perform factorization: mumps
>> total: nonzeros=3042, allocated nonzeros=3042
>> total number of mallocs used during MatSetValues calls =0
>> MUMPS run parameters:
>> SYM (matrix type): 2
>> PAR (host participation): 1
>> ICNTL(1) (output for error): 6
>> ICNTL(2) (output of diagnostic msg): 0
>> ICNTL(3) (output for global info): 0
>> ICNTL(4) (level of printing): 0
>> ICNTL(5) (input mat struct): 0
>> ICNTL(6) (matrix prescaling): 7
>> ICNTL(7) (sequentia matrix ordering):7
>> ICNTL(8) (scalling strategy): 77
>> ICNTL(10) (max num of refinements): 0
>> ICNTL(11) (error analysis): 0
>> ICNTL(12) (efficiency control):
>> 0
>> ICNTL(13) (efficiency control):
>> 0
>> ICNTL(14) (percentage of estimated workspace
>> increase): 20
>> ICNTL(18) (input mat struct):
>> 0
>> ICNTL(19) (Shur complement info):
>> 0
>> ICNTL(20) (rhs sparse pattern):
>> 0
>> ICNTL(21) (solution struct):
>> 0
>> ICNTL(22) (in-core/out-of-core facility):
>> 0
>> ICNTL(23) (max size of memory can be allocated
>> locally):0
>> ICNTL(24) (detection of null pivot rows):
>> 0
>> ICNTL(25) (computation of a null space basis):
>> 0
>> ICNTL(26) (Schur options for rhs or solution):
>> 0
>> ICNTL(27) (experimental parameter):
>> -24
>> ICNTL(28) (use parallel or sequential ordering):
>> 1
>> ICNTL(29) (parallel ordering):
>> 0
>> ICNTL(30) (user-specified set of entries in inv(A)):
>> 0
>> ICNTL(31) (factors is discarded in the solve phase):
>> 0
>> ICNTL(33) (compute determinant):
>> 0
>> CNTL(1) (relative pivoting threshold): 0.01
>> CNTL(2) (stopping criterion of refinement):
>> 1.49012e-08
>> CNTL(3) (absolute pivoting threshold): 0.
>> CNTL(4) (value of static pivoting): -1.
>> CNTL(5) (fixation for null pivots): 0.
>> RINFO(1) (local estimated flops for the elimination
>> after analysis):
>> [0] 29394.
>> RINFO(2) (local estimated flops for the assembly
>> after factorization):
>> [0] 1092.
>> RINFO(3) (local estimated flops for the elimination
>> after factorization):
>> [0] 29394.
>> INFO(15) (estimated size of (in MB) MUMPS internal
>> data for running numerical factorization):
>> [0] 1
>> INFO(16) (size of (in MB) MUMPS internal data used
>> during numerical factorization):
>> [0] 1
>> INFO(23) (num of pivots eliminated on this processor
>> after factorization):
>> [0] 324
>> RINFOG(1) (global estimated flops for the elimination
>> after analysis): 29394.
>> RINFOG(2) (global estimated flops for the assembly
>> after factorization): 1092.
>> RINFOG(3) (global estimated flops for the elimination
>> after factorization): 29394.
>> (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant):
>> (0.,0.)*(2^0)
>> INFOG(3) (estimated real workspace for factors on all
>> processors after analysis): 3888
>> INFOG(4) (estimated integer workspace for factors on
>> all processors after analysis): 2067
>> INFOG(5) (estimated maximum front size in the
>> complete tree): 12
>> INFOG(6) (number of nodes in the complete tree): 53
>> INFOG(7) (ordering option effectively use after
>> analysis): 2
>> INFOG(8) (structural symmetry in percent of the
>> permuted matrix after analysis): 100
>> INFOG(9) (total real/complex workspace to store the
>> matrix factors after factorization): 3888
>> INFOG(10) (total integer space store the matrix
>> factors after factorization): 2067
>> INFOG(11) (order of largest frontal matrix after
>> factorization): 12
>> INFOG(12) (number of off-diagonal pivots): 0
>> INFOG(13) (number of delayed pivots after
>> factorization): 0
>> INFOG(14) (number of memory compress after
>> factorization): 0
>> INFOG(15) (number of steps of iterative refinement
>> after solution): 0
>> INFOG(16) (estimated size (in MB) of all MUMPS
>> internal data for factorization after analysis: value on the most memory
>> consuming processor): 1
>> INFOG(17) (estimated size of all MUMPS internal data
>> for factorization after analysis: sum over all processors): 1
>> INFOG(18) (size of all MUMPS internal data allocated
>> during factorization: value on the most memory consuming processor): 1
>> INFOG(19) (size of all MUMPS internal data allocated
>> during factorization: sum over all processors): 1
>> INFOG(20) (estimated number of entries in the
>> factors): 3042
>> INFOG(21) (size in MB of memory effectively used
>> during factorization - value on the most memory consuming processor): 1
>> INFOG(22) (size in MB of memory effectively used
>> during factorization - sum over all processors): 1
>> INFOG(23) (after analysis: value of ICNTL(6)
>> effectively used): 5
>> INFOG(24) (after analysis: value of ICNTL(12)
>> effectively used): 1
>> INFOG(25) (after factorization: number of pivots
>> modified by static pivoting): 0
>> INFOG(28) (after factorization: number of null pivots
>> encountered): 0
>> INFOG(29) (after factorization: effective number of
>> entries in the factors (sum over all processors)): 3042
>> INFOG(30, 31) (after solution: size in Mbytes of
>> memory used during solution phase): 0, 0
>> INFOG(32) (after analysis: type of analysis done): 1
>> INFOG(33) (value used for ICNTL(8)): -2
>> INFOG(34) (exponent of the determinant if determinant
>> is requested): 0
>> linear system matrix = precond matrix:
>> Mat Object: (fieldsplit_RB_split_) 1 MPI processes
>> type: seqaij
>> rows=324, cols=324
>> total: nonzeros=5760, allocated nonzeros=5760
>> total number of mallocs used during MatSetValues calls =0
>> using I-node routines: found 108 nodes, limit used is 5
>> KSP solver for S = A11 - A10 inv(A00) A01
>> KSP Object: (fieldsplit_FE_split_) 1 MPI processes
>> type: cg
>> maximum iterations=10000, initial guess is zero
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> left preconditioning
>> using PRECONDITIONED norm type for convergence test
>> PC Object: (fieldsplit_FE_split_) 1 MPI processes
>> type: bjacobi
>> block Jacobi: number of blocks = 1
>> Local solve is same for all blocks, in the following KSP and PC
>> objects:
>> KSP Object: (fieldsplit_FE_split_sub_) 1 MPI
>> processes
>> type: preonly
>> maximum iterations=10000, initial guess is zero
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> left preconditioning
>> using NONE norm type for convergence test
>> PC Object: (fieldsplit_FE_split_sub_) 1 MPI
>> processes
>> type: ilu
>> ILU: out-of-place factorization
>> 0 levels of fill
>> tolerance for zero pivot 2.22045e-14
>> matrix ordering: natural
>> factor fill ratio given 1., needed 1.
>> Factored matrix follows:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=28476, cols=28476
>> package used to perform factorization: petsc
>> total: nonzeros=1037052, allocated nonzeros=1037052
>> total number of mallocs used during MatSetValues
>> calls =0
>> using I-node routines: found 9489 nodes, limit used
>> is 5
>> linear system matrix = precond matrix:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=28476, cols=28476
>> total: nonzeros=1037052, allocated nonzeros=1037052
>> total number of mallocs used during MatSetValues calls =0
>> using I-node routines: found 9489 nodes, limit used is 5
>> linear system matrix followed by preconditioner matrix:
>> Mat Object: (fieldsplit_FE_split_) 1 MPI processes
>> type: schurcomplement
>> rows=28476, cols=28476
>> Schur complement A11 - A10 inv(A00) A01
>> A11
>> Mat Object: (fieldsplit_FE_split_)
>> 1 MPI processes
>> type: seqaij
>> rows=28476, cols=28476
>> total: nonzeros=1017054, allocated nonzeros=1017054
>> total number of mallocs used during MatSetValues calls =0
>> using I-node routines: found 9492 nodes, limit used is 5
>> A10
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=28476, cols=324
>> total: nonzeros=936, allocated nonzeros=936
>> total number of mallocs used during MatSetValues calls =0
>> using I-node routines: found 5717 nodes, limit used is 5
>> KSP of A00
>> KSP Object: (fieldsplit_RB_split_)
>> 1 MPI processes
>> type: preonly
>> maximum iterations=10000, initial guess is zero
>> tolerances: relative=1e-05, absolute=1e-50,
>> divergence=10000.
>> left preconditioning
>> using NONE norm type for convergence test
>> PC Object: (fieldsplit_RB_split_)
>> 1 MPI processes
>> type: cholesky
>> Cholesky: out-of-place factorization
>> tolerance for zero pivot 2.22045e-14
>> matrix ordering: natural
>> factor fill ratio given 0., needed 0.
>> Factored matrix follows:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=324, cols=324
>> package used to perform factorization: mumps
>> total: nonzeros=3042, allocated nonzeros=3042
>> total number of mallocs used during MatSetValues
>> calls =0
>> MUMPS run parameters:
>> SYM (matrix type): 2
>> PAR (host participation): 1
>> ICNTL(1) (output for error): 6
>> ICNTL(2) (output of diagnostic msg): 0
>> ICNTL(3) (output for global info): 0
>> ICNTL(4) (level of printing): 0
>> ICNTL(5) (input mat struct): 0
>> ICNTL(6) (matrix prescaling): 7
>> ICNTL(7) (sequentia matrix ordering):7
>> ICNTL(8) (scalling strategy): 77
>> ICNTL(10) (max num of refinements): 0
>> ICNTL(11) (error analysis): 0
>> ICNTL(12) (efficiency control):
>> 0
>> ICNTL(13) (efficiency control):
>> 0
>> ICNTL(14) (percentage of estimated workspace
>> increase): 20
>> ICNTL(18) (input mat struct):
>> 0
>> ICNTL(19) (Shur complement info):
>> 0
>> ICNTL(20) (rhs sparse pattern):
>> 0
>> ICNTL(21) (solution struct):
>> 0
>> ICNTL(22) (in-core/out-of-core facility):
>> 0
>> ICNTL(23) (max size of memory can be
>> allocated locally):0
>> ICNTL(24) (detection of null pivot rows):
>> 0
>> ICNTL(25) (computation of a null space
>> basis): 0
>> ICNTL(26) (Schur options for rhs or
>> solution): 0
>> ICNTL(27) (experimental parameter):
>> -24
>> ICNTL(28) (use parallel or sequential
>> ordering): 1
>> ICNTL(29) (parallel ordering):
>> 0
>> ICNTL(30) (user-specified set of entries in
>> inv(A)): 0
>> ICNTL(31) (factors is discarded in the solve
>> phase): 0
>> ICNTL(33) (compute determinant):
>> 0
>> CNTL(1) (relative pivoting threshold):
>> 0.01
>> CNTL(2) (stopping criterion of refinement):
>> 1.49012e-08
>> CNTL(3) (absolute pivoting threshold):
>> 0.
>> CNTL(4) (value of static pivoting):
>> -1.
>> CNTL(5) (fixation for null pivots):
>> 0.
>> RINFO(1) (local estimated flops for the
>> elimination after analysis):
>> [0] 29394.
>> RINFO(2) (local estimated flops for the
>> assembly after factorization):
>> [0] 1092.
>> RINFO(3) (local estimated flops for the
>> elimination after factorization):
>> [0] 29394.
>> INFO(15) (estimated size of (in MB) MUMPS
>> internal data for running numerical factorization):
>> [0] 1
>> INFO(16) (size of (in MB) MUMPS internal data
>> used during numerical factorization):
>> [0] 1
>> INFO(23) (num of pivots eliminated on this
>> processor after factorization):
>> [0] 324
>> RINFOG(1) (global estimated flops for the
>> elimination after analysis): 29394.
>> RINFOG(2) (global estimated flops for the
>> assembly after factorization): 1092.
>> RINFOG(3) (global estimated flops for the
>> elimination after factorization): 29394.
>> (RINFOG(12) RINFOG(13))*2^INFOG(34)
>> (determinant): (0.,0.)*(2^0)
>> INFOG(3) (estimated real workspace for
>> factors on all processors after analysis): 3888
>> INFOG(4) (estimated integer workspace for
>> factors on all processors after analysis): 2067
>> INFOG(5) (estimated maximum front size in the
>> complete tree): 12
>> INFOG(6) (number of nodes in the complete
>> tree): 53
>> INFOG(7) (ordering option effectively use
>> after analysis): 2
>> INFOG(8) (structural symmetry in percent of
>> the permuted matrix after analysis): 100
>> INFOG(9) (total real/complex workspace to
>> store the matrix factors after factorization): 3888
>> INFOG(10) (total integer space store the
>> matrix factors after factorization): 2067
>> INFOG(11) (order of largest frontal matrix
>> after factorization): 12
>> INFOG(12) (number of off-diagonal pivots): 0
>> INFOG(13) (number of delayed pivots after
>> factorization): 0
>> INFOG(14) (number of memory compress after
>> factorization): 0
>> INFOG(15) (number of steps of iterative
>> refinement after solution): 0
>> INFOG(16) (estimated size (in MB) of all
>> MUMPS internal data for factorization after analysis: value on the most
>> memory consuming processor): 1
>> INFOG(17) (estimated size of all MUMPS
>> internal data for factorization after analysis: sum over all processors): 1
>> INFOG(18) (size of all MUMPS internal data
>> allocated during factorization: value on the most memory consuming
>> processor): 1
>> INFOG(19) (size of all MUMPS internal data
>> allocated during factorization: sum over all processors): 1
>> INFOG(20) (estimated number of entries in the
>> factors): 3042
>> INFOG(21) (size in MB of memory effectively
>> used during factorization - value on the most memory consuming processor):
>> 1
>> INFOG(22) (size in MB of memory effectively
>> used during factorization - sum over all processors): 1
>> INFOG(23) (after analysis: value of ICNTL(6)
>> effectively used): 5
>> INFOG(24) (after analysis: value of ICNTL(12)
>> effectively used): 1
>> INFOG(25) (after factorization: number of
>> pivots modified by static pivoting): 0
>> INFOG(28) (after factorization: number of
>> null pivots encountered): 0
>> INFOG(29) (after factorization: effective
>> number of entries in the factors (sum over all processors)): 3042
>> INFOG(30, 31) (after solution: size in Mbytes
>> of memory used during solution phase): 0, 0
>> INFOG(32) (after analysis: type of analysis
>> done): 1
>> INFOG(33) (value used for ICNTL(8)): -2
>> INFOG(34) (exponent of the determinant if
>> determinant is requested): 0
>> linear system matrix = precond matrix:
>> Mat Object: (fieldsplit_RB_split_)
>> 1 MPI processes
>> type: seqaij
>> rows=324, cols=324
>> total: nonzeros=5760, allocated nonzeros=5760
>> total number of mallocs used during MatSetValues calls
>> =0
>> using I-node routines: found 108 nodes, limit used is
>> 5
>> A01
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=324, cols=28476
>> total: nonzeros=936, allocated nonzeros=936
>> total number of mallocs used during MatSetValues calls =0
>> using I-node routines: found 67 nodes, limit used is 5
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=28476, cols=28476
>> total: nonzeros=1037052, allocated nonzeros=1037052
>> total number of mallocs used during MatSetValues calls =0
>> using I-node routines: found 9489 nodes, limit used is 5
>> linear system matrix = precond matrix:
>> Mat Object: () 1 MPI processes
>> type: seqaij
>> rows=28800, cols=28800
>> total: nonzeros=1024686, allocated nonzeros=1024794
>> total number of mallocs used during MatSetValues calls =0
>> using I-node routines: found 9600 nodes, limit used is 5
>>
>> ---------------------------------------------- PETSc Performance
>> Summary: ----------------------------------------------
>>
>> /home/dknez/akselos-dev/scrbe/build/bin/fe_solver-opt_real on a
>> arch-linux2-c-opt named david-Lenovo with 1 processor, by dknez Wed Jan 11
>> 17:22:10 2017
>> Using Petsc Release Version 3.7.3, unknown
>>
>> Max Max/Min Avg Total
>> Time (sec): 9.638e+01 1.00000 9.638e+01
>> Objects: 2.030e+02 1.00000 2.030e+02
>> Flops: 1.732e+11 1.00000 1.732e+11 1.732e+11
>> Flops/sec: 1.797e+09 1.00000 1.797e+09 1.797e+09
>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
>> MPI Reductions: 0.000e+00 0.00000
>>
>> Flop counting convention: 1 flop = 1 real number operation of type
>> (multiply/divide/add/subtract)
>> e.g., VecAXPY() for real vectors of length N
>> --> 2N flops
>> and VecAXPY() for complex vectors of length N
>> --> 8N flops
>>
>> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages
>> --- -- Message Lengths -- -- Reductions --
>> Avg %Total Avg %Total counts
>> %Total Avg %Total counts %Total
>> 0: Main Stage: 9.6379e+01 100.0% 1.7318e+11 100.0% 0.000e+00
>> 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
>>
>> ------------------------------------------------------------
>> ------------------------------------------------------------
>> See the 'Profiling' chapter of the users' manual for details on
>> interpreting output.
>> Phase summary info:
>> Count: number of times phase was executed
>> Time and Flops: Max - maximum over all processors
>> Ratio - ratio of maximum to minimum over all processors
>> Mess: number of messages sent
>> Avg. len: average message length (bytes)
>> Reduct: number of global reductions
>> Global: entire computation
>> Stage: stages of a computation. Set stages with PetscLogStagePush()
>> and PetscLogStagePop().
>> %T - percent time in this phase %F - percent flops in this
>> phase
>> %M - percent messages in this phase %L - percent message
>> lengths in this phase
>> %R - percent reductions in this phase
>> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
>> over all processors)
>> ------------------------------------------------------------
>> ------------------------------------------------------------
>> Event Count Time (sec) Flops
>> --- Global --- --- Stage --- Total
>> Max Ratio Max Ratio Max Ratio Mess Avg len
>> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
>> ------------------------------------------------------------
>> ------------------------------------------------------------
>>
>> --- Event Stage 0: Main Stage
>>
>> VecDot 42 1.0 2.2411e-05 1.0 8.53e+03 1.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 380
>> VecTDot 77761 1.0 1.4294e+00 1.0 4.43e+09 1.0 0.0e+00 0.0e+00
>> 0.0e+00 1 3 0 0 0 1 3 0 0 0 3098
>> VecNorm 38894 1.0 9.1002e-01 1.0 2.22e+09 1.0 0.0e+00 0.0e+00
>> 0.0e+00 1 1 0 0 0 1 1 0 0 0 2434
>> VecScale 38882 1.0 3.7314e-01 1.0 1.11e+09 1.0 0.0e+00 0.0e+00
>> 0.0e+00 0 1 0 0 0 0 1 0 0 0 2967
>> VecCopy 38908 1.0 2.1655e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecSet 77887 1.0 3.2034e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecAXPY 77777 1.0 1.8382e+00 1.0 4.43e+09 1.0 0.0e+00 0.0e+00
>> 0.0e+00 2 3 0 0 0 2 3 0 0 0 2409
>> VecAYPX 38875 1.0 1.2884e+00 1.0 2.21e+09 1.0 0.0e+00 0.0e+00
>> 0.0e+00 1 1 0 0 0 1 1 0 0 0 1718
>> VecAssemblyBegin 68 1.0 1.9407e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecAssemblyEnd 68 1.0 2.6941e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecScatterBegin 48 1.0 4.6349e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatMult 38891 1.0 4.3045e+01 1.0 8.03e+10 1.0 0.0e+00 0.0e+00
>> 0.0e+00 45 46 0 0 0 45 46 0 0 0 1866
>> MatMultAdd 38889 1.0 3.5360e+01 1.0 7.91e+10 1.0 0.0e+00 0.0e+00
>> 0.0e+00 37 46 0 0 0 37 46 0 0 0 2236
>> MatSolve 77769 1.0 4.8780e+01 1.0 7.95e+10 1.0 0.0e+00 0.0e+00
>> 0.0e+00 51 46 0 0 0 51 46 0 0 0 1631
>> MatLUFactorNum 1 1.0 1.9575e-02 1.0 2.49e+07 1.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 1274
>> MatCholFctrSym 1 1.0 9.4891e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatCholFctrNum 1 1.0 3.7885e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatILUFactorSym 1 1.0 4.1780e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatConvert 1 1.0 3.0041e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatScale 2 1.0 2.7180e-05 1.0 2.53e+04 1.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 930
>> MatAssemblyBegin 32 1.0 4.0531e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatAssemblyEnd 32 1.0 1.2032e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatGetRow 114978 1.0 5.9254e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatGetRowIJ 2 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatGetSubMatrice 6 1.0 1.5707e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatGetOrdering 2 1.0 3.2425e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatZeroEntries 6 1.0 3.0580e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatView 7 1.0 3.5119e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatAXPY 1 1.0 1.9384e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatMatMult 1 1.0 2.7120e-03 1.0 3.16e+05 1.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 117
>> MatMatMultSym 1 1.0 1.8010e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatMatMultNum 1 1.0 6.1703e-04 1.0 3.16e+05 1.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 513
>> KSPSetUp 4 1.0 9.8944e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> KSPSolve 1 1.0 9.3380e+01 1.0 1.73e+11 1.0 0.0e+00 0.0e+00
>> 0.0e+00 97100 0 0 0 97100 0 0 0 1855
>> PCSetUp 4 1.0 6.6326e-02 1.0 2.53e+07 1.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 381
>> PCSetUpOnBlocks 5 1.0 2.4082e-02 1.0 2.49e+07 1.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 1036
>> PCApply 5 1.0 9.3376e+01 1.0 1.73e+11 1.0 0.0e+00 0.0e+00
>> 0.0e+00 97100 0 0 0 97100 0 0 0 1855
>> KSPSolve_FS_0 5 1.0 7.0214e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> KSPSolve_FS_Schu 5 1.0 9.3372e+01 1.0 1.73e+11 1.0 0.0e+00 0.0e+00
>> 0.0e+00 97100 0 0 0 97100 0 0 0 1855
>> KSPSolve_FS_Low 5 1.0 2.1377e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> ------------------------------------------------------------
>> ------------------------------------------------------------
>>
>> Memory usage is given in bytes:
>>
>> Object Type Creations Destructions Memory Descendants'
>> Mem.
>> Reports information only for process 0.
>>
>> --- Event Stage 0: Main Stage
>>
>> Vector 92 92 9698040 0.
>> Vector Scatter 24 24 15936 0.
>> Index Set 51 51 537876 0.
>> IS L to G Mapping 3 3 240408 0.
>> Matrix 16 16 77377776 0.
>> Krylov Solver 6 6 7888 0.
>> Preconditioner 6 6 6288 0.
>> Viewer 1 0 0 0.
>> Distributed Mesh 1 1 4624 0.
>> Star Forest Bipartite Graph 2 2 1616 0.
>> Discrete System 1 1 872 0.
>> ============================================================
>> ============================================================
>> Average time to get PetscTime(): 0.
>> #PETSc Option Table entries:
>> -ksp_monitor
>> -ksp_view
>> -log_view
>> #End of PETSc Option Table entries
>> Compiled without FORTRAN kernels
>> Compiled with full precision matrices (default)
>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
>> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
>> Configure options: --with-shared-libraries=1 --with-debugging=0
>> --download-suitesparse --download-blacs --download-ptscotch=yes
>> --with-blas-lapack-dir=/opt/intel/system_studio_2015.2.050/mkl
>> --CXXFLAGS=-Wl,--no-as-needed --download-scalapack --download-mumps
>> --download-metis --prefix=/home/dknez/software/libmesh_install/opt_real/petsc
>> --download-hypre --download-ml
>> -----------------------------------------
>> Libraries compiled on Wed Sep 21 17:38:52 2016 on david-Lenovo
>> Machine characteristics: Linux-4.4.0-38-generic-x86_64-
>> with-Ubuntu-16.04-xenial
>> Using PETSc directory: /home/dknez/software/petsc-src
>> Using PETSc arch: arch-linux2-c-opt
>> -----------------------------------------
>>
>> Using C compiler: mpicc -fPIC -Wall -Wwrite-strings
>> -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O
>> ${COPTFLAGS} ${CFLAGS}
>> Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0
>> -Wno-unused-dummy-argument -g -O ${FOPTFLAGS} ${FFLAGS}
>> -----------------------------------------
>>
>> Using include paths: -I/home/dknez/software/petsc-src/arch-linux2-c-opt/include
>> -I/home/dknez/software/petsc-src/include -I/home/dknez/software/petsc-src/include
>> -I/home/dknez/software/petsc-src/arch-linux2-c-opt/include
>> -I/home/dknez/software/libmesh_install/opt_real/petsc/include
>> -I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent
>> -I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/include
>> -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi
>> -----------------------------------------
>>
>> Using C linker: mpicc
>> Using Fortran linker: mpif90
>> Using libraries: -Wl,-rpath,/home/dknez/softwar
>> e/petsc-src/arch-linux2-c-opt/lib -L/home/dknez/software/petsc-src/arch-linux2-c-opt/lib
>> -lpetsc -Wl,-rpath,/home/dknez/software/libmesh_install/opt_real/petsc/lib
>> -L/home/dknez/software/libmesh_install/opt_real/petsc/lib -lcmumps
>> -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lmetis -lHYPRE
>> -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib
>> -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5
>> -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu
>> -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpi_cxx
>> -lstdc++ -lscalapack -lml -lmpi_cxx -lstdc++ -lumfpack -lklu -lcholmod
>> -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig
>> -Wl,-rpath,/opt/intel/system_studio_2015.2.050/mkl/lib/intel64
>> -L/opt/intel/system_studio_2015.2.050/mkl/lib/intel64 -lmkl_intel_lp64
>> -lmkl_sequential -lmkl_core -lpthread -lm -lhwloc -lptesmumps -lptscotch
>> -lptscotcherr -lscotch -lscotcherr -lX11 -lm -lmpi_usempif08
>> -lmpi_usempi_ignore_tkr -lmpi_mpifh -lgfortran -lm -lgfortran -lm
>> -lquadmath -lm -lmpi_cxx -lstdc++ -lrt -lm -lpthread -lz
>> -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib
>> -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5
>> -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu
>> -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu
>> -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl
>> -Wl,-rpath,/usr/lib/openmpi/lib -lmpi -lgcc_s -lpthread -ldl
>> -----------------------------------------
>>
>>
>>
>>
>> On Wed, Jan 11, 2017 at 4:49 PM, Dave May <dave.mayhem23 at gmail.com>
>> wrote:
>>
>>> It looks like the Schur solve is requiring a huge number of iterations to
>>> converge (based on the number of MatMult calls in the log summary).
>>> This is killing the performance.
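>>> (To see this directly, you could monitor the inner Schur solve through its
>>> options prefix, e.g. something like
>>>   -fieldsplit_FE_split_ksp_converged_reason
>>> assuming the split prefixes shown in your -ksp_view output.)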
>>>
>>> Are you sure that A11 is a good approximation to S? You might consider
>>> trying the selfp option
>>>
>>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCFieldSplitSetSchurPre.html#PCFieldSplitSetSchurPre
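>>> For example (untested sketch; adjust to however you configure the outer PC):
>>>
>>>   -pc_fieldsplit_schur_precondition selfp
>>>
>>> or, equivalently in code,
>>>
>>>   PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_SELFP, NULL);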
>>>
>>> Note that the best approximation to S is likely both problem and
>>> discretisation dependent, so if selfp is also terrible, you might want to
>>> consider coding up your own approximation to S for your specific system.
>>>
>>>
>>> Thanks,
>>> Dave
>>>
>>>
>>> On Wed, 11 Jan 2017 at 22:34, David Knezevic <david.knezevic at akselos.com>
>>> wrote:
>>>
>>> I have a definite 2x2 block system, and I figured it would be good to apply
>>> the PCFIELDSPLIT functionality with a Schur complement preconditioner, as
>>> described in Section 4.5 of the manual.
>>>
>>> The A00 block of my matrix is very small, so I figured I'd specify a
>>> direct solver (MUMPS) for that block.
>>>
>>> So I did the following (a rough sketch of the corresponding calls is given
>>> after the list):
>>> - PCFieldSplitSetIS to specify the index sets of the two splits
>>> - PCFieldSplitGetSubKSP to get the two KSP objects, and to set the
>>> solver and PC types for each (MUMPS for A00, CG preconditioned with ILU for A11)
>>> - I set -pc_fieldsplit_schur_fact_type full
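>>> In code, the setup is roughly the following (abridged sketch: error checking
>>> and the construction of the index sets are omitted, and the split names and
>>> variable names are placeholders):
>>>
>>>   PC       pc, subpc;
>>>   KSP      *subksp;
>>>   PetscInt nsplits;
>>>   /* is_rb, is_fe are the two index sets, built elsewhere */
>>>
>>>   KSPGetPC(ksp, &pc);
>>>   PCSetType(pc, PCFIELDSPLIT);
>>>   PCFieldSplitSetType(pc, PC_COMPOSITE_SCHUR);
>>>   PCFieldSplitSetSchurFactType(pc, PC_FIELDSPLIT_SCHUR_FACT_FULL); /* same as -pc_fieldsplit_schur_fact_type full */
>>>   PCFieldSplitSetIS(pc, "RB_split", is_rb);   /* condensed (substructured) dofs */
>>>   PCFieldSplitSetIS(pc, "FE_split", is_fe);   /* standard FE dofs */
>>>   KSPSetUp(ksp);                              /* so the sub-KSPs exist */
>>>   PCFieldSplitGetSubKSP(pc, &nsplits, &subksp);
>>>   /* A00 block: direct solve (Cholesky via MUMPS) */
>>>   KSPSetType(subksp[0], KSPPREONLY);
>>>   KSPGetPC(subksp[0], &subpc);
>>>   PCSetType(subpc, PCCHOLESKY);
>>>   PCFactorSetMatSolverPackage(subpc, MATSOLVERMUMPS);
>>>   /* Schur complement block: CG with ILU */
>>>   KSPSetType(subksp[1], KSPCG);
>>>   KSPGetPC(subksp[1], &subpc);
>>>   PCSetType(subpc, PCILU);
>>>   PetscFree(subksp);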
>>>
>>> Below I have pasted the output of "-ksp_view -ksp_monitor -log_view" for
>>> a test case. It converges well, but I'm concerned about the speed: about 90
>>> seconds, versus about 1 second if I use a direct solver for the entire
>>> system. Am I setting this up in a reasonable way?
>>>
>>> Many thanks,
>>> David
>>>
>>> ------------------------------------------------------------
>>> -----------------------
>>>
>>> 0 KSP Residual norm 5.405774214400e+04
>>> 1 KSP Residual norm 1.849649014371e+02
>>> 2 KSP Residual norm 7.462775074989e-02
>>> 3 KSP Residual norm 2.680497175260e-04
>>> KSP Object: 1 MPI processes
>>> type: cg
>>> maximum iterations=1000
>>> tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
>>> left preconditioning
>>> using nonzero initial guess
>>> using PRECONDITIONED norm type for convergence test
>>> PC Object: 1 MPI processes
>>> type: fieldsplit
>>> FieldSplit with Schur preconditioner, factorization FULL
>>> Preconditioner for the Schur complement formed from A11
>>> Split info:
>>> Split number 0 Defined by IS
>>> Split number 1 Defined by IS
>>> KSP solver for A00 block
>>> KSP Object: (fieldsplit_RB_split_) 1 MPI processes
>>> type: preonly
>>> maximum iterations=10000, initial guess is zero
>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>> left preconditioning
>>> using NONE norm type for convergence test
>>> PC Object: (fieldsplit_RB_split_) 1 MPI processes
>>> type: cholesky
>>> Cholesky: out-of-place factorization
>>> tolerance for zero pivot 2.22045e-14
>>> matrix ordering: natural
>>> factor fill ratio given 0., needed 0.
>>> Factored matrix follows:
>>> Mat Object: 1 MPI processes
>>> type: seqaij
>>> rows=324, cols=324
>>> package used to perform factorization: mumps
>>> total: nonzeros=3042, allocated nonzeros=3042
>>> total number of mallocs used during MatSetValues calls =0
>>> MUMPS run parameters:
>>> SYM (matrix type): 2
>>> PAR (host participation): 1
>>> ICNTL(1) (output for error): 6
>>> ICNTL(2) (output of diagnostic msg): 0
>>> ICNTL(3) (output for global info): 0
>>> ICNTL(4) (level of printing): 0
>>> ICNTL(5) (input mat struct): 0
>>> ICNTL(6) (matrix prescaling): 7
>>> ICNTL(7) (sequentia matrix ordering):7
>>> ICNTL(8) (scalling strategy): 77
>>> ICNTL(10) (max num of refinements): 0
>>> ICNTL(11) (error analysis): 0
>>> ICNTL(12) (efficiency control):
>>> 0
>>> ICNTL(13) (efficiency control):
>>> 0
>>> ICNTL(14) (percentage of estimated workspace
>>> increase): 20
>>> ICNTL(18) (input mat struct):
>>> 0
>>> ICNTL(19) (Shur complement info):
>>> 0
>>> ICNTL(20) (rhs sparse pattern):
>>> 0
>>> ICNTL(21) (solution struct):
>>> 0
>>> ICNTL(22) (in-core/out-of-core facility):
>>> 0
>>> ICNTL(23) (max size of memory can be allocated
>>> locally):0
>>> ICNTL(24) (detection of null pivot rows):
>>> 0
>>> ICNTL(25) (computation of a null space basis):
>>> 0
>>> ICNTL(26) (Schur options for rhs or solution):
>>> 0
>>> ICNTL(27) (experimental parameter):
>>> -24
>>> ICNTL(28) (use parallel or sequential ordering):
>>> 1
>>> ICNTL(29) (parallel ordering):
>>> 0
>>> ICNTL(30) (user-specified set of entries in inv(A)):
>>> 0
>>> ICNTL(31) (factors is discarded in the solve phase):
>>> 0
>>> ICNTL(33) (compute determinant):
>>> 0
>>> CNTL(1) (relative pivoting threshold): 0.01
>>> CNTL(2) (stopping criterion of refinement):
>>> 1.49012e-08
>>> CNTL(3) (absolute pivoting threshold): 0.
>>> CNTL(4) (value of static pivoting): -1.
>>> CNTL(5) (fixation for null pivots): 0.
>>> RINFO(1) (local estimated flops for the elimination
>>> after analysis):
>>> [0] 29394.
>>> RINFO(2) (local estimated flops for the assembly
>>> after factorization):
>>> [0] 1092.
>>> RINFO(3) (local estimated flops for the elimination
>>> after factorization):
>>> [0] 29394.
>>> INFO(15) (estimated size of (in MB) MUMPS internal
>>> data for running numerical factorization):
>>> [0] 1
>>> INFO(16) (size of (in MB) MUMPS internal data used
>>> during numerical factorization):
>>> [0] 1
>>> INFO(23) (num of pivots eliminated on this processor
>>> after factorization):
>>> [0] 324
>>> RINFOG(1) (global estimated flops for the
>>> elimination after analysis): 29394.
>>> RINFOG(2) (global estimated flops for the assembly
>>> after factorization): 1092.
>>> RINFOG(3) (global estimated flops for the
>>> elimination after factorization): 29394.
>>> (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant):
>>> (0.,0.)*(2^0)
>>> INFOG(3) (estimated real workspace for factors on
>>> all processors after analysis): 3888
>>> INFOG(4) (estimated integer workspace for factors on
>>> all processors after analysis): 2067
>>> INFOG(5) (estimated maximum front size in the
>>> complete tree): 12
>>> INFOG(6) (number of nodes in the complete tree): 53
>>> INFOG(7) (ordering option effectively use after
>>> analysis): 2
>>> INFOG(8) (structural symmetry in percent of the
>>> permuted matrix after analysis): 100
>>> INFOG(9) (total real/complex workspace to store the
>>> matrix factors after factorization): 3888
>>> INFOG(10) (total integer space store the matrix
>>> factors after factorization): 2067
>>> INFOG(11) (order of largest frontal matrix after
>>> factorization): 12
>>> INFOG(12) (number of off-diagonal pivots): 0
>>> INFOG(13) (number of delayed pivots after
>>> factorization): 0
>>> INFOG(14) (number of memory compress after
>>> factorization): 0
>>> INFOG(15) (number of steps of iterative refinement
>>> after solution): 0
>>> INFOG(16) (estimated size (in MB) of all MUMPS
>>> internal data for factorization after analysis: value on the most memory
>>> consuming processor): 1
>>> INFOG(17) (estimated size of all MUMPS internal data
>>> for factorization after analysis: sum over all processors): 1
>>> INFOG(18) (size of all MUMPS internal data allocated
>>> during factorization: value on the most memory consuming processor): 1
>>> INFOG(19) (size of all MUMPS internal data allocated
>>> during factorization: sum over all processors): 1
>>> INFOG(20) (estimated number of entries in the
>>> factors): 3042
>>> INFOG(21) (size in MB of memory effectively used
>>> during factorization - value on the most memory consuming processor): 1
>>> INFOG(22) (size in MB of memory effectively used
>>> during factorization - sum over all processors): 1
>>> INFOG(23) (after analysis: value of ICNTL(6)
>>> effectively used): 5
>>> INFOG(24) (after analysis: value of ICNTL(12)
>>> effectively used): 1
>>> INFOG(25) (after factorization: number of pivots
>>> modified by static pivoting): 0
>>> INFOG(28) (after factorization: number of null
>>> pivots encountered): 0
>>> INFOG(29) (after factorization: effective number of
>>> entries in the factors (sum over all processors)): 3042
>>> INFOG(30, 31) (after solution: size in Mbytes of
>>> memory used during solution phase): 0, 0
>>> INFOG(32) (after analysis: type of analysis done): 1
>>> INFOG(33) (value used for ICNTL(8)): -2
>>> INFOG(34) (exponent of the determinant if
>>> determinant is requested): 0
>>> linear system matrix = precond matrix:
>>> Mat Object: (fieldsplit_RB_split_) 1 MPI processes
>>> type: seqaij
>>> rows=324, cols=324
>>> total: nonzeros=5760, allocated nonzeros=5760
>>> total number of mallocs used during MatSetValues calls =0
>>> using I-node routines: found 108 nodes, limit used is 5
>>> KSP solver for S = A11 - A10 inv(A00) A01
>>> KSP Object: (fieldsplit_FE_split_) 1 MPI processes
>>> type: cg
>>> maximum iterations=10000, initial guess is zero
>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>> left preconditioning
>>> using PRECONDITIONED norm type for convergence test
>>> PC Object: (fieldsplit_FE_split_) 1 MPI processes
>>> type: bjacobi
>>> block Jacobi: number of blocks = 1
>>> Local solve is same for all blocks, in the following KSP and
>>> PC objects:
>>> KSP Object: (fieldsplit_FE_split_sub_) 1
>>> MPI processes
>>> type: preonly
>>> maximum iterations=10000, initial guess is zero
>>> tolerances: relative=1e-05, absolute=1e-50,
>>> divergence=10000.
>>> left preconditioning
>>> using NONE norm type for convergence test
>>> PC Object: (fieldsplit_FE_split_sub_) 1 MPI
>>> processes
>>> type: ilu
>>> ILU: out-of-place factorization
>>> 0 levels of fill
>>> tolerance for zero pivot 2.22045e-14
>>> matrix ordering: natural
>>> factor fill ratio given 1., needed 1.
>>> Factored matrix follows:
>>> Mat Object: 1 MPI processes
>>> type: seqaij
>>> rows=28476, cols=28476
>>> package used to perform factorization: petsc
>>> total: nonzeros=1017054, allocated nonzeros=1017054
>>> total number of mallocs used during MatSetValues
>>> calls =0
>>> using I-node routines: found 9492 nodes, limit
>>> used is 5
>>> linear system matrix = precond matrix:
>>> Mat Object: (fieldsplit_FE_split_) 1
>>> MPI processes
>>> type: seqaij
>>> rows=28476, cols=28476
>>> total: nonzeros=1017054, allocated nonzeros=1017054
>>> total number of mallocs used during MatSetValues calls =0
>>> using I-node routines: found 9492 nodes, limit used is 5
>>> linear system matrix followed by preconditioner matrix:
>>> Mat Object: (fieldsplit_FE_split_) 1 MPI processes
>>> type: schurcomplement
>>> rows=28476, cols=28476
>>> Schur complement A11 - A10 inv(A00) A01
>>> A11
>>> Mat Object: (fieldsplit_FE_split_)
>>> 1 MPI processes
>>> type: seqaij
>>> rows=28476, cols=28476
>>> total: nonzeros=1017054, allocated nonzeros=1017054
>>> total number of mallocs used during MatSetValues calls =0
>>> using I-node routines: found 9492 nodes, limit used is
>>> 5
>>> A10
>>> Mat Object: 1 MPI processes
>>> type: seqaij
>>> rows=28476, cols=324
>>> total: nonzeros=936, allocated nonzeros=936
>>> total number of mallocs used during MatSetValues calls =0
>>> using I-node routines: found 5717 nodes, limit used is
>>> 5
>>> KSP of A00
>>> KSP Object: (fieldsplit_RB_split_)
>>> 1 MPI processes
>>> type: preonly
>>> maximum iterations=10000, initial guess is zero
>>> tolerances: relative=1e-05, absolute=1e-50,
>>> divergence=10000.
>>> left preconditioning
>>> using NONE norm type for convergence test
>>> PC Object: (fieldsplit_RB_split_)
>>> 1 MPI processes
>>> type: cholesky
>>> Cholesky: out-of-place factorization
>>> tolerance for zero pivot 2.22045e-14
>>> matrix ordering: natural
>>> factor fill ratio given 0., needed 0.
>>> Factored matrix follows:
>>> Mat Object: 1 MPI processes
>>> type: seqaij
>>> rows=324, cols=324
>>> package used to perform factorization: mumps
>>> total: nonzeros=3042, allocated nonzeros=3042
>>> total number of mallocs used during MatSetValues
>>> calls =0
>>> MUMPS run parameters:
>>> SYM (matrix type): 2
>>> PAR (host participation): 1
>>> ICNTL(1) (output for error): 6
>>> ICNTL(2) (output of diagnostic msg): 0
>>> ICNTL(3) (output for global info): 0
>>> ICNTL(4) (level of printing): 0
>>> ICNTL(5) (input mat struct): 0
>>> ICNTL(6) (matrix prescaling): 7
>>> ICNTL(7) (sequentia matrix ordering):7
>>> ICNTL(8) (scalling strategy): 77
>>> ICNTL(10) (max num of refinements): 0
>>> ICNTL(11) (error analysis): 0
>>> ICNTL(12) (efficiency control):
>>> 0
>>> ICNTL(13) (efficiency control):
>>> 0
>>> ICNTL(14) (percentage of estimated workspace
>>> increase): 20
>>> ICNTL(18) (input mat struct):
>>> 0
>>> ICNTL(19) (Shur complement info):
>>> 0
>>> ICNTL(20) (rhs sparse pattern):
>>> 0
>>> ICNTL(21) (solution struct):
>>> 0
>>> ICNTL(22) (in-core/out-of-core facility):
>>> 0
>>> ICNTL(23) (max size of memory can be
>>> allocated locally):0
>>> ICNTL(24) (detection of null pivot rows):
>>> 0
>>> ICNTL(25) (computation of a null space
>>> basis): 0
>>> ICNTL(26) (Schur options for rhs or
>>> solution): 0
>>> ICNTL(27) (experimental parameter):
>>> -24
>>> ICNTL(28) (use parallel or sequential
>>> ordering): 1
>>> ICNTL(29) (parallel ordering):
>>> 0
>>> ICNTL(30) (user-specified set of entries in
>>> inv(A)): 0
>>> ICNTL(31) (factors is discarded in the solve
>>> phase): 0
>>> ICNTL(33) (compute determinant):
>>> 0
>>> CNTL(1) (relative pivoting threshold):
>>> 0.01
>>> CNTL(2) (stopping criterion of refinement):
>>> 1.49012e-08
>>> CNTL(3) (absolute pivoting threshold):
>>> 0.
>>> CNTL(4) (value of static pivoting):
>>> -1.
>>> CNTL(5) (fixation for null pivots):
>>> 0.
>>> RINFO(1) (local estimated flops for the
>>> elimination after analysis):
>>> [0] 29394.
>>> RINFO(2) (local estimated flops for the
>>> assembly after factorization):
>>> [0] 1092.
>>> RINFO(3) (local estimated flops for the
>>> elimination after factorization):
>>> [0] 29394.
>>> INFO(15) (estimated size of (in MB) MUMPS
>>> internal data for running numerical factorization):
>>> [0] 1
>>> INFO(16) (size of (in MB) MUMPS internal
>>> data used during numerical factorization):
>>> [0] 1
>>> INFO(23) (num of pivots eliminated on this
>>> processor after factorization):
>>> [0] 324
>>> RINFOG(1) (global estimated flops for the
>>> elimination after analysis): 29394.
>>> RINFOG(2) (global estimated flops for the
>>> assembly after factorization): 1092.
>>> RINFOG(3) (global estimated flops for the
>>> elimination after factorization): 29394.
>>> (RINFOG(12) RINFOG(13))*2^INFOG(34)
>>> (determinant): (0.,0.)*(2^0)
>>> INFOG(3) (estimated real workspace for
>>> factors on all processors after analysis): 3888
>>> INFOG(4) (estimated integer workspace for
>>> factors on all processors after analysis): 2067
>>> INFOG(5) (estimated maximum front size in
>>> the complete tree): 12
>>> INFOG(6) (number of nodes in the complete
>>> tree): 53
>>> INFOG(7) (ordering option effectively use
>>> after analysis): 2
>>> INFOG(8) (structural symmetry in percent of
>>> the permuted matrix after analysis): 100
>>> INFOG(9) (total real/complex workspace to
>>> store the matrix factors after factorization): 3888
>>> INFOG(10) (total integer space store the
>>> matrix factors after factorization): 2067
>>> INFOG(11) (order of largest frontal matrix
>>> after factorization): 12
>>> INFOG(12) (number of off-diagonal pivots): 0
>>> INFOG(13) (number of delayed pivots after
>>> factorization): 0
>>> INFOG(14) (number of memory compress after
>>> factorization): 0
>>> INFOG(15) (number of steps of iterative
>>> refinement after solution): 0
>>> INFOG(16) (estimated size (in MB) of all
>>> MUMPS internal data for factorization after analysis: value on the most
>>> memory consuming processor): 1
>>> INFOG(17) (estimated size of all MUMPS
>>> internal data for factorization after analysis: sum over all processors): 1
>>> INFOG(18) (size of all MUMPS internal data
>>> allocated during factorization: value on the most memory consuming
>>> processor): 1
>>> INFOG(19) (size of all MUMPS internal data
>>> allocated during factorization: sum over all processors): 1
>>> INFOG(20) (estimated number of entries in
>>> the factors): 3042
>>> INFOG(21) (size in MB of memory effectively
>>> used during factorization - value on the most memory consuming processor):
>>> 1
>>> INFOG(22) (size in MB of memory effectively
>>> used during factorization - sum over all processors): 1
>>> INFOG(23) (after analysis: value of ICNTL(6)
>>> effectively used): 5
>>> INFOG(24) (after analysis: value of
>>> ICNTL(12) effectively used): 1
>>> INFOG(25) (after factorization: number of
>>> pivots modified by static pivoting): 0
>>> INFOG(28) (after factorization: number of
>>> null pivots encountered): 0
>>> INFOG(29) (after factorization: effective
>>> number of entries in the factors (sum over all processors)): 3042
>>> INFOG(30, 31) (after solution: size in
>>> Mbytes of memory used during solution phase): 0, 0
>>> INFOG(32) (after analysis: type of analysis
>>> done): 1
>>> INFOG(33) (value used for ICNTL(8)): -2
>>> INFOG(34) (exponent of the determinant if
>>> determinant is requested): 0
>>> linear system matrix = precond matrix:
>>> Mat Object: (fieldsplit_RB_split_)
>>> 1 MPI processes
>>> type: seqaij
>>> rows=324, cols=324
>>> total: nonzeros=5760, allocated nonzeros=5760
>>> total number of mallocs used during MatSetValues calls
>>> =0
>>> using I-node routines: found 108 nodes, limit used
>>> is 5
>>> A01
>>> Mat Object: 1 MPI processes
>>> type: seqaij
>>> rows=324, cols=28476
>>> total: nonzeros=936, allocated nonzeros=936
>>> total number of mallocs used during MatSetValues calls =0
>>> using I-node routines: found 67 nodes, limit used is 5
>>> Mat Object: (fieldsplit_FE_split_) 1 MPI processes
>>> type: seqaij
>>> rows=28476, cols=28476
>>> total: nonzeros=1017054, allocated nonzeros=1017054
>>> total number of mallocs used during MatSetValues calls =0
>>> using I-node routines: found 9492 nodes, limit used is 5
>>> linear system matrix = precond matrix:
>>> Mat Object: () 1 MPI processes
>>> type: seqaij
>>> rows=28800, cols=28800
>>> total: nonzeros=1024686, allocated nonzeros=1024794
>>> total number of mallocs used during MatSetValues calls =0
>>> using I-node routines: found 9600 nodes, limit used is 5
>>>
>>>
>>> ---------------------------------------------- PETSc Performance
>>> Summary: ----------------------------------------------
>>>
>>> /home/dknez/akselos-dev/scrbe/build/bin/fe_solver-opt_real on a
>>> arch-linux2-c-opt named david-Lenovo with 1 processor, by dknez Wed Jan 11
>>> 16:16:47 2017
>>> Using Petsc Release Version 3.7.3, unknown
>>>
>>> Max Max/Min Avg Total
>>> Time (sec): 9.179e+01 1.00000 9.179e+01
>>> Objects: 1.990e+02 1.00000 1.990e+02
>>> Flops: 1.634e+11 1.00000 1.634e+11 1.634e+11
>>> Flops/sec: 1.780e+09 1.00000 1.780e+09 1.780e+09
>>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
>>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
>>> MPI Reductions: 0.000e+00 0.00000
>>>
>>> Flop counting convention: 1 flop = 1 real number operation of type
>>> (multiply/divide/add/subtract)
>>> e.g., VecAXPY() for real vectors of length N
>>> --> 2N flops
>>> and VecAXPY() for complex vectors of length
>>> N --> 8N flops
>>>
>>> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages
>>> --- -- Message Lengths -- -- Reductions --
>>> Avg %Total Avg %Total counts
>>> %Total Avg %Total counts %Total
>>> 0: Main Stage: 9.1787e+01 100.0% 1.6336e+11 100.0% 0.000e+00
>>> 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
>>>
>>> ------------------------------------------------------------
>>> ------------------------------------------------------------
>>> See the 'Profiling' chapter of the users' manual for details on
>>> interpreting output.
>>> Phase summary info:
>>> Count: number of times phase was executed
>>> Time and Flops: Max - maximum over all processors
>>> Ratio - ratio of maximum to minimum over all
>>> processors
>>> Mess: number of messages sent
>>> Avg. len: average message length (bytes)
>>> Reduct: number of global reductions
>>> Global: entire computation
>>> Stage: stages of a computation. Set stages with PetscLogStagePush()
>>> and PetscLogStagePop().
>>> %T - percent time in this phase %F - percent flops in this
>>> phase
>>> %M - percent messages in this phase %L - percent message
>>> lengths in this phase
>>> %R - percent reductions in this phase
>>> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
>>> over all processors)
>>> ------------------------------------------------------------
>>> ------------------------------------------------------------
>>> Event Count Time (sec) Flops
>>> --- Global --- --- Stage --- Total
>>> Max Ratio Max Ratio Max Ratio Mess Avg len
>>> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
>>> ------------------------------------------------------------
>>> ------------------------------------------------------------
>>>
>>> --- Event Stage 0: Main Stage
>>>
>>> VecDot 42 1.0 2.4080e-05 1.0 8.53e+03 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 354
>>> VecTDot 74012 1.0 1.2440e+00 1.0 4.22e+09 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 1 3 0 0 0 1 3 0 0 0 3388
>>> VecNorm 37020 1.0 8.3580e-01 1.0 2.11e+09 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 1 1 0 0 0 1 1 0 0 0 2523
>>> VecScale 37008 1.0 3.5800e-01 1.0 1.05e+09 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 1 0 0 0 0 1 0 0 0 2944
>>> VecCopy 37034 1.0 2.5754e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> VecSet 74137 1.0 3.0537e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> VecAXPY 74029 1.0 1.7233e+00 1.0 4.22e+09 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 2 3 0 0 0 2 3 0 0 0 2446
>>> VecAYPX 37001 1.0 1.2214e+00 1.0 2.11e+09 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 1 1 0 0 0 1 1 0 0 0 1725
>>> VecAssemblyBegin 68 1.0 2.0432e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> VecAssemblyEnd 68 1.0 2.5988e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> VecScatterBegin 48 1.0 4.6921e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> MatMult 37017 1.0 4.1269e+01 1.0 7.65e+10 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 45 47 0 0 0 45 47 0 0 0 1853
>>> MatMultAdd 37015 1.0 3.3638e+01 1.0 7.53e+10 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 37 46 0 0 0 37 46 0 0 0 2238
>>> MatSolve 74021 1.0 4.6602e+01 1.0 7.42e+10 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 51 45 0 0 0 51 45 0 0 0 1593
>>> MatLUFactorNum 1 1.0 1.7209e-02 1.0 2.44e+07 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 1420
>>> MatCholFctrSym 1 1.0 8.8310e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> MatCholFctrNum 1 1.0 3.6907e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> MatILUFactorSym 1 1.0 3.7372e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> MatAssemblyBegin 29 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> MatAssemblyEnd 29 1.0 9.9473e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> MatGetRow 58026 1.0 2.8155e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> MatGetRowIJ 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> MatGetSubMatrice 6 1.0 1.5399e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> MatGetOrdering 2 1.0 3.0112e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> MatZeroEntries 6 1.0 2.9490e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> MatView 7 1.0 3.4356e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> KSPSetUp 4 1.0 9.4891e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> KSPSolve 1 1.0 8.8793e+01 1.0 1.63e+11 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 97100 0 0 0 97100 0 0 0 1840
>>> PCSetUp 4 1.0 3.8375e-02 1.0 2.44e+07 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 637
>>> PCSetUpOnBlocks 5 1.0 2.1250e-02 1.0 2.44e+07 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 1150
>>> PCApply 5 1.0 8.8789e+01 1.0 1.63e+11 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 97100 0 0 0 97100 0 0 0 1840
>>> KSPSolve_FS_0 5 1.0 7.5364e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> KSPSolve_FS_Schu 5 1.0 8.8785e+01 1.0 1.63e+11 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 97100 0 0 0 97100 0 0 0 1840
>>> KSPSolve_FS_Low 5 1.0 2.1019e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> ------------------------------------------------------------
>>> ------------------------------------------------------------
>>>
>>> Memory usage is given in bytes:
>>>
>>> Object Type Creations Destructions Memory Descendants'
>>> Mem.
>>> Reports information only for process 0.
>>>
>>> --- Event Stage 0: Main Stage
>>>
>>> Vector 91 91 9693912 0.
>>> Vector Scatter 24 24 15936 0.
>>> Index Set 51 51 537888 0.
>>> IS L to G Mapping 3 3 240408 0.
>>> Matrix 13 13 64097868 0.
>>> Krylov Solver 6 6 7888 0.
>>> Preconditioner 6 6 6288 0.
>>> Viewer 1 0 0 0.
>>> Distributed Mesh 1 1 4624 0.
>>> Star Forest Bipartite Graph 2 2 1616 0.
>>> Discrete System 1 1 872 0.
>>> ============================================================
>>> ============================================================
>>> Average time to get PetscTime(): 0.
>>> #PETSc Option Table entries:
>>> -ksp_monitor
>>> -ksp_view
>>> -log_view
>>> #End of PETSc Option Table entries
>>> Compiled without FORTRAN kernels
>>> Compiled with full precision matrices (default)
>>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
>>> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
>>> Configure options: --with-shared-libraries=1 --with-debugging=0
>>> --download-suitesparse --download-blacs --download-ptscotch=yes
>>> --with-blas-lapack-dir=/opt/intel/system_studio_2015.2.050/mkl
>>> --CXXFLAGS=-Wl,--no-as-needed --download-scalapack --download-mumps
>>> --download-metis --prefix=/home/dknez/software/libmesh_install/opt_real/petsc
>>> --download-hypre --download-ml
>>> -----------------------------------------
>>> Libraries compiled on Wed Sep 21 17:38:52 2016 on david-Lenovo
>>> Machine characteristics: Linux-4.4.0-38-generic-x86_64-
>>> with-Ubuntu-16.04-xenial
>>> Using PETSc directory: /home/dknez/software/petsc-src
>>> Using PETSc arch: arch-linux2-c-opt
>>> -----------------------------------------
>>>
>>> Using C compiler: mpicc -fPIC -Wall -Wwrite-strings
>>> -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O
>>> ${COPTFLAGS} ${CFLAGS}
>>> Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0
>>> -Wno-unused-dummy-argument -g -O ${FOPTFLAGS} ${FFLAGS}
>>> -----------------------------------------
>>>
>>> Using include paths: -I/home/dknez/software/petsc-src/arch-linux2-c-opt/include
>>> -I/home/dknez/software/petsc-src/include -I/home/dknez/software/petsc-src/include
>>> -I/home/dknez/software/petsc-src/arch-linux2-c-opt/include
>>> -I/home/dknez/software/libmesh_install/opt_real/petsc/include
>>> -I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent
>>> -I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/include
>>> -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi
>>> -----------------------------------------
>>>
>>> Using C linker: mpicc
>>> Using Fortran linker: mpif90
>>> Using libraries: -Wl,-rpath,/home/dknez/softwar
>>> e/petsc-src/arch-linux2-c-opt/lib -L/home/dknez/software/petsc-src/arch-linux2-c-opt/lib
>>> -lpetsc -Wl,-rpath,/home/dknez/software/libmesh_install/opt_real/petsc/lib
>>> -L/home/dknez/software/libmesh_install/opt_real/petsc/lib -lcmumps
>>> -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lmetis -lHYPRE
>>> -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib
>>> -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5
>>> -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu
>>> -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu
>>> -L/lib/x86_64-linux-gnu -lmpi_cxx -lstdc++ -lscalapack -lml -lmpi_cxx
>>> -lstdc++ -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd
>>> -lsuitesparseconfig -Wl,-rpath,/opt/intel/system_s
>>> tudio_2015.2.050/mkl/lib/intel64 -L/opt/intel/system_studio_2015.2.050/mkl/lib/intel64
>>> -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lhwloc
>>> -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lX11 -lm
>>> -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lgfortran -lm
>>> -lgfortran -lm -lquadmath -lm -lmpi_cxx -lstdc++ -lrt -lm -lpthread -lz
>>> -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib
>>> -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5
>>> -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu
>>> -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu
>>> -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu
>>> -L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/usr/lib/openmpi/lib -lmpi
>>> -lgcc_s -lpthread -ldl
>>> -----------------------------------------
>>>
>>>
>>>
>>>
>>>
>>>
>>
>