[petsc-users] Using PCFIELDSPLIT with -pc_fieldsplit_type schur
Barry Smith
bsmith at mcs.anl.gov
Wed Jan 11 19:32:22 CST 2017
Can you please run with all the monitoring on, so we can see the convergence of all the inner solvers?
-fieldsplit_FE_split_ksp_monitor
Then run again with
-fieldsplit_FE_split_ksp_monitor -fieldsplit_FE_split_pc_type cholesky
and send both sets of results.
Barry
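
For reference, the two requested runs would look something like the sketch below (the executable path and base options are taken from the -log_view output quoted further down; <application args> stands for whatever application-specific arguments are normally passed, which are not shown in this thread):

    /home/dknez/akselos-dev/scrbe/build/bin/fe_solver-opt_real <application args> \
        -ksp_monitor -ksp_view -log_view -fieldsplit_FE_split_ksp_monitor

    /home/dknez/akselos-dev/scrbe/build/bin/fe_solver-opt_real <application args> \
        -ksp_monitor -ksp_view -log_view -fieldsplit_FE_split_ksp_monitor \
        -fieldsplit_FE_split_pc_type cholesky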
> On Jan 11, 2017, at 6:32 PM, David Knezevic <david.knezevic at akselos.com> wrote:
>
> On Wed, Jan 11, 2017 at 5:52 PM, Dave May <dave.mayhem23 at gmail.com> wrote:
> so I gather that I'll have to look into a user-defined approximation to S.
>
> Where does the 2x2 block system come from?
> Maybe someone on the list knows the right approximation to use for S.
>
> The model is 3D linear elasticity with a finite element discretization. I applied substructuring to part of the system to "condense" it, which results in the small A00 block. The A11 block is just standard 3D elasticity; no substructuring was applied there. There are constraints that connect the degrees of freedom on the interface between the substructured and non-substructured regions.
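>
> For concreteness, the block structure being described has the form (in the same notation the -ksp_view output below uses)
>
>     K = [ A00  A01 ]      with Schur complement   S = A11 - A10 inv(A00) A01,
>         [ A10  A11 ]
>
> where, per the log below, A00 is the small condensed ("RB") block (324 x 324), A11 is the standard elasticity ("FE") block (28476 x 28476), and the very sparse off-diagonal blocks A01/A10 (936 nonzeros each) appear to carry the interface coupling.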
>
> If anyone has suggestions for a good way to precondition this type of system, I'd be most appreciative!
>
> Thanks,
> David
>
>
>
> -----------------------------------------
>
> 0 KSP Residual norm 5.405528187695e+04
> 1 KSP Residual norm 2.187814910803e+02
> 2 KSP Residual norm 1.019051577515e-01
> 3 KSP Residual norm 4.370464012859e-04
> KSP Object: 1 MPI processes
> type: cg
> maximum iterations=1000
> tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
> left preconditioning
> using nonzero initial guess
> using PRECONDITIONED norm type for convergence test
> PC Object: 1 MPI processes
> type: fieldsplit
> FieldSplit with Schur preconditioner, factorization FULL
> Preconditioner for the Schur complement formed from Sp, an assembled approximation to S, which uses (lumped, if requested) A00's diagonal's inverse
> Split info:
> Split number 0 Defined by IS
> Split number 1 Defined by IS
> KSP solver for A00 block
> KSP Object: (fieldsplit_RB_split_) 1 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (fieldsplit_RB_split_) 1 MPI processes
> type: cholesky
> Cholesky: out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> matrix ordering: natural
> factor fill ratio given 0., needed 0.
> Factored matrix follows:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=324, cols=324
> package used to perform factorization: mumps
> total: nonzeros=3042, allocated nonzeros=3042
> total number of mallocs used during MatSetValues calls =0
> MUMPS run parameters:
> SYM (matrix type): 2
> PAR (host participation): 1
> ICNTL(1) (output for error): 6
> ICNTL(2) (output of diagnostic msg): 0
> ICNTL(3) (output for global info): 0
> ICNTL(4) (level of printing): 0
> ICNTL(5) (input mat struct): 0
> ICNTL(6) (matrix prescaling): 7
> ICNTL(7) (sequentia matrix ordering):7
> ICNTL(8) (scalling strategy): 77
> ICNTL(10) (max num of refinements): 0
> ICNTL(11) (error analysis): 0
> ICNTL(12) (efficiency control): 0
> ICNTL(13) (efficiency control): 0
> ICNTL(14) (percentage of estimated workspace increase): 20
> ICNTL(18) (input mat struct): 0
> ICNTL(19) (Shur complement info): 0
> ICNTL(20) (rhs sparse pattern): 0
> ICNTL(21) (solution struct): 0
> ICNTL(22) (in-core/out-of-core facility): 0
> ICNTL(23) (max size of memory can be allocated locally):0
> ICNTL(24) (detection of null pivot rows): 0
> ICNTL(25) (computation of a null space basis): 0
> ICNTL(26) (Schur options for rhs or solution): 0
> ICNTL(27) (experimental parameter): -24
> ICNTL(28) (use parallel or sequential ordering): 1
> ICNTL(29) (parallel ordering): 0
> ICNTL(30) (user-specified set of entries in inv(A)): 0
> ICNTL(31) (factors is discarded in the solve phase): 0
> ICNTL(33) (compute determinant): 0
> CNTL(1) (relative pivoting threshold): 0.01
> CNTL(2) (stopping criterion of refinement): 1.49012e-08
> CNTL(3) (absolute pivoting threshold): 0.
> CNTL(4) (value of static pivoting): -1.
> CNTL(5) (fixation for null pivots): 0.
> RINFO(1) (local estimated flops for the elimination after analysis):
> [0] 29394.
> RINFO(2) (local estimated flops for the assembly after factorization):
> [0] 1092.
> RINFO(3) (local estimated flops for the elimination after factorization):
> [0] 29394.
> INFO(15) (estimated size of (in MB) MUMPS internal data for running numerical factorization):
> [0] 1
> INFO(16) (size of (in MB) MUMPS internal data used during numerical factorization):
> [0] 1
> INFO(23) (num of pivots eliminated on this processor after factorization):
> [0] 324
> RINFOG(1) (global estimated flops for the elimination after analysis): 29394.
> RINFOG(2) (global estimated flops for the assembly after factorization): 1092.
> RINFOG(3) (global estimated flops for the elimination after factorization): 29394.
> (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
> INFOG(3) (estimated real workspace for factors on all processors after analysis): 3888
> INFOG(4) (estimated integer workspace for factors on all processors after analysis): 2067
> INFOG(5) (estimated maximum front size in the complete tree): 12
> INFOG(6) (number of nodes in the complete tree): 53
> INFOG(7) (ordering option effectively use after analysis): 2
> INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
> INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 3888
> INFOG(10) (total integer space store the matrix factors after factorization): 2067
> INFOG(11) (order of largest frontal matrix after factorization): 12
> INFOG(12) (number of off-diagonal pivots): 0
> INFOG(13) (number of delayed pivots after factorization): 0
> INFOG(14) (number of memory compress after factorization): 0
> INFOG(15) (number of steps of iterative refinement after solution): 0
> INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 1
> INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 1
> INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 1
> INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 1
> INFOG(20) (estimated number of entries in the factors): 3042
> INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 1
> INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 1
> INFOG(23) (after analysis: value of ICNTL(6) effectively used): 5
> INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
> INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
> INFOG(28) (after factorization: number of null pivots encountered): 0
> INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 3042
> INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 0, 0
> INFOG(32) (after analysis: type of analysis done): 1
> INFOG(33) (value used for ICNTL(8)): -2
> INFOG(34) (exponent of the determinant if determinant is requested): 0
> linear system matrix = precond matrix:
> Mat Object: (fieldsplit_RB_split_) 1 MPI processes
> type: seqaij
> rows=324, cols=324
> total: nonzeros=5760, allocated nonzeros=5760
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 108 nodes, limit used is 5
> KSP solver for S = A11 - A10 inv(A00) A01
> KSP Object: (fieldsplit_FE_split_) 1 MPI processes
> type: cg
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using PRECONDITIONED norm type for convergence test
> PC Object: (fieldsplit_FE_split_) 1 MPI processes
> type: bjacobi
> block Jacobi: number of blocks = 1
> Local solve is same for all blocks, in the following KSP and PC objects:
> KSP Object: (fieldsplit_FE_split_sub_) 1 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (fieldsplit_FE_split_sub_) 1 MPI processes
> type: ilu
> ILU: out-of-place factorization
> 0 levels of fill
> tolerance for zero pivot 2.22045e-14
> matrix ordering: natural
> factor fill ratio given 1., needed 1.
> Factored matrix follows:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=28476, cols=28476
> package used to perform factorization: petsc
> total: nonzeros=1037052, allocated nonzeros=1037052
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 9489 nodes, limit used is 5
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=28476, cols=28476
> total: nonzeros=1037052, allocated nonzeros=1037052
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 9489 nodes, limit used is 5
> linear system matrix followed by preconditioner matrix:
> Mat Object: (fieldsplit_FE_split_) 1 MPI processes
> type: schurcomplement
> rows=28476, cols=28476
> Schur complement A11 - A10 inv(A00) A01
> A11
> Mat Object: (fieldsplit_FE_split_) 1 MPI processes
> type: seqaij
> rows=28476, cols=28476
> total: nonzeros=1017054, allocated nonzeros=1017054
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 9492 nodes, limit used is 5
> A10
> Mat Object: 1 MPI processes
> type: seqaij
> rows=28476, cols=324
> total: nonzeros=936, allocated nonzeros=936
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 5717 nodes, limit used is 5
> KSP of A00
> KSP Object: (fieldsplit_RB_split_) 1 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (fieldsplit_RB_split_) 1 MPI processes
> type: cholesky
> Cholesky: out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> matrix ordering: natural
> factor fill ratio given 0., needed 0.
> Factored matrix follows:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=324, cols=324
> package used to perform factorization: mumps
> total: nonzeros=3042, allocated nonzeros=3042
> total number of mallocs used during MatSetValues calls =0
> MUMPS run parameters:
> SYM (matrix type): 2
> PAR (host participation): 1
> ICNTL(1) (output for error): 6
> ICNTL(2) (output of diagnostic msg): 0
> ICNTL(3) (output for global info): 0
> ICNTL(4) (level of printing): 0
> ICNTL(5) (input mat struct): 0
> ICNTL(6) (matrix prescaling): 7
> ICNTL(7) (sequentia matrix ordering):7
> ICNTL(8) (scalling strategy): 77
> ICNTL(10) (max num of refinements): 0
> ICNTL(11) (error analysis): 0
> ICNTL(12) (efficiency control): 0
> ICNTL(13) (efficiency control): 0
> ICNTL(14) (percentage of estimated workspace increase): 20
> ICNTL(18) (input mat struct): 0
> ICNTL(19) (Shur complement info): 0
> ICNTL(20) (rhs sparse pattern): 0
> ICNTL(21) (solution struct): 0
> ICNTL(22) (in-core/out-of-core facility): 0
> ICNTL(23) (max size of memory can be allocated locally):0
> ICNTL(24) (detection of null pivot rows): 0
> ICNTL(25) (computation of a null space basis): 0
> ICNTL(26) (Schur options for rhs or solution): 0
> ICNTL(27) (experimental parameter): -24
> ICNTL(28) (use parallel or sequential ordering): 1
> ICNTL(29) (parallel ordering): 0
> ICNTL(30) (user-specified set of entries in inv(A)): 0
> ICNTL(31) (factors is discarded in the solve phase): 0
> ICNTL(33) (compute determinant): 0
> CNTL(1) (relative pivoting threshold): 0.01
> CNTL(2) (stopping criterion of refinement): 1.49012e-08
> CNTL(3) (absolute pivoting threshold): 0.
> CNTL(4) (value of static pivoting): -1.
> CNTL(5) (fixation for null pivots): 0.
> RINFO(1) (local estimated flops for the elimination after analysis):
> [0] 29394.
> RINFO(2) (local estimated flops for the assembly after factorization):
> [0] 1092.
> RINFO(3) (local estimated flops for the elimination after factorization):
> [0] 29394.
> INFO(15) (estimated size of (in MB) MUMPS internal data for running numerical factorization):
> [0] 1
> INFO(16) (size of (in MB) MUMPS internal data used during numerical factorization):
> [0] 1
> INFO(23) (num of pivots eliminated on this processor after factorization):
> [0] 324
> RINFOG(1) (global estimated flops for the elimination after analysis): 29394.
> RINFOG(2) (global estimated flops for the assembly after factorization): 1092.
> RINFOG(3) (global estimated flops for the elimination after factorization): 29394.
> (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
> INFOG(3) (estimated real workspace for factors on all processors after analysis): 3888
> INFOG(4) (estimated integer workspace for factors on all processors after analysis): 2067
> INFOG(5) (estimated maximum front size in the complete tree): 12
> INFOG(6) (number of nodes in the complete tree): 53
> INFOG(7) (ordering option effectively use after analysis): 2
> INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
> INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 3888
> INFOG(10) (total integer space store the matrix factors after factorization): 2067
> INFOG(11) (order of largest frontal matrix after factorization): 12
> INFOG(12) (number of off-diagonal pivots): 0
> INFOG(13) (number of delayed pivots after factorization): 0
> INFOG(14) (number of memory compress after factorization): 0
> INFOG(15) (number of steps of iterative refinement after solution): 0
> INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 1
> INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 1
> INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 1
> INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 1
> INFOG(20) (estimated number of entries in the factors): 3042
> INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 1
> INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 1
> INFOG(23) (after analysis: value of ICNTL(6) effectively used): 5
> INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
> INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
> INFOG(28) (after factorization: number of null pivots encountered): 0
> INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 3042
> INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 0, 0
> INFOG(32) (after analysis: type of analysis done): 1
> INFOG(33) (value used for ICNTL(8)): -2
> INFOG(34) (exponent of the determinant if determinant is requested): 0
> linear system matrix = precond matrix:
> Mat Object: (fieldsplit_RB_split_) 1 MPI processes
> type: seqaij
> rows=324, cols=324
> total: nonzeros=5760, allocated nonzeros=5760
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 108 nodes, limit used is 5
> A01
> Mat Object: 1 MPI processes
> type: seqaij
> rows=324, cols=28476
> total: nonzeros=936, allocated nonzeros=936
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 67 nodes, limit used is 5
> Mat Object: 1 MPI processes
> type: seqaij
> rows=28476, cols=28476
> total: nonzeros=1037052, allocated nonzeros=1037052
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 9489 nodes, limit used is 5
> linear system matrix = precond matrix:
> Mat Object: () 1 MPI processes
> type: seqaij
> rows=28800, cols=28800
> total: nonzeros=1024686, allocated nonzeros=1024794
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 9600 nodes, limit used is 5
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> /home/dknez/akselos-dev/scrbe/build/bin/fe_solver-opt_real on a arch-linux2-c-opt named david-Lenovo with 1 processor, by dknez Wed Jan 11 17:22:10 2017
> Using Petsc Release Version 3.7.3, unknown
>
> Max Max/Min Avg Total
> Time (sec): 9.638e+01 1.00000 9.638e+01
> Objects: 2.030e+02 1.00000 2.030e+02
> Flops: 1.732e+11 1.00000 1.732e+11 1.732e+11
> Flops/sec: 1.797e+09 1.00000 1.797e+09 1.797e+09
> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Reductions: 0.000e+00 0.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N --> 2N flops
> and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts %Total Avg %Total counts %Total
> 0: Main Stage: 9.6379e+01 100.0% 1.7318e+11 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length (bytes)
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
> %T - percent time in this phase %F - percent flops in this phase
> %M - percent messages in this phase %L - percent message lengths in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event Count Time (sec) Flops --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> VecDot 42 1.0 2.2411e-05 1.0 8.53e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 380
> VecTDot 77761 1.0 1.4294e+00 1.0 4.43e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 3098
> VecNorm 38894 1.0 9.1002e-01 1.0 2.22e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 2434
> VecScale 38882 1.0 3.7314e-01 1.0 1.11e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 2967
> VecCopy 38908 1.0 2.1655e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 77887 1.0 3.2034e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAXPY 77777 1.0 1.8382e+00 1.0 4.43e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 3 0 0 0 2 3 0 0 0 2409
> VecAYPX 38875 1.0 1.2884e+00 1.0 2.21e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 1718
> VecAssemblyBegin 68 1.0 1.9407e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAssemblyEnd 68 1.0 2.6941e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecScatterBegin 48 1.0 4.6349e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatMult 38891 1.0 4.3045e+01 1.0 8.03e+10 1.0 0.0e+00 0.0e+00 0.0e+00 45 46 0 0 0 45 46 0 0 0 1866
> MatMultAdd 38889 1.0 3.5360e+01 1.0 7.91e+10 1.0 0.0e+00 0.0e+00 0.0e+00 37 46 0 0 0 37 46 0 0 0 2236
> MatSolve 77769 1.0 4.8780e+01 1.0 7.95e+10 1.0 0.0e+00 0.0e+00 0.0e+00 51 46 0 0 0 51 46 0 0 0 1631
> MatLUFactorNum 1 1.0 1.9575e-02 1.0 2.49e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1274
> MatCholFctrSym 1 1.0 9.4891e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatCholFctrNum 1 1.0 3.7885e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatILUFactorSym 1 1.0 4.1780e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatConvert 1 1.0 3.0041e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatScale 2 1.0 2.7180e-05 1.0 2.53e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 930
> MatAssemblyBegin 32 1.0 4.0531e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 32 1.0 1.2032e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetRow 114978 1.0 5.9254e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetRowIJ 2 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetSubMatrice 6 1.0 1.5707e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetOrdering 2 1.0 3.2425e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatZeroEntries 6 1.0 3.0580e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatView 7 1.0 3.5119e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAXPY 1 1.0 1.9384e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatMatMult 1 1.0 2.7120e-03 1.0 3.16e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 117
> MatMatMultSym 1 1.0 1.8010e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatMatMultNum 1 1.0 6.1703e-04 1.0 3.16e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 513
> KSPSetUp 4 1.0 9.8944e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 1 1.0 9.3380e+01 1.0 1.73e+11 1.0 0.0e+00 0.0e+00 0.0e+00 97100 0 0 0 97100 0 0 0 1855
> PCSetUp 4 1.0 6.6326e-02 1.0 2.53e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 381
> PCSetUpOnBlocks 5 1.0 2.4082e-02 1.0 2.49e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1036
> PCApply 5 1.0 9.3376e+01 1.0 1.73e+11 1.0 0.0e+00 0.0e+00 0.0e+00 97100 0 0 0 97100 0 0 0 1855
> KSPSolve_FS_0 5 1.0 7.0214e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve_FS_Schu 5 1.0 9.3372e+01 1.0 1.73e+11 1.0 0.0e+00 0.0e+00 0.0e+00 97100 0 0 0 97100 0 0 0 1855
> KSPSolve_FS_Low 5 1.0 2.1377e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Vector 92 92 9698040 0.
> Vector Scatter 24 24 15936 0.
> Index Set 51 51 537876 0.
> IS L to G Mapping 3 3 240408 0.
> Matrix 16 16 77377776 0.
> Krylov Solver 6 6 7888 0.
> Preconditioner 6 6 6288 0.
> Viewer 1 0 0 0.
> Distributed Mesh 1 1 4624 0.
> Star Forest Bipartite Graph 2 2 1616 0.
> Discrete System 1 1 872 0.
> ========================================================================================================================
> Average time to get PetscTime(): 0.
> #PETSc Option Table entries:
> -ksp_monitor
> -ksp_view
> -log_view
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure options: --with-shared-libraries=1 --with-debugging=0 --download-suitesparse --download-blacs --download-ptscotch=yes --with-blas-lapack-dir=/opt/intel/system_studio_2015.2.050/mkl --CXXFLAGS=-Wl,--no-as-needed --download-scalapack --download-mumps --download-metis --prefix=/home/dknez/software/libmesh_install/opt_real/petsc --download-hypre --download-ml
> -----------------------------------------
> Libraries compiled on Wed Sep 21 17:38:52 2016 on david-Lenovo
> Machine characteristics: Linux-4.4.0-38-generic-x86_64-with-Ubuntu-16.04-xenial
> Using PETSc directory: /home/dknez/software/petsc-src
> Using PETSc arch: arch-linux2-c-opt
> -----------------------------------------
>
> Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O ${COPTFLAGS} ${CFLAGS}
> Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O ${FOPTFLAGS} ${FFLAGS}
> -----------------------------------------
>
> Using include paths: -I/home/dknez/software/petsc-src/arch-linux2-c-opt/include -I/home/dknez/software/petsc-src/include -I/home/dknez/software/petsc-src/include -I/home/dknez/software/petsc-src/arch-linux2-c-opt/include -I/home/dknez/software/libmesh_install/opt_real/petsc/include -I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent -I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/include -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi
> -----------------------------------------
>
> Using C linker: mpicc
> Using Fortran linker: mpif90
> Using libraries: -Wl,-rpath,/home/dknez/software/petsc-src/arch-linux2-c-opt/lib -L/home/dknez/software/petsc-src/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/home/dknez/software/libmesh_install/opt_real/petsc/lib -L/home/dknez/software/libmesh_install/opt_real/petsc/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lmetis -lHYPRE -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpi_cxx -lstdc++ -lscalapack -lml -lmpi_cxx -lstdc++ -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -Wl,-rpath,/opt/intel/system_studio_2015.2.050/mkl/lib/intel64 -L/opt/intel/system_studio_2015.2.050/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lhwloc -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lX11 -lm -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpi_cxx -lstdc++ -lrt -lm -lpthread -lz -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/usr/lib/openmpi/lib -lmpi -lgcc_s -lpthread -ldl
> -----------------------------------------
>
>
>
>
> On Wed, Jan 11, 2017 at 4:49 PM, Dave May <dave.mayhem23 at gmail.com> wrote:
> It looks like the Schur solve is requiring a huge number of iterations to converge (based on the number of MatMult calls).
> This is killing the performance.
>
> Are you sure that A11 is a good approximation to S? You might consider trying the selfp option
>
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCFieldSplitSetSchurPre.html#PCFieldSplitSetSchurPre
>
> Note that the best approximation to S is likely both problem- and discretisation-dependent, so if selfp is also terrible, you might want to consider coding up your own approximation to S for your specific system.
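>
> A minimal C sketch of both alternatives (assuming "pc" is the fieldsplit PC you already configure in your code, and "S_approx" is a hypothetical matrix holding your own assembled approximation to S):
>
>   #include <petscksp.h>
>
>   /* Choose how the Schur complement is preconditioned. */
>   static PetscErrorCode SetSchurPre(PC pc, Mat S_approx)
>   {
>     PetscErrorCode ierr;
>
>     PetscFunctionBeginUser;
>     if (S_approx) {
>       /* user-provided assembled approximation to S */
>       ierr = PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_USER, S_approx);CHKERRQ(ierr);
>     } else {
>       /* Sp = A11 - A10 inv(diag(A00)) A01, i.e. the "selfp" option;
>          same effect as the runtime option -pc_fieldsplit_schur_precondition selfp */
>       ierr = PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_SELFP, NULL);CHKERRQ(ierr);
>     }
>     PetscFunctionReturn(0);
>   }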
>
>
> Thanks,
> Dave
>
>
> On Wed, 11 Jan 2017 at 22:34, David Knezevic <david.knezevic at akselos.com> wrote:
> I have a definite 2x2 block system, and I figured it'd be good to apply the PCFIELDSPLIT functionality with a Schur complement preconditioner, as described in Section 4.5 of the manual.
>
> The A00 block of my matrix is very small, so I figured I'd specify a direct solver (MUMPS) for that block.
>
> So I did the following:
> - PCFieldSplitSetIS to specify the indices of the two splits
> - PCFieldSplitGetSubKSP to get the two KSP objects, and to set the solver and PC types for each (MUMPS for A00, ILU+CG for A11)
> - I set -pc_fieldsplit_schur_fact_type full (see the sketch just below this list)
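>
> Roughly, the setup looks like this (a sketch only; "ksp" is the outer solver with its operators already set, and "is_rb"/"is_fe" stand in for my two index sets; split names match the prefixes in the -ksp_view output below):
>
>   #include <petscksp.h>
>
>   static PetscErrorCode SetupSchurFieldSplit(KSP ksp, IS is_rb, IS is_fe)
>   {
>     PC             pc, subpc;
>     KSP           *subksp;
>     PetscInt       nsplits;
>     PetscErrorCode ierr;
>
>     PetscFunctionBeginUser;
>     ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
>     ierr = PCSetType(pc, PCFIELDSPLIT);CHKERRQ(ierr);
>     ierr = PCFieldSplitSetType(pc, PC_COMPOSITE_SCHUR);CHKERRQ(ierr);
>     /* same effect as -pc_fieldsplit_schur_fact_type full */
>     ierr = PCFieldSplitSetSchurFactType(pc, PC_FIELDSPLIT_SCHUR_FACT_FULL);CHKERRQ(ierr);
>     ierr = PCFieldSplitSetIS(pc, "RB", is_rb);CHKERRQ(ierr);
>     ierr = PCFieldSplitSetIS(pc, "FE", is_fe);CHKERRQ(ierr);
>     ierr = KSPSetUp(ksp);CHKERRQ(ierr);   /* so the sub-KSPs exist */
>     ierr = PCFieldSplitGetSubKSP(pc, &nsplits, &subksp);CHKERRQ(ierr);
>
>     /* A00 ("RB") block: direct solve via MUMPS Cholesky */
>     ierr = KSPSetType(subksp[0], KSPPREONLY);CHKERRQ(ierr);
>     ierr = KSPGetPC(subksp[0], &subpc);CHKERRQ(ierr);
>     ierr = PCSetType(subpc, PCCHOLESKY);CHKERRQ(ierr);
>     ierr = PCFactorSetMatSolverPackage(subpc, MATSOLVERMUMPS);CHKERRQ(ierr);
>
>     /* Schur ("FE") block: CG; the ILU seen in the -ksp_view output comes
>        from the default block Jacobi + ILU preconditioner on one process */
>     ierr = KSPSetType(subksp[1], KSPCG);CHKERRQ(ierr);
>
>     ierr = PetscFree(subksp);CHKERRQ(ierr);
>     PetscFunctionReturn(0);
>   }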
>
> Below I have pasted the output of "-ksp_view -ksp_monitor -log_view" for a test case. It seems to converge well, but I'm concerned about the speed (about 90 seconds, vs. about 1 second if I use a direct solver for the entire system). I just wanted to check whether I'm setting this up in a reasonable way.
>
> Many thanks,
> David
>
> -----------------------------------------------------------------------------------
>
> 0 KSP Residual norm 5.405774214400e+04
> 1 KSP Residual norm 1.849649014371e+02
> 2 KSP Residual norm 7.462775074989e-02
> 3 KSP Residual norm 2.680497175260e-04
> KSP Object: 1 MPI processes
> type: cg
> maximum iterations=1000
> tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
> left preconditioning
> using nonzero initial guess
> using PRECONDITIONED norm type for convergence test
> PC Object: 1 MPI processes
> type: fieldsplit
> FieldSplit with Schur preconditioner, factorization FULL
> Preconditioner for the Schur complement formed from A11
> Split info:
> Split number 0 Defined by IS
> Split number 1 Defined by IS
> KSP solver for A00 block
> KSP Object: (fieldsplit_RB_split_) 1 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (fieldsplit_RB_split_) 1 MPI processes
> type: cholesky
> Cholesky: out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> matrix ordering: natural
> factor fill ratio given 0., needed 0.
> Factored matrix follows:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=324, cols=324
> package used to perform factorization: mumps
> total: nonzeros=3042, allocated nonzeros=3042
> total number of mallocs used during MatSetValues calls =0
> MUMPS run parameters:
> SYM (matrix type): 2
> PAR (host participation): 1
> ICNTL(1) (output for error): 6
> ICNTL(2) (output of diagnostic msg): 0
> ICNTL(3) (output for global info): 0
> ICNTL(4) (level of printing): 0
> ICNTL(5) (input mat struct): 0
> ICNTL(6) (matrix prescaling): 7
> ICNTL(7) (sequentia matrix ordering):7
> ICNTL(8) (scalling strategy): 77
> ICNTL(10) (max num of refinements): 0
> ICNTL(11) (error analysis): 0
> ICNTL(12) (efficiency control): 0
> ICNTL(13) (efficiency control): 0
> ICNTL(14) (percentage of estimated workspace increase): 20
> ICNTL(18) (input mat struct): 0
> ICNTL(19) (Shur complement info): 0
> ICNTL(20) (rhs sparse pattern): 0
> ICNTL(21) (solution struct): 0
> ICNTL(22) (in-core/out-of-core facility): 0
> ICNTL(23) (max size of memory can be allocated locally):0
> ICNTL(24) (detection of null pivot rows): 0
> ICNTL(25) (computation of a null space basis): 0
> ICNTL(26) (Schur options for rhs or solution): 0
> ICNTL(27) (experimental parameter): -24
> ICNTL(28) (use parallel or sequential ordering): 1
> ICNTL(29) (parallel ordering): 0
> ICNTL(30) (user-specified set of entries in inv(A)): 0
> ICNTL(31) (factors is discarded in the solve phase): 0
> ICNTL(33) (compute determinant): 0
> CNTL(1) (relative pivoting threshold): 0.01
> CNTL(2) (stopping criterion of refinement): 1.49012e-08
> CNTL(3) (absolute pivoting threshold): 0.
> CNTL(4) (value of static pivoting): -1.
> CNTL(5) (fixation for null pivots): 0.
> RINFO(1) (local estimated flops for the elimination after analysis):
> [0] 29394.
> RINFO(2) (local estimated flops for the assembly after factorization):
> [0] 1092.
> RINFO(3) (local estimated flops for the elimination after factorization):
> [0] 29394.
> INFO(15) (estimated size of (in MB) MUMPS internal data for running numerical factorization):
> [0] 1
> INFO(16) (size of (in MB) MUMPS internal data used during numerical factorization):
> [0] 1
> INFO(23) (num of pivots eliminated on this processor after factorization):
> [0] 324
> RINFOG(1) (global estimated flops for the elimination after analysis): 29394.
> RINFOG(2) (global estimated flops for the assembly after factorization): 1092.
> RINFOG(3) (global estimated flops for the elimination after factorization): 29394.
> (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
> INFOG(3) (estimated real workspace for factors on all processors after analysis): 3888
> INFOG(4) (estimated integer workspace for factors on all processors after analysis): 2067
> INFOG(5) (estimated maximum front size in the complete tree): 12
> INFOG(6) (number of nodes in the complete tree): 53
> INFOG(7) (ordering option effectively use after analysis): 2
> INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
> INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 3888
> INFOG(10) (total integer space store the matrix factors after factorization): 2067
> INFOG(11) (order of largest frontal matrix after factorization): 12
> INFOG(12) (number of off-diagonal pivots): 0
> INFOG(13) (number of delayed pivots after factorization): 0
> INFOG(14) (number of memory compress after factorization): 0
> INFOG(15) (number of steps of iterative refinement after solution): 0
> INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 1
> INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 1
> INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 1
> INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 1
> INFOG(20) (estimated number of entries in the factors): 3042
> INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 1
> INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 1
> INFOG(23) (after analysis: value of ICNTL(6) effectively used): 5
> INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
> INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
> INFOG(28) (after factorization: number of null pivots encountered): 0
> INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 3042
> INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 0, 0
> INFOG(32) (after analysis: type of analysis done): 1
> INFOG(33) (value used for ICNTL(8)): -2
> INFOG(34) (exponent of the determinant if determinant is requested): 0
> linear system matrix = precond matrix:
> Mat Object: (fieldsplit_RB_split_) 1 MPI processes
> type: seqaij
> rows=324, cols=324
> total: nonzeros=5760, allocated nonzeros=5760
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 108 nodes, limit used is 5
> KSP solver for S = A11 - A10 inv(A00) A01
> KSP Object: (fieldsplit_FE_split_) 1 MPI processes
> type: cg
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using PRECONDITIONED norm type for convergence test
> PC Object: (fieldsplit_FE_split_) 1 MPI processes
> type: bjacobi
> block Jacobi: number of blocks = 1
> Local solve is same for all blocks, in the following KSP and PC objects:
> KSP Object: (fieldsplit_FE_split_sub_) 1 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (fieldsplit_FE_split_sub_) 1 MPI processes
> type: ilu
> ILU: out-of-place factorization
> 0 levels of fill
> tolerance for zero pivot 2.22045e-14
> matrix ordering: natural
> factor fill ratio given 1., needed 1.
> Factored matrix follows:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=28476, cols=28476
> package used to perform factorization: petsc
> total: nonzeros=1017054, allocated nonzeros=1017054
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 9492 nodes, limit used is 5
> linear system matrix = precond matrix:
> Mat Object: (fieldsplit_FE_split_) 1 MPI processes
> type: seqaij
> rows=28476, cols=28476
> total: nonzeros=1017054, allocated nonzeros=1017054
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 9492 nodes, limit used is 5
> linear system matrix followed by preconditioner matrix:
> Mat Object: (fieldsplit_FE_split_) 1 MPI processes
> type: schurcomplement
> rows=28476, cols=28476
> Schur complement A11 - A10 inv(A00) A01
> A11
> Mat Object: (fieldsplit_FE_split_) 1 MPI processes
> type: seqaij
> rows=28476, cols=28476
> total: nonzeros=1017054, allocated nonzeros=1017054
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 9492 nodes, limit used is 5
> A10
> Mat Object: 1 MPI processes
> type: seqaij
> rows=28476, cols=324
> total: nonzeros=936, allocated nonzeros=936
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 5717 nodes, limit used is 5
> KSP of A00
> KSP Object: (fieldsplit_RB_split_) 1 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (fieldsplit_RB_split_) 1 MPI processes
> type: cholesky
> Cholesky: out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> matrix ordering: natural
> factor fill ratio given 0., needed 0.
> Factored matrix follows:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=324, cols=324
> package used to perform factorization: mumps
> total: nonzeros=3042, allocated nonzeros=3042
> total number of mallocs used during MatSetValues calls =0
> MUMPS run parameters:
> SYM (matrix type): 2
> PAR (host participation): 1
> ICNTL(1) (output for error): 6
> ICNTL(2) (output of diagnostic msg): 0
> ICNTL(3) (output for global info): 0
> ICNTL(4) (level of printing): 0
> ICNTL(5) (input mat struct): 0
> ICNTL(6) (matrix prescaling): 7
> ICNTL(7) (sequentia matrix ordering):7
> ICNTL(8) (scalling strategy): 77
> ICNTL(10) (max num of refinements): 0
> ICNTL(11) (error analysis): 0
> ICNTL(12) (efficiency control): 0
> ICNTL(13) (efficiency control): 0
> ICNTL(14) (percentage of estimated workspace increase): 20
> ICNTL(18) (input mat struct): 0
> ICNTL(19) (Shur complement info): 0
> ICNTL(20) (rhs sparse pattern): 0
> ICNTL(21) (solution struct): 0
> ICNTL(22) (in-core/out-of-core facility): 0
> ICNTL(23) (max size of memory can be allocated locally):0
> ICNTL(24) (detection of null pivot rows): 0
> ICNTL(25) (computation of a null space basis): 0
> ICNTL(26) (Schur options for rhs or solution): 0
> ICNTL(27) (experimental parameter): -24
> ICNTL(28) (use parallel or sequential ordering): 1
> ICNTL(29) (parallel ordering): 0
> ICNTL(30) (user-specified set of entries in inv(A)): 0
> ICNTL(31) (factors is discarded in the solve phase): 0
> ICNTL(33) (compute determinant): 0
> CNTL(1) (relative pivoting threshold): 0.01
> CNTL(2) (stopping criterion of refinement): 1.49012e-08
> CNTL(3) (absolute pivoting threshold): 0.
> CNTL(4) (value of static pivoting): -1.
> CNTL(5) (fixation for null pivots): 0.
> RINFO(1) (local estimated flops for the elimination after analysis):
> [0] 29394.
> RINFO(2) (local estimated flops for the assembly after factorization):
> [0] 1092.
> RINFO(3) (local estimated flops for the elimination after factorization):
> [0] 29394.
> INFO(15) (estimated size of (in MB) MUMPS internal data for running numerical factorization):
> [0] 1
> INFO(16) (size of (in MB) MUMPS internal data used during numerical factorization):
> [0] 1
> INFO(23) (num of pivots eliminated on this processor after factorization):
> [0] 324
> RINFOG(1) (global estimated flops for the elimination after analysis): 29394.
> RINFOG(2) (global estimated flops for the assembly after factorization): 1092.
> RINFOG(3) (global estimated flops for the elimination after factorization): 29394.
> (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
> INFOG(3) (estimated real workspace for factors on all processors after analysis): 3888
> INFOG(4) (estimated integer workspace for factors on all processors after analysis): 2067
> INFOG(5) (estimated maximum front size in the complete tree): 12
> INFOG(6) (number of nodes in the complete tree): 53
> INFOG(7) (ordering option effectively use after analysis): 2
> INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
> INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 3888
> INFOG(10) (total integer space store the matrix factors after factorization): 2067
> INFOG(11) (order of largest frontal matrix after factorization): 12
> INFOG(12) (number of off-diagonal pivots): 0
> INFOG(13) (number of delayed pivots after factorization): 0
> INFOG(14) (number of memory compress after factorization): 0
> INFOG(15) (number of steps of iterative refinement after solution): 0
> INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 1
> INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 1
> INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 1
> INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 1
> INFOG(20) (estimated number of entries in the factors): 3042
> INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 1
> INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 1
> INFOG(23) (after analysis: value of ICNTL(6) effectively used): 5
> INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
> INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
> INFOG(28) (after factorization: number of null pivots encountered): 0
> INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 3042
> INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 0, 0
> INFOG(32) (after analysis: type of analysis done): 1
> INFOG(33) (value used for ICNTL(8)): -2
> INFOG(34) (exponent of the determinant if determinant is requested): 0
> linear system matrix = precond matrix:
> Mat Object: (fieldsplit_RB_split_) 1 MPI processes
> type: seqaij
> rows=324, cols=324
> total: nonzeros=5760, allocated nonzeros=5760
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 108 nodes, limit used is 5
> A01
> Mat Object: 1 MPI processes
> type: seqaij
> rows=324, cols=28476
> total: nonzeros=936, allocated nonzeros=936
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 67 nodes, limit used is 5
> Mat Object: (fieldsplit_FE_split_) 1 MPI processes
> type: seqaij
> rows=28476, cols=28476
> total: nonzeros=1017054, allocated nonzeros=1017054
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 9492 nodes, limit used is 5
> linear system matrix = precond matrix:
> Mat Object: () 1 MPI processes
> type: seqaij
> rows=28800, cols=28800
> total: nonzeros=1024686, allocated nonzeros=1024794
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 9600 nodes, limit used is 5
>
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> /home/dknez/akselos-dev/scrbe/build/bin/fe_solver-opt_real on a arch-linux2-c-opt named david-Lenovo with 1 processor, by dknez Wed Jan 11 16:16:47 2017
> Using Petsc Release Version 3.7.3, unknown
>
> Max Max/Min Avg Total
> Time (sec): 9.179e+01 1.00000 9.179e+01
> Objects: 1.990e+02 1.00000 1.990e+02
> Flops: 1.634e+11 1.00000 1.634e+11 1.634e+11
> Flops/sec: 1.780e+09 1.00000 1.780e+09 1.780e+09
> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Reductions: 0.000e+00 0.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N --> 2N flops
> and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts %Total Avg %Total counts %Total
> 0: Main Stage: 9.1787e+01 100.0% 1.6336e+11 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length (bytes)
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
> %T - percent time in this phase %F - percent flops in this phase
> %M - percent messages in this phase %L - percent message lengths in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event Count Time (sec) Flops --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> VecDot 42 1.0 2.4080e-05 1.0 8.53e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 354
> VecTDot 74012 1.0 1.2440e+00 1.0 4.22e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 3388
> VecNorm 37020 1.0 8.3580e-01 1.0 2.11e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 2523
> VecScale 37008 1.0 3.5800e-01 1.0 1.05e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 2944
> VecCopy 37034 1.0 2.5754e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 74137 1.0 3.0537e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAXPY 74029 1.0 1.7233e+00 1.0 4.22e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 3 0 0 0 2 3 0 0 0 2446
> VecAYPX 37001 1.0 1.2214e+00 1.0 2.11e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 1725
> VecAssemblyBegin 68 1.0 2.0432e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAssemblyEnd 68 1.0 2.5988e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecScatterBegin 48 1.0 4.6921e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatMult 37017 1.0 4.1269e+01 1.0 7.65e+10 1.0 0.0e+00 0.0e+00 0.0e+00 45 47 0 0 0 45 47 0 0 0 1853
> MatMultAdd 37015 1.0 3.3638e+01 1.0 7.53e+10 1.0 0.0e+00 0.0e+00 0.0e+00 37 46 0 0 0 37 46 0 0 0 2238
> MatSolve 74021 1.0 4.6602e+01 1.0 7.42e+10 1.0 0.0e+00 0.0e+00 0.0e+00 51 45 0 0 0 51 45 0 0 0 1593
> MatLUFactorNum 1 1.0 1.7209e-02 1.0 2.44e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1420
> MatCholFctrSym 1 1.0 8.8310e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatCholFctrNum 1 1.0 3.6907e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatILUFactorSym 1 1.0 3.7372e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyBegin 29 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 29 1.0 9.9473e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetRow 58026 1.0 2.8155e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetRowIJ 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetSubMatrice 6 1.0 1.5399e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetOrdering 2 1.0 3.0112e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatZeroEntries 6 1.0 2.9490e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatView 7 1.0 3.4356e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSetUp 4 1.0 9.4891e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 1 1.0 8.8793e+01 1.0 1.63e+11 1.0 0.0e+00 0.0e+00 0.0e+00 97100 0 0 0 97100 0 0 0 1840
> PCSetUp 4 1.0 3.8375e-02 1.0 2.44e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 637
> PCSetUpOnBlocks 5 1.0 2.1250e-02 1.0 2.44e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1150
> PCApply 5 1.0 8.8789e+01 1.0 1.63e+11 1.0 0.0e+00 0.0e+00 0.0e+00 97100 0 0 0 97100 0 0 0 1840
> KSPSolve_FS_0 5 1.0 7.5364e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve_FS_Schu 5 1.0 8.8785e+01 1.0 1.63e+11 1.0 0.0e+00 0.0e+00 0.0e+00 97100 0 0 0 97100 0 0 0 1840
> KSPSolve_FS_Low 5 1.0 2.1019e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Vector 91 91 9693912 0.
> Vector Scatter 24 24 15936 0.
> Index Set 51 51 537888 0.
> IS L to G Mapping 3 3 240408 0.
> Matrix 13 13 64097868 0.
> Krylov Solver 6 6 7888 0.
> Preconditioner 6 6 6288 0.
> Viewer 1 0 0 0.
> Distributed Mesh 1 1 4624 0.
> Star Forest Bipartite Graph 2 2 1616 0.
> Discrete System 1 1 872 0.
> ========================================================================================================================
> Average time to get PetscTime(): 0.
> #PETSc Option Table entries:
> -ksp_monitor
> -ksp_view
> -log_view
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure options: --with-shared-libraries=1 --with-debugging=0 --download-suitesparse --download-blacs --download-ptscotch=yes --with-blas-lapack-dir=/opt/intel/system_studio_2015.2.050/mkl --CXXFLAGS=-Wl,--no-as-needed --download-scalapack --download-mumps --download-metis --prefix=/home/dknez/software/libmesh_install/opt_real/petsc --download-hypre --download-ml
> -----------------------------------------
> Libraries compiled on Wed Sep 21 17:38:52 2016 on david-Lenovo
> Machine characteristics: Linux-4.4.0-38-generic-x86_64-with-Ubuntu-16.04-xenial
> Using PETSc directory: /home/dknez/software/petsc-src
> Using PETSc arch: arch-linux2-c-opt
> -----------------------------------------
>
> Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O ${COPTFLAGS} ${CFLAGS}
> Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O ${FOPTFLAGS} ${FFLAGS}
> -----------------------------------------
>
> Using include paths: -I/home/dknez/software/petsc-src/arch-linux2-c-opt/include -I/home/dknez/software/petsc-src/include -I/home/dknez/software/petsc-src/include -I/home/dknez/software/petsc-src/arch-linux2-c-opt/include -I/home/dknez/software/libmesh_install/opt_real/petsc/include -I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent -I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/include -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi
> -----------------------------------------
>
> Using C linker: mpicc
> Using Fortran linker: mpif90
> Using libraries: -Wl,-rpath,/home/dknez/software/petsc-src/arch-linux2-c-opt/lib -L/home/dknez/software/petsc-src/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/home/dknez/software/libmesh_install/opt_real/petsc/lib -L/home/dknez/software/libmesh_install/opt_real/petsc/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lmetis -lHYPRE -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpi_cxx -lstdc++ -lscalapack -lml -lmpi_cxx -lstdc++ -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -Wl,-rpath,/opt/intel/system_studio_2015.2.050/mkl/lib/intel64 -L/opt/intel/system_studio_2015.2.050/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lhwloc -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lX11 -lm -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpi_cxx -lstdc++ -lrt -lm -lpthread -lz -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/usr/lib/openmpi/lib -lmpi -lgcc_s -lpthread -ldl
> -----------------------------------------
>
>
>
>
>
>
>