[petsc-users] Using PCFIELDSPLIT with -pc_fieldsplit_type schur

Wed Jan 11 16:29:02 CST 2017

Thanks very much for the input. I tried the "selfp" option and the
performance is about the same (log below), so I gather that I'll have to
look into a user-defined approximation to S.
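
To make the difference between S and the "selfp" matrix Sp concrete, here
is a minimal Python sketch on a made-up 2x2 block system. The matrices and
helper names are illustrative assumptions, not taken from the solver in
the log. It forms the exact Schur complement S = A11 - A10 inv(A00) A01
and the assembled approximation Sp = A11 - A10 inv(diag(A00)) A01, which
is how the ksp_view output describes the selfp preconditioning matrix.

```python
from fractions import Fraction


def frac(M):
    """Convert a nested list of ints to exact Fractions."""
    return [[Fraction(x) for x in row] for row in M]


def matmul(X, Y):
    """Dense matrix product of two nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def inv2(M):
    """Exact inverse of a 2x2 matrix (toy stand-in for a direct solve)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def diag_inv(M):
    """Inverse of diag(M): the only part of A00 that selfp keeps."""
    n = len(M)
    return [[1 / M[i][i] if i == j else Fraction(0) for j in range(n)]
            for i in range(n)]


def schur(A00_inv, A01, A10, A11):
    """Return A11 - A10 * A00_inv * A01."""
    corr = matmul(matmul(A10, A00_inv), A01)
    return [[A11[i][j] - corr[i][j] for j in range(len(A11[0]))]
            for i in range(len(A11))]


# Hypothetical symmetric test blocks (assumed values, for illustration only).
A00 = frac([[4, 1], [1, 3]])
A01 = frac([[1, 0], [0, 1]])
A10 = transpose(A01)
A11 = frac([[5, 0], [0, 5]])

S = schur(inv2(A00), A01, A10, A11)       # exact Schur complement
Sp = schur(diag_inv(A00), A01, A10, A11)  # selfp-style approximation

print("S  =", [[str(x) for x in row] for row in S])
print("Sp =", [[str(x) for x in row] for row in Sp])
```

When A00 is diagonal the two matrices coincide. The further A00 is from
diagonal, the less Sp resembles S, which is one plausible reason why
selfp may not improve on A11 for a given problem.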

Thanks,
David

-----------------------------------------

  0 KSP Residual norm 5.405528187695e+04
  1 KSP Residual norm 2.187814910803e+02
  2 KSP Residual norm 1.019051577515e-01
  3 KSP Residual norm 4.370464012859e-04
KSP Object: 1 MPI processes
  type: cg
  maximum iterations=1000
  tolerances:  relative=1e-06, absolute=1e-50, divergence=10000.
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: fieldsplit
    FieldSplit with Schur preconditioner, factorization FULL
    Preconditioner for the Schur complement formed from Sp, an assembled approximation to S, which uses (lumped, if requested) A00's diagonal's inverse
    Split info:
    Split number 0 Defined by IS
    Split number 1 Defined by IS
    KSP solver for A00 block
      KSP Object:      (fieldsplit_RB_split_)       1 MPI processes
        type: preonly
        maximum iterations=10000, initial guess is zero
        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using NONE norm type for convergence test
      PC Object:      (fieldsplit_RB_split_)       1 MPI processes
        type: cholesky
          Cholesky: out-of-place factorization
          tolerance for zero pivot 2.22045e-14
          matrix ordering: natural
          factor fill ratio given 0., needed 0.
            Factored matrix follows:
              Mat Object:               1 MPI processes
                type: seqaij
                rows=324, cols=324
                package used to perform factorization: mumps
                total: nonzeros=3042, allocated nonzeros=3042
                total number of mallocs used during MatSetValues calls =0
                  MUMPS run parameters:
                    SYM (matrix type):                   2
                    PAR (host participation):            1
                    ICNTL(1) (output for error):         6
                    ICNTL(2) (output of diagnostic msg): 0
                    ICNTL(3) (output for global info):   0
                    ICNTL(4) (level of printing):        0
                    ICNTL(5) (input mat struct):         0
                    ICNTL(6) (matrix prescaling):        7
                    ICNTL(7) (sequentia matrix ordering):7
                    ICNTL(8) (scalling strategy):        77
                    ICNTL(10) (max num of refinements):  0
                    ICNTL(11) (error analysis):          0
                    ICNTL(12) (efficiency control): 0
                    ICNTL(13) (efficiency control): 0
                    ICNTL(14) (percentage of estimated workspace increase): 20
                    ICNTL(18) (input mat struct): 0
                    ICNTL(19) (Shur complement info): 0
                    ICNTL(20) (rhs sparse pattern): 0
                    ICNTL(21) (solution struct): 0
                    ICNTL(22) (in-core/out-of-core facility): 0
                    ICNTL(23) (max size of memory can be allocated locally):0
                    ICNTL(24) (detection of null pivot rows): 0
                    ICNTL(25) (computation of a null space basis): 0
                    ICNTL(26) (Schur options for rhs or solution): 0
                    ICNTL(27) (experimental parameter): -24
                    ICNTL(28) (use parallel or sequential ordering): 1
                    ICNTL(29) (parallel ordering): 0
                    ICNTL(30) (user-specified set of entries in inv(A)): 0
                    ICNTL(31) (factors is discarded in the solve phase): 0
                    ICNTL(33) (compute determinant): 0
                    CNTL(1) (relative pivoting threshold):      0.01
                    CNTL(2) (stopping criterion of refinement): 1.49012e-08
                    CNTL(3) (absolute pivoting threshold):      0.
                    CNTL(4) (value of static pivoting):         -1.
                    CNTL(5) (fixation for null pivots):         0.
                    RINFO(1) (local estimated flops for the elimination
after analysis):
                      [0] 29394.
                    RINFO(2) (local estimated flops for the assembly after
factorization):
                      [0]  1092.
                    RINFO(3) (local estimated flops for the elimination
after factorization):
                      [0]  29394.
                    INFO(15) (estimated size of (in MB) MUMPS internal data
for running numerical factorization):
                    [0] 1
                    INFO(16) (size of (in MB) MUMPS internal data used
during numerical factorization):
                      [0] 1
                    INFO(23) (num of pivots eliminated on this processor
after factorization):
                      [0] 324
                    RINFOG(1) (global estimated flops for the elimination
after analysis): 29394.
                    RINFOG(2) (global estimated flops for the assembly
after factorization): 1092.
                    RINFOG(3) (global estimated flops for the elimination
after factorization): 29394.
                    (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant):
(0.,0.)*(2^0)
                    INFOG(3) (estimated real workspace for factors on all
processors after analysis): 3888
                    INFOG(4) (estimated integer workspace for factors on
all processors after analysis): 2067
                    INFOG(5) (estimated maximum front size in the complete
tree): 12
                    INFOG(6) (number of nodes in the complete tree): 53
                    INFOG(7) (ordering option effectively use after
analysis): 2
                    INFOG(8) (structural symmetry in percent of the
permuted matrix after analysis): 100
                    INFOG(9) (total real/complex workspace to store the
matrix factors after factorization): 3888
                    INFOG(10) (total integer space store the matrix factors
after factorization): 2067
                    INFOG(11) (order of largest frontal matrix after
factorization): 12
                    INFOG(12) (number of off-diagonal pivots): 0
                    INFOG(13) (number of delayed pivots after
factorization): 0
                    INFOG(14) (number of memory compress after
factorization): 0
                    INFOG(15) (number of steps of iterative refinement
after solution): 0
                    INFOG(16) (estimated size (in MB) of all MUMPS internal
data for factorization after analysis: value on the most memory consuming
processor): 1
                    INFOG(17) (estimated size of all MUMPS internal data
for factorization after analysis: sum over all processors): 1
                    INFOG(18) (size of all MUMPS internal data allocated
during factorization: value on the most memory consuming processor): 1
                    INFOG(19) (size of all MUMPS internal data allocated
during factorization: sum over all processors): 1
                    INFOG(20) (estimated number of entries in the factors):
3042
                    INFOG(21) (size in MB of memory effectively used during
factorization - value on the most memory consuming processor): 1
                    INFOG(22) (size in MB of memory effectively used during
factorization - sum over all processors): 1
                    INFOG(23) (after analysis: value of ICNTL(6)
effectively used): 5
                    INFOG(24) (after analysis: value of ICNTL(12)
effectively used): 1
                    INFOG(25) (after factorization: number of pivots
modified by static pivoting): 0
                    INFOG(28) (after factorization: number of null pivots
encountered): 0
                    INFOG(29) (after factorization: effective number of
entries in the factors (sum over all processors)): 3042
                    INFOG(30, 31) (after solution: size in Mbytes of memory
used during solution phase): 0, 0
                    INFOG(32) (after analysis: type of analysis done): 1
                    INFOG(33) (value used for ICNTL(8)): -2
                    INFOG(34) (exponent of the determinant if determinant
is requested): 0
        linear system matrix = precond matrix:
        Mat Object:        (fieldsplit_RB_split_)         1 MPI processes
          type: seqaij
          rows=324, cols=324
          total: nonzeros=5760, allocated nonzeros=5760
          total number of mallocs used during MatSetValues calls =0
            using I-node routines: found 108 nodes, limit used is 5
    KSP solver for S = A11 - A10 inv(A00) A01
      KSP Object:      (fieldsplit_FE_split_)       1 MPI processes
        type: cg
        maximum iterations=10000, initial guess is zero
        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using PRECONDITIONED norm type for convergence test
      PC Object:      (fieldsplit_FE_split_)       1 MPI processes
        type: bjacobi
          block Jacobi: number of blocks = 1
          Local solve is same for all blocks, in the following KSP and PC objects:
          KSP Object:          (fieldsplit_FE_split_sub_)           1 MPI processes
            type: preonly
            maximum iterations=10000, initial guess is zero
            tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
            left preconditioning
            using NONE norm type for convergence test
          PC Object:          (fieldsplit_FE_split_sub_)           1 MPI processes
            type: ilu
              ILU: out-of-place factorization
              0 levels of fill
              tolerance for zero pivot 2.22045e-14
              matrix ordering: natural
              factor fill ratio given 1., needed 1.
                Factored matrix follows:
                  Mat Object:                   1 MPI processes
                    type: seqaij
                    rows=28476, cols=28476
                    package used to perform factorization: petsc
                    total: nonzeros=1037052, allocated nonzeros=1037052
                    total number of mallocs used during MatSetValues calls =0
                      using I-node routines: found 9489 nodes, limit used is 5
            linear system matrix = precond matrix:
            Mat Object:             1 MPI processes
              type: seqaij
              rows=28476, cols=28476
              total: nonzeros=1037052, allocated nonzeros=1037052
              total number of mallocs used during MatSetValues calls =0
                using I-node routines: found 9489 nodes, limit used is 5
        linear system matrix followed by preconditioner matrix:
        Mat Object:        (fieldsplit_FE_split_)         1 MPI processes
          type: schurcomplement
          rows=28476, cols=28476
            Schur complement A11 - A10 inv(A00) A01
            A11
              Mat Object:              (fieldsplit_FE_split_)
1 MPI processes
                type: seqaij
                rows=28476, cols=28476
                total: nonzeros=1017054, allocated nonzeros=1017054
                total number of mallocs used during MatSetValues calls =0
                  using I-node routines: found 9492 nodes, limit used is 5
            A10
              Mat Object:               1 MPI processes
                type: seqaij
                rows=28476, cols=324
                total: nonzeros=936, allocated nonzeros=936
                total number of mallocs used during MatSetValues calls =0
                  using I-node routines: found 5717 nodes, limit used is 5
            KSP of A00
              KSP Object:              (fieldsplit_RB_split_)
1 MPI processes
                type: preonly
                maximum iterations=10000, initial guess is zero
                tolerances:  relative=1e-05, absolute=1e-50,
divergence=10000.
                left preconditioning
                using NONE norm type for convergence test
              PC Object:              (fieldsplit_RB_split_)
1 MPI processes
                type: cholesky
                  Cholesky: out-of-place factorization
                  tolerance for zero pivot 2.22045e-14
                  matrix ordering: natural
                  factor fill ratio given 0., needed 0.
                    Factored matrix follows:
                      Mat Object:                       1 MPI processes
                        type: seqaij
                        rows=324, cols=324
                        package used to perform factorization: mumps
                        total: nonzeros=3042, allocated nonzeros=3042
                        total number of mallocs used during MatSetValues
calls =0
                          MUMPS run parameters:
                            SYM (matrix type):                   2
                            PAR (host participation):            1
                            ICNTL(1) (output for error):         6
                            ICNTL(2) (output of diagnostic msg): 0
                            ICNTL(3) (output for global info):   0
                            ICNTL(4) (level of printing):        0
                            ICNTL(5) (input mat struct):         0
                            ICNTL(6) (matrix prescaling):        7
                            ICNTL(7) (sequentia matrix ordering):7
                            ICNTL(8) (scalling strategy):        77
                            ICNTL(10) (max num of refinements):  0
                            ICNTL(11) (error analysis):          0
                            ICNTL(12) (efficiency control): 0
                            ICNTL(13) (efficiency control): 0
                            ICNTL(14) (percentage of estimated workspace increase): 20
                            ICNTL(18) (input mat struct): 0
                            ICNTL(19) (Shur complement info): 0
                            ICNTL(20) (rhs sparse pattern): 0
                            ICNTL(21) (solution struct): 0
                            ICNTL(22) (in-core/out-of-core facility): 0
                            ICNTL(23) (max size of memory can be allocated locally):0
                            ICNTL(24) (detection of null pivot rows): 0
                            ICNTL(25) (computation of a null space basis): 0
                            ICNTL(26) (Schur options for rhs or solution): 0
                            ICNTL(27) (experimental parameter): -24
                            ICNTL(28) (use parallel or sequential ordering): 1
                            ICNTL(29) (parallel ordering): 0
                            ICNTL(30) (user-specified set of entries in inv(A)): 0
                            ICNTL(31) (factors is discarded in the solve phase): 0
                            ICNTL(33) (compute determinant): 0
                            CNTL(1) (relative pivoting threshold):      0.01
                            CNTL(2) (stopping criterion of refinement): 1.49012e-08
                            CNTL(3) (absolute pivoting threshold):      0.
                            CNTL(4) (value of static pivoting):         -1.
                            CNTL(5) (fixation for null pivots):         0.
                            RINFO(1) (local estimated flops for the
elimination after analysis):
                              [0] 29394.
                            RINFO(2) (local estimated flops for the
assembly after factorization):
                              [0]  1092.
                            RINFO(3) (local estimated flops for the
elimination after factorization):
                              [0]  29394.
                            INFO(15) (estimated size of (in MB) MUMPS
internal data for running numerical factorization):
                            [0] 1
                            INFO(16) (size of (in MB) MUMPS internal data
used during numerical factorization):
                              [0] 1
                            INFO(23) (num of pivots eliminated on this
processor after factorization):
                              [0] 324
                            RINFOG(1) (global estimated flops for the
elimination after analysis): 29394.
                            RINFOG(2) (global estimated flops for the
assembly after factorization): 1092.
                            RINFOG(3) (global estimated flops for the
elimination after factorization): 29394.
                            (RINFOG(12) RINFOG(13))*2^INFOG(34)
(determinant): (0.,0.)*(2^0)
                            INFOG(3) (estimated real workspace for factors
on all processors after analysis): 3888
                            INFOG(4) (estimated integer workspace for
factors on all processors after analysis): 2067
                            INFOG(5) (estimated maximum front size in the
complete tree): 12
                            INFOG(6) (number of nodes in the complete
tree): 53
                            INFOG(7) (ordering option effectively use after
analysis): 2
                            INFOG(8) (structural symmetry in percent of the
permuted matrix after analysis): 100
                            INFOG(9) (total real/complex workspace to store
the matrix factors after factorization): 3888
                            INFOG(10) (total integer space store the matrix
factors after factorization): 2067
                            INFOG(11) (order of largest frontal matrix
after factorization): 12
                            INFOG(12) (number of off-diagonal pivots): 0
                            INFOG(13) (number of delayed pivots after
factorization): 0
                            INFOG(14) (number of memory compress after
factorization): 0
                            INFOG(15) (number of steps of iterative
refinement after solution): 0
                            INFOG(16) (estimated size (in MB) of all MUMPS
internal data for factorization after analysis: value on the most memory
consuming processor): 1
                            INFOG(17) (estimated size of all MUMPS internal
data for factorization after analysis: sum over all processors): 1
                            INFOG(18) (size of all MUMPS internal data
allocated during factorization: value on the most memory consuming
processor): 1
                            INFOG(19) (size of all MUMPS internal data
allocated during factorization: sum over all processors): 1
                            INFOG(20) (estimated number of entries in the
factors): 3042
                            INFOG(21) (size in MB of memory effectively
used during factorization - value on the most memory consuming processor):
1
                            INFOG(22) (size in MB of memory effectively
used during factorization - sum over all processors): 1
                            INFOG(23) (after analysis: value of ICNTL(6)
effectively used): 5
                            INFOG(24) (after analysis: value of ICNTL(12)
effectively used): 1
                            INFOG(25) (after factorization: number of
pivots modified by static pivoting): 0
                            INFOG(28) (after factorization: number of null
pivots encountered): 0
                            INFOG(29) (after factorization: effective
number of entries in the factors (sum over all processors)): 3042
                            INFOG(30, 31) (after solution: size in Mbytes
of memory used during solution phase): 0, 0
                            INFOG(32) (after analysis: type of analysis
done): 1
                            INFOG(33) (value used for ICNTL(8)): -2
                            INFOG(34) (exponent of the determinant if
determinant is requested): 0
                linear system matrix = precond matrix:
                Mat Object:                (fieldsplit_RB_split_)
      1 MPI processes
                  type: seqaij
                  rows=324, cols=324
                  total: nonzeros=5760, allocated nonzeros=5760
                  total number of mallocs used during MatSetValues calls =0
                    using I-node routines: found 108 nodes, limit used is 5
            A01
              Mat Object:               1 MPI processes
                type: seqaij
                rows=324, cols=28476
                total: nonzeros=936, allocated nonzeros=936
                total number of mallocs used during MatSetValues calls =0
                  using I-node routines: found 67 nodes, limit used is 5
        Mat Object:         1 MPI processes
          type: seqaij
          rows=28476, cols=28476
          total: nonzeros=1037052, allocated nonzeros=1037052
          total number of mallocs used during MatSetValues calls =0
            using I-node routines: found 9489 nodes, limit used is 5
  linear system matrix = precond matrix:
  Mat Object:  ()   1 MPI processes
    type: seqaij
    rows=28800, cols=28800
    total: nonzeros=1024686, allocated nonzeros=1024794
    total number of mallocs used during MatSetValues calls =0
      using I-node routines: found 9600 nodes, limit used is 5

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

/home/dknez/akselos-dev/scrbe/build/bin/fe_solver-opt_real on a arch-linux2-c-opt named david-Lenovo with 1 processor, by dknez Wed Jan 11 17:22:10 2017
Using Petsc Release Version 3.7.3, unknown

                         Max       Max/Min        Avg      Total
Time (sec):           9.638e+01      1.00000   9.638e+01
Objects:              2.030e+02      1.00000   2.030e+02
Flops:                1.732e+11      1.00000   1.732e+11  1.732e+11
Flops/sec:            1.797e+09      1.00000   1.797e+09  1.797e+09
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00      0.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 9.6379e+01 100.0%  1.7318e+11 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecDot                42 1.0 2.2411e-05 1.0 8.53e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   380
VecTDot            77761 1.0 1.4294e+00 1.0 4.43e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1  3  0  0  0   1  3  0  0  0  3098
VecNorm            38894 1.0 9.1002e-01 1.0 2.22e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0  2434
VecScale           38882 1.0 3.7314e-01 1.0 1.11e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  2967
VecCopy            38908 1.0 2.1655e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet             77887 1.0 3.2034e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY            77777 1.0 1.8382e+00 1.0 4.43e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  3  0  0  0   2  3  0  0  0  2409
VecAYPX            38875 1.0 1.2884e+00 1.0 2.21e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0  1718
VecAssemblyBegin      68 1.0 1.9407e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd        68 1.0 2.6941e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin       48 1.0 4.6349e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMult            38891 1.0 4.3045e+01 1.0 8.03e+10 1.0 0.0e+00 0.0e+00 0.0e+00 45 46  0  0  0  45 46  0  0  0  1866
MatMultAdd         38889 1.0 3.5360e+01 1.0 7.91e+10 1.0 0.0e+00 0.0e+00 0.0e+00 37 46  0  0  0  37 46  0  0  0  2236
MatSolve           77769 1.0 4.8780e+01 1.0 7.95e+10 1.0 0.0e+00 0.0e+00 0.0e+00 51 46  0  0  0  51 46  0  0  0  1631
MatLUFactorNum         1 1.0 1.9575e-02 1.0 2.49e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1274
MatCholFctrSym         1 1.0 9.4891e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCholFctrNum         1 1.0 3.7885e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatILUFactorSym        1 1.0 4.1780e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatConvert             1 1.0 3.0041e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatScale               2 1.0 2.7180e-05 1.0 2.53e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   930
MatAssemblyBegin      32 1.0 4.0531e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd        32 1.0 1.2032e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRow         114978 1.0 5.9254e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            2 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSubMatrice       6 1.0 1.5707e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         2 1.0 3.2425e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries         6 1.0 3.0580e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView                7 1.0 3.5119e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAXPY                1 1.0 1.9384e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMatMult             1 1.0 2.7120e-03 1.0 3.16e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   117
MatMatMultSym          1 1.0 1.8010e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMatMultNum          1 1.0 6.1703e-04 1.0 3.16e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   513
KSPSetUp               4 1.0 9.8944e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 9.3380e+01 1.0 1.73e+11 1.0 0.0e+00 0.0e+00 0.0e+00 97100  0  0  0  97100  0  0  0  1855
PCSetUp                4 1.0 6.6326e-02 1.0 2.53e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   381
PCSetUpOnBlocks        5 1.0 2.4082e-02 1.0 2.49e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1036
PCApply                5 1.0 9.3376e+01 1.0 1.73e+11 1.0 0.0e+00 0.0e+00 0.0e+00 97100  0  0  0  97100  0  0  0  1855
KSPSolve_FS_0          5 1.0 7.0214e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve_FS_Schu       5 1.0 9.3372e+01 1.0 1.73e+11 1.0 0.0e+00 0.0e+00 0.0e+00 97100  0  0  0  97100  0  0  0  1855
KSPSolve_FS_Low        5 1.0 2.1377e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector    92             92      9698040     0.
      Vector Scatter    24             24        15936     0.
           Index Set    51             51       537876     0.
   IS L to G Mapping     3              3       240408     0.
              Matrix    16             16     77377776     0.
       Krylov Solver     6              6         7888     0.
      Preconditioner     6              6         6288     0.
              Viewer     1              0            0     0.
    Distributed Mesh     1              1         4624     0.
Star Forest Bipartite Graph     2              2         1616     0.
     Discrete System     1              1          872     0.
========================================================================================================================
Average time to get PetscTime(): 0.
#PETSc Option Table entries:
-ksp_monitor
-ksp_view
-log_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-shared-libraries=1 --with-debugging=0
--download-suitesparse --download-blacs --download-ptscotch=yes
--with-blas-lapack-dir=/opt/intel/system_studio_2015.2.050/mkl
--CXXFLAGS=-Wl,--no-as-needed --download-scalapack --download-mumps
--download-metis
--prefix=/home/dknez/software/libmesh_install/opt_real/petsc
--download-hypre --download-ml
-----------------------------------------
Libraries compiled on Wed Sep 21 17:38:52 2016 on david-Lenovo
Machine characteristics:
Linux-4.4.0-38-generic-x86_64-with-Ubuntu-16.04-xenial
Using PETSc directory: /home/dknez/software/petsc-src
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------

Using C compiler: mpicc  -fPIC  -Wall -Wwrite-strings -Wno-strict-aliasing
-Wno-unknown-pragmas -fvisibility=hidden -g -O  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90  -fPIC -Wall -ffree-line-length-0
-Wno-unused-dummy-argument -g -O   ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------

Using include paths:
-I/home/dknez/software/petsc-src/arch-linux2-c-opt/include
-I/home/dknez/software/petsc-src/include
-I/home/dknez/software/petsc-src/include
-I/home/dknez/software/petsc-src/arch-linux2-c-opt/include
-I/home/dknez/software/libmesh_install/opt_real/petsc/include
-I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent
-I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/include
-I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries:
-Wl,-rpath,/home/dknez/software/petsc-src/arch-linux2-c-opt/lib
-L/home/dknez/software/petsc-src/arch-linux2-c-opt/lib -lpetsc
-Wl,-rpath,/home/dknez/software/libmesh_install/opt_real/petsc/lib
-L/home/dknez/software/libmesh_install/opt_real/petsc/lib -lcmumps -ldmumps
-lsmumps -lzmumps -lmumps_common -lpord -lmetis -lHYPRE
-Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib
-Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5
-L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu
-L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu
-L/lib/x86_64-linux-gnu -lmpi_cxx -lstdc++ -lscalapack -lml -lmpi_cxx
-lstdc++ -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd
-lsuitesparseconfig
-Wl,-rpath,/opt/intel/system_studio_2015.2.050/mkl/lib/intel64
-L/opt/intel/system_studio_2015.2.050/mkl/lib/intel64 -lmkl_intel_lp64
-lmkl_sequential -lmkl_core -lpthread -lm -lhwloc -lptesmumps -lptscotch
-lptscotcherr -lscotch -lscotcherr -lX11 -lm -lmpi_usempif08
-lmpi_usempi_ignore_tkr -lmpi_mpifh -lgfortran -lm -lgfortran -lm
-lquadmath -lm -lmpi_cxx -lstdc++ -lrt -lm -lpthread -lz
-Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib
-Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5
-L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu
-L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu
-L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu
-L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/usr/lib/openmpi/lib -lmpi
-lgcc_s -lpthread -ldl
-----------------------------------------

On Wed, Jan 11, 2017 at 4:49 PM, Dave May <dave.mayhem23 at gmail.com> wrote:

> It looks like the Schur solve is requiring a huge number of iterations to
> converge (judging by the MatMult count in the -log_view output).
> This is killing the performance.
>
> Are you sure that A11 is a good approximation to S? You might consider
> trying the selfp option
>
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCFieldSplitSetSchurPre.html#PCFieldSplitSetSchurPre
>
> Note that the best approximation to S is likely both problem- and
> discretisation-dependent, so if selfp is also terrible, you might want to
> consider coding up your own approximation to S for your specific system.
>
>
> Thanks,
>   Dave
>
>
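
As a small illustration of the suggestion above: according to the -ksp_view text, selfp assembles Sp from the inverse of A00's diagonal, i.e. Sp = A11 - A10 inv(diag(A00)) A01, whereas the exact Schur complement is S = A11 - A10 inv(A00) A01. The sketch below compares the two on a tiny dense example. The block values are made up for illustration and are not taken from the system in this thread, and the helpers are plain Python, not PETSc calls.

```python
from fractions import Fraction


def matmul(X, Y):
    """Dense product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def matsub(X, Y):
    """Entrywise difference X - Y."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def inv2x2(M):
    """Exact inverse of a 2x2 matrix (enough for this toy A00)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def diag_inv(M):
    """Inverse of diag(M): what selfp uses in place of inv(A00)."""
    n = len(M)
    return [[1 / M[i][i] if i == j else Fraction(0) for j in range(n)]
            for i in range(n)]


# Hypothetical blocks of a symmetric 2x2 block system (illustrative values only).
A00 = [[Fraction(4), Fraction(1)], [Fraction(1), Fraction(3)]]
A01 = [[Fraction(1)], [Fraction(2)]]
A10 = [[Fraction(1), Fraction(2)]]
A11 = [[Fraction(5)]]

# Exact Schur complement and the selfp-style approximation.
S = matsub(A11, matmul(A10, matmul(inv2x2(A00), A01)))
Sp = matsub(A11, matmul(A10, matmul(diag_inv(A00), A01)))

print("A11 =", float(A11[0][0]))
print("S   =", float(S[0][0]))
print("Sp  =", float(Sp[0][0]))
```

In PETSc the same choice is selected with -pc_fieldsplit_schur_precondition selfp, or with PCFieldSplitSetSchurPre() and PC_FIELDSPLIT_SCHUR_PRE_SELFP. How close Sp gets to S depends on how well diag(A00) represents A00, so a good result on a toy case like this says nothing about a particular application.
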
> On Wed, 11 Jan 2017 at 22:34, David Knezevic <david.knezevic at akselos.com>
> wrote:
>
> I have a definite block 2x2 system and I figured it'd be good to apply the
> PCFIELDSPLIT functionality with Schur complement, as described in Section
> 4.5 of the manual.
>
> The A00 block of my matrix is very small so I figured I'd specify a direct
> solver (i.e. MUMPS) for that block.
>
> So I did the following:
> - PCFieldSplitSetIS to specify the indices of the two splits
> - PCFieldSplitGetSubKSP to get the two KSP objects, and to set the solver
> and PC types for each (MUMPS for A00, ILU+CG for A11)
> - I set -pc_fieldsplit_schur_fact_type full
>
> Below I have pasted the output of "-ksp_view -ksp_monitor -log_view" for a
> test case. It seems to converge well, but I'm concerned about the speed
> (about 90 seconds, vs. about 1 second if I use a direct solver for the
> entire system). I just wanted to check whether I'm setting this up sensibly.
>
> Many thanks,
> David
>
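
For reference, the setup described above (Schur factorization type "full") corresponds to the standard block factorization below, written in the notation of the -ksp_view output. It is a textbook identity that assumes A00 is invertible; it is not a statement about how PETSc implements it internally.

```latex
\begin{pmatrix} A_{00} & A_{01} \\ A_{10} & A_{11} \end{pmatrix}
=
\begin{pmatrix} I & 0 \\ A_{10} A_{00}^{-1} & I \end{pmatrix}
\begin{pmatrix} A_{00} & 0 \\ 0 & S \end{pmatrix}
\begin{pmatrix} I & A_{00}^{-1} A_{01} \\ 0 & I \end{pmatrix},
\qquad
S = A_{11} - A_{10} A_{00}^{-1} A_{01}.
```

Inverting the three factors requires solves with A00 and one solve with S. With a direct solver on the small A00 block, the cost is therefore concentrated in the inner Krylov solve on S, which is why the matrix used to precondition S matters so much.
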
> -----------------------------------------
>
>   0 KSP Residual norm 5.405774214400e+04
>   1 KSP Residual norm 1.849649014371e+02
>   2 KSP Residual norm 7.462775074989e-02
>   3 KSP Residual norm 2.680497175260e-04
> KSP Object: 1 MPI processes
>   type: cg
>   maximum iterations=1000
>   tolerances:  relative=1e-06, absolute=1e-50, divergence=10000.
>   left preconditioning
>   using nonzero initial guess
>   using PRECONDITIONED norm type for convergence test
> PC Object: 1 MPI processes
>   type: fieldsplit
>     FieldSplit with Schur preconditioner, factorization FULL
>     Preconditioner for the Schur complement formed from A11
>     Split info:
>     Split number 0 Defined by IS
>     Split number 1 Defined by IS
>     KSP solver for A00 block
>       KSP Object:      (fieldsplit_RB_split_)       1 MPI processes
>         type: preonly
>         maximum iterations=10000, initial guess is zero
>         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>         left preconditioning
>         using NONE norm type for convergence test
>       PC Object:      (fieldsplit_RB_split_)       1 MPI processes
>         type: cholesky
>           Cholesky: out-of-place factorization
>           tolerance for zero pivot 2.22045e-14
>           matrix ordering: natural
>           factor fill ratio given 0., needed 0.
>             Factored matrix follows:
>               Mat Object:               1 MPI processes
>                 type: seqaij
>                 rows=324, cols=324
>                 package used to perform factorization: mumps
>                 total: nonzeros=3042, allocated nonzeros=3042
>                 total number of mallocs used during MatSetValues calls =0
>                   MUMPS run parameters:
>                     SYM (matrix type):                   2
>                     PAR (host participation):            1
>                     ICNTL(1) (output for error):         6
>                     ICNTL(2) (output of diagnostic msg): 0
>                     ICNTL(3) (output for global info):   0
>                     ICNTL(4) (level of printing):        0
>                     ICNTL(5) (input mat struct):         0
>                     ICNTL(6) (matrix prescaling):        7
>                     ICNTL(7) (sequentia matrix ordering):7
>                     ICNTL(8) (scalling strategy):        77
>                     ICNTL(10) (max num of refinements):  0
>                     ICNTL(11) (error analysis):          0
>                     ICNTL(12) (efficiency control):
>   0
>                     ICNTL(13) (efficiency control):
>   0
>                     ICNTL(14) (percentage of estimated workspace
> increase): 20
>                     ICNTL(18) (input mat struct):
>   0
>                     ICNTL(19) (Shur complement info):
>   0
>                     ICNTL(20) (rhs sparse pattern):
>   0
>                     ICNTL(21) (solution struct):
>  0
>                     ICNTL(22) (in-core/out-of-core facility):
>   0
>                     ICNTL(23) (max size of memory can be allocated
> locally):0
>                     ICNTL(24) (detection of null pivot rows):
>   0
>                     ICNTL(25) (computation of a null space basis):
>  0
>                     ICNTL(26) (Schur options for rhs or solution):
>  0
>                     ICNTL(27) (experimental parameter):
>   -24
>                     ICNTL(28) (use parallel or sequential ordering):
>  1
>                     ICNTL(29) (parallel ordering):
>  0
>                     ICNTL(30) (user-specified set of entries in inv(A)):
>  0
>                     ICNTL(31) (factors is discarded in the solve phase):
>  0
>                     ICNTL(33) (compute determinant):
>  0
>                     CNTL(1) (relative pivoting threshold):      0.01
>                     CNTL(2) (stopping criterion of refinement):
> 1.49012e-08
>                     CNTL(3) (absolute pivoting threshold):      0.
>                     CNTL(4) (value of static pivoting):         -1.
>                     CNTL(5) (fixation for null pivots):         0.
>                     RINFO(1) (local estimated flops for the elimination
> after analysis):
>                       [0] 29394.
>                     RINFO(2) (local estimated flops for the assembly after
> factorization):
>                       [0]  1092.
>                     RINFO(3) (local estimated flops for the elimination
> after factorization):
>                       [0]  29394.
>                     INFO(15) (estimated size of (in MB) MUMPS internal
> data for running numerical factorization):
>                     [0] 1
>                     INFO(16) (size of (in MB) MUMPS internal data used
> during numerical factorization):
>                       [0] 1
>                     INFO(23) (num of pivots eliminated on this processor
> after factorization):
>                       [0] 324
>                     RINFOG(1) (global estimated flops for the elimination
> after analysis): 29394.
>                     RINFOG(2) (global estimated flops for the assembly
> after factorization): 1092.
>                     RINFOG(3) (global estimated flops for the elimination
> after factorization): 29394.
>                     (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant):
> (0.,0.)*(2^0)
>                     INFOG(3) (estimated real workspace for factors on all
> processors after analysis): 3888
>                     INFOG(4) (estimated integer workspace for factors on
> all processors after analysis): 2067
>                     INFOG(5) (estimated maximum front size in the complete
> tree): 12
>                     INFOG(6) (number of nodes in the complete tree): 53
>                     INFOG(7) (ordering option effectively use after
> analysis): 2
>                     INFOG(8) (structural symmetry in percent of the
> permuted matrix after analysis): 100
>                     INFOG(9) (total real/complex workspace to store the
> matrix factors after factorization): 3888
>                     INFOG(10) (total integer space store the matrix
> factors after factorization): 2067
>                     INFOG(11) (order of largest frontal matrix after
> factorization): 12
>                     INFOG(12) (number of off-diagonal pivots): 0
>                     INFOG(13) (number of delayed pivots after
> factorization): 0
>                     INFOG(14) (number of memory compress after
> factorization): 0
>                     INFOG(15) (number of steps of iterative refinement
> after solution): 0
>                     INFOG(16) (estimated size (in MB) of all MUMPS
> internal data for factorization after analysis: value on the most memory
> consuming processor): 1
>                     INFOG(17) (estimated size of all MUMPS internal data
> for factorization after analysis: sum over all processors): 1
>                     INFOG(18) (size of all MUMPS internal data allocated
> during factorization: value on the most memory consuming processor): 1
>                     INFOG(19) (size of all MUMPS internal data allocated
> during factorization: sum over all processors): 1
>                     INFOG(20) (estimated number of entries in the
> factors): 3042
>                     INFOG(21) (size in MB of memory effectively used
> during factorization - value on the most memory consuming processor): 1
>                     INFOG(22) (size in MB of memory effectively used
> during factorization - sum over all processors): 1
>                     INFOG(23) (after analysis: value of ICNTL(6)
> effectively used): 5
>                     INFOG(24) (after analysis: value of ICNTL(12)
> effectively used): 1
>                     INFOG(25) (after factorization: number of pivots
> modified by static pivoting): 0
>                     INFOG(28) (after factorization: number of null pivots
> encountered): 0
>                     INFOG(29) (after factorization: effective number of
> entries in the factors (sum over all processors)): 3042
>                     INFOG(30, 31) (after solution: size in Mbytes of
> memory used during solution phase): 0, 0
>                     INFOG(32) (after analysis: type of analysis done): 1
>                     INFOG(33) (value used for ICNTL(8)): -2
>                     INFOG(34) (exponent of the determinant if determinant
> is requested): 0
>         linear system matrix = precond matrix:
>         Mat Object:        (fieldsplit_RB_split_)         1 MPI processes
>           type: seqaij
>           rows=324, cols=324
>           total: nonzeros=5760, allocated nonzeros=5760
>           total number of mallocs used during MatSetValues calls =0
>             using I-node routines: found 108 nodes, limit used is 5
>     KSP solver for S = A11 - A10 inv(A00) A01
>       KSP Object:      (fieldsplit_FE_split_)       1 MPI processes
>         type: cg
>         maximum iterations=10000, initial guess is zero
>         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>         left preconditioning
>         using PRECONDITIONED norm type for convergence test
>       PC Object:      (fieldsplit_FE_split_)       1 MPI processes
>         type: bjacobi
>           block Jacobi: number of blocks = 1
>           Local solve is same for all blocks, in the following KSP and PC
> objects:
>           KSP Object:          (fieldsplit_FE_split_sub_)           1 MPI
> processes
>             type: preonly
>             maximum iterations=10000, initial guess is zero
>             tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>             left preconditioning
>             using NONE norm type for convergence test
>           PC Object:          (fieldsplit_FE_split_sub_)           1 MPI
> processes
>             type: ilu
>               ILU: out-of-place factorization
>               0 levels of fill
>               tolerance for zero pivot 2.22045e-14
>               matrix ordering: natural
>               factor fill ratio given 1., needed 1.
>                 Factored matrix follows:
>                   Mat Object:                   1 MPI processes
>                     type: seqaij
>                     rows=28476, cols=28476
>                     package used to perform factorization: petsc
>                     total: nonzeros=1017054, allocated nonzeros=1017054
>                     total number of mallocs used during MatSetValues calls
> =0
>                       using I-node routines: found 9492 nodes, limit used
> is 5
>             linear system matrix = precond matrix:
>             Mat Object:            (fieldsplit_FE_split_)             1
> MPI processes
>               type: seqaij
>               rows=28476, cols=28476
>               total: nonzeros=1017054, allocated nonzeros=1017054
>               total number of mallocs used during MatSetValues calls =0
>                 using I-node routines: found 9492 nodes, limit used is 5
>         linear system matrix followed by preconditioner matrix:
>         Mat Object:        (fieldsplit_FE_split_)         1 MPI processes
>           type: schurcomplement
>           rows=28476, cols=28476
>             Schur complement A11 - A10 inv(A00) A01
>             A11
>               Mat Object:              (fieldsplit_FE_split_)
>   1 MPI processes
>                 type: seqaij
>                 rows=28476, cols=28476
>                 total: nonzeros=1017054, allocated nonzeros=1017054
>                 total number of mallocs used during MatSetValues calls =0
>                   using I-node routines: found 9492 nodes, limit used is 5
>             A10
>               Mat Object:               1 MPI processes
>                 type: seqaij
>                 rows=28476, cols=324
>                 total: nonzeros=936, allocated nonzeros=936
>                 total number of mallocs used during MatSetValues calls =0
>                   using I-node routines: found 5717 nodes, limit used is 5
>             KSP of A00
>               KSP Object:              (fieldsplit_RB_split_)
>   1 MPI processes
>                 type: preonly
>                 maximum iterations=10000, initial guess is zero
>                 tolerances:  relative=1e-05, absolute=1e-50,
> divergence=10000.
>                 left preconditioning
>                 using NONE norm type for convergence test
>               PC Object:              (fieldsplit_RB_split_)
> 1 MPI processes
>                 type: cholesky
>                   Cholesky: out-of-place factorization
>                   tolerance for zero pivot 2.22045e-14
>                   matrix ordering: natural
>                   factor fill ratio given 0., needed 0.
>                     Factored matrix follows:
>                       Mat Object:                       1 MPI processes
>                         type: seqaij
>                         rows=324, cols=324
>                         package used to perform factorization: mumps
>                         total: nonzeros=3042, allocated nonzeros=3042
>                         total number of mallocs used during MatSetValues
> calls =0
>                           MUMPS run parameters:
>                             SYM (matrix type):                   2
>                             PAR (host participation):            1
>                             ICNTL(1) (output for error):         6
>                             ICNTL(2) (output of diagnostic msg): 0
>                             ICNTL(3) (output for global info):   0
>                             ICNTL(4) (level of printing):        0
>                             ICNTL(5) (input mat struct):         0
>                             ICNTL(6) (matrix prescaling):        7
>                             ICNTL(7) (sequentia matrix ordering):7
>                             ICNTL(8) (scalling strategy):        77
>                             ICNTL(10) (max num of refinements):  0
>                             ICNTL(11) (error analysis):          0
>                             ICNTL(12) (efficiency control):
>           0
>                             ICNTL(13) (efficiency control):
>           0
>                             ICNTL(14) (percentage of estimated workspace
> increase): 20
>                             ICNTL(18) (input mat struct):
>           0
>                             ICNTL(19) (Shur complement info):
>           0
>                             ICNTL(20) (rhs sparse pattern):
>           0
>                             ICNTL(21) (solution struct):
>          0
>                             ICNTL(22) (in-core/out-of-core facility):
>           0
>                             ICNTL(23) (max size of memory can be allocated
> locally):0
>                             ICNTL(24) (detection of null pivot rows):
>           0
>                             ICNTL(25) (computation of a null space basis):
>          0
>                             ICNTL(26) (Schur options for rhs or solution):
>          0
>                             ICNTL(27) (experimental parameter):
>           -24
>                             ICNTL(28) (use parallel or sequential
> ordering):        1
>                             ICNTL(29) (parallel ordering):
>          0
>                             ICNTL(30) (user-specified set of entries in
> inv(A)):    0
>                             ICNTL(31) (factors is discarded in the solve
> phase):    0
>                             ICNTL(33) (compute determinant):
>          0
>                             CNTL(1) (relative pivoting threshold):
>  0.01
>                             CNTL(2) (stopping criterion of refinement):
> 1.49012e-08
>                             CNTL(3) (absolute pivoting threshold):      0.
>                             CNTL(4) (value of static pivoting):
> -1.
>                             CNTL(5) (fixation for null pivots):         0.
>                             RINFO(1) (local estimated flops for the
> elimination after analysis):
>                               [0] 29394.
>                             RINFO(2) (local estimated flops for the
> assembly after factorization):
>                               [0]  1092.
>                             RINFO(3) (local estimated flops for the
> elimination after factorization):
>                               [0]  29394.
>                             INFO(15) (estimated size of (in MB) MUMPS
> internal data for running numerical factorization):
>                             [0] 1
>                             INFO(16) (size of (in MB) MUMPS internal data
> used during numerical factorization):
>                               [0] 1
>                             INFO(23) (num of pivots eliminated on this
> processor after factorization):
>                               [0] 324
>                             RINFOG(1) (global estimated flops for the
> elimination after analysis): 29394.
>                             RINFOG(2) (global estimated flops for the
> assembly after factorization): 1092.
>                             RINFOG(3) (global estimated flops for the
> elimination after factorization): 29394.
>                             (RINFOG(12) RINFOG(13))*2^INFOG(34)
> (determinant): (0.,0.)*(2^0)
>                             INFOG(3) (estimated real workspace for factors
> on all processors after analysis): 3888
>                             INFOG(4) (estimated integer workspace for
> factors on all processors after analysis): 2067
>                             INFOG(5) (estimated maximum front size in the
> complete tree): 12
>                             INFOG(6) (number of nodes in the complete
> tree): 53
>                             INFOG(7) (ordering option effectively use
> after analysis): 2
>                             INFOG(8) (structural symmetry in percent of
> the permuted matrix after analysis): 100
>                             INFOG(9) (total real/complex workspace to
> store the matrix factors after factorization): 3888
>                             INFOG(10) (total integer space store the
> matrix factors after factorization): 2067
>                             INFOG(11) (order of largest frontal matrix
> after factorization): 12
>                             INFOG(12) (number of off-diagonal pivots): 0
>                             INFOG(13) (number of delayed pivots after
> factorization): 0
>                             INFOG(14) (number of memory compress after
> factorization): 0
>                             INFOG(15) (number of steps of iterative
> refinement after solution): 0
>                             INFOG(16) (estimated size (in MB) of all MUMPS
> internal data for factorization after analysis: value on the most memory
> consuming processor): 1
>                             INFOG(17) (estimated size of all MUMPS
> internal data for factorization after analysis: sum over all processors): 1
>                             INFOG(18) (size of all MUMPS internal data
> allocated during factorization: value on the most memory consuming
> processor): 1
>                             INFOG(19) (size of all MUMPS internal data
> allocated during factorization: sum over all processors): 1
>                             INFOG(20) (estimated number of entries in the
> factors): 3042
>                             INFOG(21) (size in MB of memory effectively
> used during factorization - value on the most memory consuming processor):
> 1
>                             INFOG(22) (size in MB of memory effectively
> used during factorization - sum over all processors): 1
>                             INFOG(23) (after analysis: value of ICNTL(6)
> effectively used): 5
>                             INFOG(24) (after analysis: value of ICNTL(12)
> effectively used): 1
>                             INFOG(25) (after factorization: number of
> pivots modified by static pivoting): 0
>                             INFOG(28) (after factorization: number of null
> pivots encountered): 0
>                             INFOG(29) (after factorization: effective
> number of entries in the factors (sum over all processors)): 3042
>                             INFOG(30, 31) (after solution: size in Mbytes
> of memory used during solution phase): 0, 0
>                             INFOG(32) (after analysis: type of analysis
> done): 1
>                             INFOG(33) (value used for ICNTL(8)): -2
>                             INFOG(34) (exponent of the determinant if
> determinant is requested): 0
>                 linear system matrix = precond matrix:
>                 Mat Object:                (fieldsplit_RB_split_)
>         1 MPI processes
>                   type: seqaij
>                   rows=324, cols=324
>                   total: nonzeros=5760, allocated nonzeros=5760
>                   total number of mallocs used during MatSetValues calls =0
>                     using I-node routines: found 108 nodes, limit used is 5
>             A01
>               Mat Object:               1 MPI processes
>                 type: seqaij
>                 rows=324, cols=28476
>                 total: nonzeros=936, allocated nonzeros=936
>                 total number of mallocs used during MatSetValues calls =0
>                   using I-node routines: found 67 nodes, limit used is 5
>         Mat Object:        (fieldsplit_FE_split_)         1 MPI processes
>           type: seqaij
>           rows=28476, cols=28476
>           total: nonzeros=1017054, allocated nonzeros=1017054
>           total number of mallocs used during MatSetValues calls =0
>             using I-node routines: found 9492 nodes, limit used is 5
>   linear system matrix = precond matrix:
>   Mat Object:  ()   1 MPI processes
>     type: seqaij
>     rows=28800, cols=28800
>     total: nonzeros=1024686, allocated nonzeros=1024794
>     total number of mallocs used during MatSetValues calls =0
>       using I-node routines: found 9600 nodes, limit used is 5
>
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> /home/dknez/akselos-dev/scrbe/build/bin/fe_solver-opt_real on a arch-linux2-c-opt named david-Lenovo with 1 processor, by dknez Wed Jan 11 16:16:47 2017
> Using Petsc Release Version 3.7.3, unknown
>
>                          Max       Max/Min        Avg      Total
> Time (sec):           9.179e+01      1.00000   9.179e+01
> Objects:              1.990e+02      1.00000   1.990e+02
> Flops:                1.634e+11      1.00000   1.634e+11  1.634e+11
> Flops/sec:            1.780e+09      1.00000   1.780e+09  1.780e+09
> MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
> MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
> MPI Reductions:       0.000e+00      0.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)
>                             e.g., VecAXPY() for real vectors of length N
> --> 2N flops
>                             and VecAXPY() for complex vectors of length N
> --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>  0:      Main Stage: 9.1787e+01 100.0%  1.6336e+11 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flops: Max - maximum over all processors
>                    Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    Avg. len: average message length (bytes)
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>       %T - percent time in this phase         %F - percent flops in this phase
>       %M - percent messages in this phase     %L - percent message lengths in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> VecDot                42 1.0 2.4080e-05 1.0 8.53e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   354
> VecTDot            74012 1.0 1.2440e+00 1.0 4.22e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1  3  0  0  0   1  3  0  0  0  3388
> VecNorm            37020 1.0 8.3580e-01 1.0 2.11e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0  2523
> VecScale           37008 1.0 3.5800e-01 1.0 1.05e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  2944
> VecCopy            37034 1.0 2.5754e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet             74137 1.0 3.0537e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAXPY            74029 1.0 1.7233e+00 1.0 4.22e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  3  0  0  0   2  3  0  0  0  2446
> VecAYPX            37001 1.0 1.2214e+00 1.0 2.11e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0  1725
> VecAssemblyBegin      68 1.0 2.0432e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAssemblyEnd        68 1.0 2.5988e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecScatterBegin       48 1.0 4.6921e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatMult            37017 1.0 4.1269e+01 1.0 7.65e+10 1.0 0.0e+00 0.0e+00 0.0e+00 45 47  0  0  0  45 47  0  0  0  1853
> MatMultAdd         37015 1.0 3.3638e+01 1.0 7.53e+10 1.0 0.0e+00 0.0e+00 0.0e+00 37 46  0  0  0  37 46  0  0  0  2238
> MatSolve           74021 1.0 4.6602e+01 1.0 7.42e+10 1.0 0.0e+00 0.0e+00 0.0e+00 51 45  0  0  0  51 45  0  0  0  1593
> MatLUFactorNum         1 1.0 1.7209e-02 1.0 2.44e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1420
> MatCholFctrSym         1 1.0 8.8310e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatCholFctrNum         1 1.0 3.6907e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatILUFactorSym        1 1.0 3.7372e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyBegin      29 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyEnd        29 1.0 9.9473e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetRow          58026 1.0 2.8155e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetRowIJ            2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetSubMatrice       6 1.0 1.5399e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetOrdering         2 1.0 3.0112e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatZeroEntries         6 1.0 2.9490e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatView                7 1.0 3.4356e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSetUp               4 1.0 9.4891e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve               1 1.0 8.8793e+01 1.0 1.63e+11 1.0 0.0e+00 0.0e+00 0.0e+00 97100  0  0  0  97100  0  0  0  1840
> PCSetUp                4 1.0 3.8375e-02 1.0 2.44e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   637
> PCSetUpOnBlocks        5 1.0 2.1250e-02 1.0 2.44e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1150
> PCApply                5 1.0 8.8789e+01 1.0 1.63e+11 1.0 0.0e+00 0.0e+00 0.0e+00 97100  0  0  0  97100  0  0  0  1840
> KSPSolve_FS_0          5 1.0 7.5364e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve_FS_Schu       5 1.0 8.8785e+01 1.0 1.63e+11 1.0 0.0e+00 0.0e+00 0.0e+00 97100  0  0  0  97100  0  0  0  1840
> KSPSolve_FS_Low        5 1.0 2.1019e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions     Memory  Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
>               Vector    91             91      9693912     0.
>       Vector Scatter    24             24        15936     0.
>            Index Set    51             51       537888     0.
>    IS L to G Mapping     3              3       240408     0.
>               Matrix    13             13     64097868     0.
>        Krylov Solver     6              6         7888     0.
>       Preconditioner     6              6         6288     0.
>               Viewer     1              0            0     0.
>     Distributed Mesh     1              1         4624     0.
> Star Forest Bipartite Graph     2              2         1616     0.
>      Discrete System     1              1          872     0.
> ============================================================
> ============================================================
> Average time to get PetscTime(): 0.
> #PETSc Option Table entries:
> -ksp_monitor
> -ksp_view
> -log_view
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure options: --with-shared-libraries=1 --with-debugging=0
> --download-suitesparse --download-blacs --download-ptscotch=yes
> --with-blas-lapack-dir=/opt/intel/system_studio_2015.2.050/mkl
> --CXXFLAGS=-Wl,--no-as-needed --download-scalapack --download-mumps
> --download-metis --prefix=/home/dknez/software/libmesh_install/opt_real/petsc
> --download-hypre --download-ml
> -----------------------------------------
> Libraries compiled on Wed Sep 21 17:38:52 2016 on david-Lenovo
> Machine characteristics: Linux-4.4.0-38-generic-x86_64-with-Ubuntu-16.04-xenial
> Using PETSc directory: /home/dknez/software/petsc-src
> Using PETSc arch: arch-linux2-c-opt
> -----------------------------------------
>
> Using C compiler: mpicc  -fPIC  -Wall -Wwrite-strings -Wno-strict-aliasing
> -Wno-unknown-pragmas -fvisibility=hidden -g -O  ${COPTFLAGS} ${CFLAGS}
> Using Fortran compiler: mpif90  -fPIC -Wall -ffree-line-length-0
> -Wno-unused-dummy-argument -g -O   ${FOPTFLAGS} ${FFLAGS}
> -----------------------------------------
>
> Using include paths: -I/home/dknez/software/petsc-src/arch-linux2-c-opt/include
> -I/home/dknez/software/petsc-src/include -I/home/dknez/software/petsc-src/include
> -I/home/dknez/software/petsc-src/arch-linux2-c-opt/include
> -I/home/dknez/software/libmesh_install/opt_real/petsc/include
> -I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent
> -I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/include
> -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi
> -----------------------------------------
>
> Using C linker: mpicc
> Using Fortran linker: mpif90
> Using libraries: -Wl,-rpath,/home/dknez/software/petsc-src/arch-linux2-c-opt/lib
> -L/home/dknez/software/petsc-src/arch-linux2-c-opt/lib -lpetsc
> -Wl,-rpath,/home/dknez/software/libmesh_install/opt_real/petsc/lib
> -L/home/dknez/software/libmesh_install/opt_real/petsc/lib -lcmumps
> -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lmetis -lHYPRE
> -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib
> -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5
> -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu
> -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpi_cxx
> -lstdc++ -lscalapack -lml -lmpi_cxx -lstdc++ -lumfpack -lklu -lcholmod
> -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig
> -Wl,-rpath,/opt/intel/system_studio_2015.2.050/mkl/lib/intel64
> -L/opt/intel/system_studio_2015.2.050/mkl/lib/intel64 -lmkl_intel_lp64
> -lmkl_sequential -lmkl_core -lpthread -lm -lhwloc -lptesmumps -lptscotch
> -lptscotcherr -lscotch -lscotcherr -lX11 -lm -lmpi_usempif08
> -lmpi_usempi_ignore_tkr -lmpi_mpifh -lgfortran -lm -lgfortran -lm
> -lquadmath -lm -lmpi_cxx -lstdc++ -lrt -lm -lpthread -lz
> -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib
> -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5
> -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu
> -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu
> -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl
> -Wl,-rpath,/usr/lib/openmpi/lib -lmpi -lgcc_s -lpthread -ldl
> -----------------------------------------