[petsc-users] Advice on improving Stokes Schur preconditioners

Sat Jun 10 20:25:35 CDT 2017

Dear all,

I am solving a Stokes problem in 3D aorta geometries, using a P2/P1
finite elements discretization on tetrahedral meshes resulting in
~1-1.5M DOFs. Viscosity is uniform (can be adjusted arbitrarily), and
the right hand side is a function of noisy measurement data.

In other settings of "standard" Stokes flow problems I have obtained
good convergence with an "upper" Schur complement preconditioner, using
AMG (ML or Hypre) on the velocity block and approximating the Schur
complement matrix by the diagonal of the pressure mass matrix:

    -ksp_converged_reason
    -ksp_monitor_true_residual
    -ksp_initial_guess_nonzero
    -ksp_diagonal_scale
    -ksp_diagonal_scale_fix
    -ksp_type fgmres
    -ksp_rtol 1.0e-8

    -pc_type fieldsplit
    -pc_fieldsplit_type schur
    -pc_fieldsplit_detect_saddle_point
    -pc_fieldsplit_schur_fact_type upper
    -pc_fieldsplit_schur_precondition user    # <-- pressure mass matrix

    -fieldsplit_0_ksp_type preonly
    -fieldsplit_0_pc_type ml

    -fieldsplit_1_ksp_type preonly
    -fieldsplit_1_pc_type jacobi

In my present case this setup gives rather slow convergence (varies for
different geometries between 200-500 or several thousands!). I obtain
better convergence with "-pc_fieldsplit_schur_precondition selfp"and
using multigrid on S, with "-fieldsplit_1_pc_type ml" (I don't think
this is optimal, though).

I don't understand why the pressure mass matrix approach performs so
poorly and wonder what I could try to improve the convergence. Until now
I have been using ML and Hypre BoomerAMG mostly with default parameters.
Surely they can be improved by tuning some parameters. Which could be a
good starting point? Are there other options I should consider?

With the above setup (jacobi) for a case that works better than others,
the KSP terminates with
467 KSP unpreconditioned resid norm 2.072014323515e-09 true resid norm
2.072014322600e-09 ||r(i)||/||b|| 9.939098100674e-09

You can find the output of -ksp_view below. Let me know if you need more
details.

Thanks in advance for your advice!
Best wishes
David

KSP Object: 1 MPI processes
  type: fgmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=10000
  tolerances:  relative=1e-08, absolute=1e-50, divergence=10000.
  right preconditioning
  diagonally scaled system
  using nonzero initial guess
  using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: fieldsplit
    FieldSplit with Schur preconditioner, factorization UPPER
    Preconditioner for the Schur complement formed from user provided matrix
    Split info:
    Split number 0 Defined by IS
    Split number 1 Defined by IS
    KSP solver for A00 block
      KSP Object:      (fieldsplit_0_)       1 MPI processes
        type: preonly
        maximum iterations=10000, initial guess is zero
        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using NONE norm type for convergence test
      PC Object:      (fieldsplit_0_)       1 MPI processes
        type: ml
          MG: type is MULTIPLICATIVE, levels=5 cycles=v
            Cycles per PCApply=1
            Using Galerkin computed coarse grid matrices
        Coarse grid solver -- level -------------------------------
          KSP Object:          (fieldsplit_0_mg_coarse_)           1 MPI
processes
            type: preonly
            maximum iterations=10000, initial guess is zero
            tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
            left preconditioning
            using NONE norm type for convergence test
          PC Object:          (fieldsplit_0_mg_coarse_)           1 MPI
processes
            type: lu
              LU: out-of-place factorization
              tolerance for zero pivot 2.22045e-14
              using diagonal shift on blocks to prevent zero pivot
[INBLOCKS]
              matrix ordering: nd
              factor fill ratio given 5., needed 1.
                Factored matrix follows:
                  Mat Object:                   1 MPI processes
                    type: seqaij
                    rows=3, cols=3
                    package used to perform factorization: petsc
                    total: nonzeros=3, allocated nonzeros=3
                    total number of mallocs used during MatSetValues
calls =0
                      not using I-node routines
            linear system matrix = precond matrix:
            Mat Object:             1 MPI processes
              type: seqaij
              rows=3, cols=3
              total: nonzeros=3, allocated nonzeros=3
              total number of mallocs used during MatSetValues calls =0
                not using I-node routines
        Down solver (pre-smoother) on level 1
-------------------------------
          KSP Object:          (fieldsplit_0_mg_levels_1_)           1
MPI processes
            type: richardson
              Richardson: damping factor=1.
            maximum iterations=2
            tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
            left preconditioning
            using nonzero initial guess
            using NONE norm type for convergence test
          PC Object:          (fieldsplit_0_mg_levels_1_)           1
MPI processes
            type: sor
              SOR: type = local_symmetric, iterations = 1, local
iterations = 1, omega = 1.
            linear system matrix = precond matrix:
            Mat Object:             1 MPI processes
              type: seqaij
              rows=15, cols=15
              total: nonzeros=69, allocated nonzeros=69
              total number of mallocs used during MatSetValues calls =0
                not using I-node routines
        Up solver (post-smoother) same as down solver (pre-smoother)
        Down solver (pre-smoother) on level 2
-------------------------------
          KSP Object:          (fieldsplit_0_mg_levels_2_)           1
MPI processes
            type: richardson
              Richardson: damping factor=1.
            maximum iterations=2
            tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
            left preconditioning
            using nonzero initial guess
            using NONE norm type for convergence test
          PC Object:          (fieldsplit_0_mg_levels_2_)           1
MPI processes
            type: sor
              SOR: type = local_symmetric, iterations = 1, local
iterations = 1, omega = 1.
            linear system matrix = precond matrix:
            Mat Object:             1 MPI processes
              type: seqaij
              rows=304, cols=304
              total: nonzeros=7354, allocated nonzeros=7354
              total number of mallocs used during MatSetValues calls =0
                not using I-node routines
        Up solver (post-smoother) same as down solver (pre-smoother)
        Down solver (pre-smoother) on level 3
-------------------------------
          KSP Object:          (fieldsplit_0_mg_levels_3_)           1
MPI processes
            type: richardson
              Richardson: damping factor=1.
            maximum iterations=2
            tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
            left preconditioning
            using nonzero initial guess
            using NONE norm type for convergence test
          PC Object:          (fieldsplit_0_mg_levels_3_)           1
MPI processes
            type: sor
              SOR: type = local_symmetric, iterations = 1, local
iterations = 1, omega = 1.
            linear system matrix = precond matrix:
            Mat Object:             1 MPI processes
              type: seqaij
              rows=30236, cols=30236
              total: nonzeros=2730644, allocated nonzeros=2730644
              total number of mallocs used during MatSetValues calls =0
                not using I-node routines
        Up solver (post-smoother) same as down solver (pre-smoother)
        Down solver (pre-smoother) on level 4
-------------------------------
          KSP Object:          (fieldsplit_0_mg_levels_4_)           1
MPI processes
            type: richardson
              Richardson: damping factor=1.
            maximum iterations=2
            tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
            left preconditioning
            using nonzero initial guess
            using NONE norm type for convergence test
          PC Object:          (fieldsplit_0_mg_levels_4_)           1
MPI processes
            type: sor
              SOR: type = local_symmetric, iterations = 1, local
iterations = 1, omega = 1.
            linear system matrix = precond matrix:
            Mat Object:            (fieldsplit_0_)             1 MPI
processes
              type: seqaij
              rows=894132, cols=894132
              total: nonzeros=70684164, allocated nonzeros=70684164
              total number of mallocs used during MatSetValues calls =0
                not using I-node routines
        Up solver (post-smoother) same as down solver (pre-smoother)
        linear system matrix = precond matrix:
        Mat Object:        (fieldsplit_0_)         1 MPI processes
          type: seqaij
          rows=894132, cols=894132
          total: nonzeros=70684164, allocated nonzeros=70684164
          total number of mallocs used during MatSetValues calls =0
            not using I-node routines
    KSP solver for S = A11 - A10 inv(A00) A01
      KSP Object:      (fieldsplit_1_)       1 MPI processes
        type: preonly
        maximum iterations=10000, initial guess is zero
        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using NONE norm type for convergence test
      PC Object:      (fieldsplit_1_)       1 MPI processes
        type: jacobi
        linear system matrix followed by preconditioner matrix:
        Mat Object:        (fieldsplit_1_)         1 MPI processes
          type: schurcomplement
          rows=42025, cols=42025
            Schur complement A11 - A10 inv(A00) A01
            A11
              Mat Object:              (fieldsplit_1_)               1
MPI processes
                type: seqaij
                rows=42025, cols=42025
                total: nonzeros=554063, allocated nonzeros=554063
                total number of mallocs used during MatSetValues calls =0
                  not using I-node routines
            A10
              Mat Object:               1 MPI processes
                type: seqaij
                rows=42025, cols=894132
                total: nonzeros=6850107, allocated nonzeros=6850107
                total number of mallocs used during MatSetValues calls =0
                  not using I-node routines
            KSP of A00
              KSP Object:              (fieldsplit_0_)               1
MPI processes
                type: preonly
                maximum iterations=10000, initial guess is zero
                tolerances:  relative=1e-05, absolute=1e-50,
divergence=10000.
                left preconditioning
                using NONE norm type for convergence test
              PC Object:              (fieldsplit_0_)               1
MPI processes
                type: ml
                  MG: type is MULTIPLICATIVE, levels=5 cycles=v
                    Cycles per PCApply=1
                    Using Galerkin computed coarse grid matrices
                Coarse grid solver -- level -------------------------------
                  KSP Object:                 
(fieldsplit_0_mg_coarse_)                   1 MPI processes
                    type: preonly
                    maximum iterations=10000, initial guess is zero
                    tolerances:  relative=1e-05, absolute=1e-50,
divergence=10000.
                    left preconditioning
                    using NONE norm type for convergence test
                  PC Object:                 
(fieldsplit_0_mg_coarse_)                   1 MPI processes
                    type: lu
                      LU: out-of-place factorization
                      tolerance for zero pivot 2.22045e-14
                      using diagonal shift on blocks to prevent zero
pivot [INBLOCKS]
                      matrix ordering: nd
                      factor fill ratio given 5., needed 1.
                        Factored matrix follows:
                          Mat Object:                           1 MPI
processes
                            type: seqaij
                            rows=3, cols=3
                            package used to perform factorization: petsc
                            total: nonzeros=3, allocated nonzeros=3
                            total number of mallocs used during
MatSetValues calls =0
                              not using I-node routines
                    linear system matrix = precond matrix:
                    Mat Object:                     1 MPI processes
                      type: seqaij
                      rows=3, cols=3
                      total: nonzeros=3, allocated nonzeros=3
                      total number of mallocs used during MatSetValues
calls =0
                        not using I-node routines
                Down solver (pre-smoother) on level 1
-------------------------------
                  KSP Object:                 
(fieldsplit_0_mg_levels_1_)                   1 MPI processes
                    type: richardson
                      Richardson: damping factor=1.
                    maximum iterations=2
                    tolerances:  relative=1e-05, absolute=1e-50,
divergence=10000.
                    left preconditioning
                    using nonzero initial guess
                    using NONE norm type for convergence test
                  PC Object:                 
(fieldsplit_0_mg_levels_1_)                   1 MPI processes
                    type: sor
                      SOR: type = local_symmetric, iterations = 1, local
iterations = 1, omega = 1.
                    linear system matrix = precond matrix:
                    Mat Object:                     1 MPI processes
                      type: seqaij
                      rows=15, cols=15
                      total: nonzeros=69, allocated nonzeros=69
                      total number of mallocs used during MatSetValues
calls =0
                        not using I-node routines
                Up solver (post-smoother) same as down solver (pre-smoother)
                Down solver (pre-smoother) on level 2
-------------------------------
                  KSP Object:                 
(fieldsplit_0_mg_levels_2_)                   1 MPI processes
                    type: richardson
                      Richardson: damping factor=1.
                    maximum iterations=2
                    tolerances:  relative=1e-05, absolute=1e-50,
divergence=10000.
                    left preconditioning
                    using nonzero initial guess
                    using NONE norm type for convergence test
                  PC Object:                 
(fieldsplit_0_mg_levels_2_)                   1 MPI processes
                    type: sor
                      SOR: type = local_symmetric, iterations = 1, local
iterations = 1, omega = 1.
                    linear system matrix = precond matrix:
                    Mat Object:                     1 MPI processes
                      type: seqaij
                      rows=304, cols=304
                      total: nonzeros=7354, allocated nonzeros=7354
                      total number of mallocs used during MatSetValues
calls =0
                        not using I-node routines
                Up solver (post-smoother) same as down solver (pre-smoother)
                Down solver (pre-smoother) on level 3
-------------------------------
                  KSP Object:                 
(fieldsplit_0_mg_levels_3_)                   1 MPI processes
                    type: richardson
                      Richardson: damping factor=1.
                    maximum iterations=2
                    tolerances:  relative=1e-05, absolute=1e-50,
divergence=10000.
                    left preconditioning
                    using nonzero initial guess
                    using NONE norm type for convergence test
                  PC Object:                 
(fieldsplit_0_mg_levels_3_)                   1 MPI processes
                    type: sor
                      SOR: type = local_symmetric, iterations = 1, local
iterations = 1, omega = 1.
                    linear system matrix = precond matrix:
                    Mat Object:                     1 MPI processes
                      type: seqaij
                      rows=30236, cols=30236
                      total: nonzeros=2730644, allocated nonzeros=2730644
                      total number of mallocs used during MatSetValues
calls =0
                        not using I-node routines
                Up solver (post-smoother) same as down solver (pre-smoother)
                Down solver (pre-smoother) on level 4
-------------------------------
                  KSP Object:                 
(fieldsplit_0_mg_levels_4_)                   1 MPI processes
                    type: richardson
                      Richardson: damping factor=1.
                    maximum iterations=2
                    tolerances:  relative=1e-05, absolute=1e-50,
divergence=10000.
                    left preconditioning
                    using nonzero initial guess
                    using NONE norm type for convergence test
                  PC Object:                 
(fieldsplit_0_mg_levels_4_)                   1 MPI processes
                    type: sor
                      SOR: type = local_symmetric, iterations = 1, local
iterations = 1, omega = 1.
                    linear system matrix = precond matrix:
                    Mat Object:                   
(fieldsplit_0_)                     1 MPI processes
                      type: seqaij
                      rows=894132, cols=894132
                      total: nonzeros=70684164, allocated nonzeros=70684164
                      total number of mallocs used during MatSetValues
calls =0
                        not using I-node routines
                Up solver (post-smoother) same as down solver (pre-smoother)
                linear system matrix = precond matrix:
                Mat Object:               
(fieldsplit_0_)                 1 MPI processes
                  type: seqaij
                  rows=894132, cols=894132
                  total: nonzeros=70684164, allocated nonzeros=70684164
                  total number of mallocs used during MatSetValues calls =0
                    not using I-node routines
            A01
              Mat Object:               1 MPI processes
                type: seqaij
                rows=894132, cols=42025
                total: nonzeros=6850107, allocated nonzeros=6850107
                total number of mallocs used during MatSetValues calls =0
                  not using I-node routines
        Mat Object:         1 MPI processes
          type: seqaij
          rows=42025, cols=42025
          total: nonzeros=554063, allocated nonzeros=554063
          total number of mallocs used during MatSetValues calls =0
            not using I-node routines
  linear system matrix = precond matrix:
  Mat Object:   1 MPI processes
    type: seqaij
    rows=936157, cols=936157
    total: nonzeros=84938441, allocated nonzeros=84938441
    total number of mallocs used during MatSetValues calls =0
      not using I-node routines