[petsc-users] Advice on improving Stokes Schur preconditioners
David Nolte
dnolte at dim.uchile.cl
Sun Jun 11 23:06:14 CDT 2017
Thanks Matt, that makes sense to me!
I had skipped direct solvers at first because for these 'real'
configurations LU (mumps/superlu_dist) usually runs out of memory (I
have 32GB RAM). It would be reasonable to take one more step back and
play with synthetic examples.
I did manage to run one case, though, with 936k DOFs, using the options
below ("user" = pressure mass matrix); a sketch of how the mass matrix
is attached in the code follows the option list.
<...>
-pc_fieldsplit_schur_fact_type upper
-pc_fieldsplit_schur_precondition user
-fieldsplit_0_ksp_type preonly
-fieldsplit_0_pc_type lu
-fieldsplit_0_pc_factor_mat_solver_package mumps
-fieldsplit_1_ksp_type gmres
-fieldsplit_1_ksp_monitor_true_residual
-fieldsplit_1_ksp_rtol 1e-10
-fieldsplit_1_pc_type lu
-fieldsplit_1_pc_factor_mat_solver_package mumps
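
(For reference, this is roughly how the pressure mass matrix gets
attached as the user-provided Schur preconditioner in my code. This is
only a simplified sketch, not the exact code; A is the assembled Stokes
matrix, Sp the pressure mass matrix, is_u/is_p the velocity/pressure
index sets, b/x the right hand side and solution, all created elsewhere.)

    #include <petscksp.h>

    /* Sketch: Schur fieldsplit KSP with a user-provided matrix (the pressure
       mass matrix Sp) used to build the Schur complement preconditioner.
       A, Sp, is_u, is_p, b, x are assumed to be assembled elsewhere. */
    static PetscErrorCode SolveStokesSchur(Mat A, Mat Sp, IS is_u, IS is_p,
                                           Vec b, Vec x)
    {
      KSP            ksp;
      PC             pc;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
      ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCFIELDSPLIT);CHKERRQ(ierr);
      ierr = PCFieldSplitSetIS(pc, "0", is_u);CHKERRQ(ierr);  /* velocity split */
      ierr = PCFieldSplitSetIS(pc, "1", is_p);CHKERRQ(ierr);  /* pressure split */
      ierr = PCFieldSplitSetType(pc, PC_COMPOSITE_SCHUR);CHKERRQ(ierr);
      ierr = PCFieldSplitSetSchurFactType(pc, PC_FIELDSPLIT_SCHUR_FACT_UPPER);CHKERRQ(ierr);
      /* "user" Schur preconditioner: precondition S with the mass matrix Sp */
      ierr = PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_USER, Sp);CHKERRQ(ierr);
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);  /* picks up the -fieldsplit_* options above */
      ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
      ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }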
It takes 2 outer iterations, as expected. However, the fieldsplit_1
solve takes very long:
 0 KSP unpreconditioned resid norm 4.038466809302e-03 true resid norm 4.038466809302e-03 ||r(i)||/||b|| 1.000000000000e+00
   Residual norms for fieldsplit_1_ solve.
   0 KSP preconditioned resid norm 0.000000000000e+00 true resid norm 0.000000000000e+00 ||r(i)||/||b|| -nan
 Linear fieldsplit_1_ solve converged due to CONVERGED_ATOL iterations 0
 1 KSP unpreconditioned resid norm 4.860095964831e-06 true resid norm 4.860095964831e-06 ||r(i)||/||b|| 1.203450763452e-03
   Residual norms for fieldsplit_1_ solve.
   0 KSP preconditioned resid norm 2.965546249872e+08 true resid norm 1.000000000000e+00 ||r(i)||/||b|| 1.000000000000e+00
   1 KSP preconditioned resid norm 1.347596594634e+08 true resid norm 3.599678801575e-01 ||r(i)||/||b|| 3.599678801575e-01
   2 KSP preconditioned resid norm 5.913230136403e+07 true resid norm 2.364916760834e-01 ||r(i)||/||b|| 2.364916760834e-01
   3 KSP preconditioned resid norm 4.629700028930e+07 true resid norm 1.984444715595e-01 ||r(i)||/||b|| 1.984444715595e-01
   4 KSP preconditioned resid norm 3.804431276819e+07 true resid norm 1.747224559120e-01 ||r(i)||/||b|| 1.747224559120e-01
   5 KSP preconditioned resid norm 3.178769422140e+07 true resid norm 1.402254864444e-01 ||r(i)||/||b|| 1.402254864444e-01
   6 KSP preconditioned resid norm 2.648669043919e+07 true resid norm 1.191164310866e-01 ||r(i)||/||b|| 1.191164310866e-01
   7 KSP preconditioned resid norm 2.203522108614e+07 true resid norm 9.690500018007e-02 ||r(i)||/||b|| 9.690500018007e-02
<...>
   422 KSP preconditioned resid norm 2.984888715147e-02 true resid norm 8.598401046494e-11 ||r(i)||/||b|| 8.598401046494e-11
   423 KSP preconditioned resid norm 2.638419658982e-02 true resid norm 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11
 Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 423
 2 KSP unpreconditioned resid norm 3.539889585599e-16 true resid norm 3.542279617063e-16 ||r(i)||/||b|| 8.771347603759e-14
Linear solve converged due to CONVERGED_RTOL iterations 2
Does the slow convergence of the Schur block mean that my
preconditioning matrix Sp is a poor choice?
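
(To check this myself, I suppose I could look at the eigenvalue
estimates that GMRES builds for the preconditioned Schur complement,
e.g. with something along these lines, assuming the usual option
prefixing applies to the inner solver:)

    -fieldsplit_1_ksp_type gmres
    -fieldsplit_1_ksp_gmres_restart 200
    -fieldsplit_1_ksp_compute_eigenvalues

My understanding is that if those eigenvalue estimates are well
clustered, the mass matrix should be a good preconditioner for S and the
slow convergence would have to come from somewhere else, while a widely
spread spectrum would confirm that Sp is a poor approximation here.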
Thanks,
David
On 06/11/2017 08:53 AM, Matthew Knepley wrote:
> On Sat, Jun 10, 2017 at 8:25 PM, David Nolte <dnolte at dim.uchile.cl> wrote:
>
> Dear all,
>
> I am solving a Stokes problem in 3D aorta geometries, using a P2/P1
> finite elements discretization on tetrahedral meshes resulting in
> ~1-1.5M DOFs. Viscosity is uniform (can be adjusted arbitrarily), and
> the right hand side is a function of noisy measurement data.
>
> In other settings of "standard" Stokes flow problems I have obtained
> good convergence with an "upper" Schur complement preconditioner,
> using
> AMG (ML or Hypre) on the velocity block and approximating the Schur
> complement matrix by the diagonal of the pressure mass matrix:
>
> -ksp_converged_reason
> -ksp_monitor_true_residual
> -ksp_initial_guess_nonzero
> -ksp_diagonal_scale
> -ksp_diagonal_scale_fix
> -ksp_type fgmres
> -ksp_rtol 1.0e-8
>
> -pc_type fieldsplit
> -pc_fieldsplit_type schur
> -pc_fieldsplit_detect_saddle_point
> -pc_fieldsplit_schur_fact_type upper
> -pc_fieldsplit_schur_precondition user # <-- pressure mass matrix
>
> -fieldsplit_0_ksp_type preonly
> -fieldsplit_0_pc_type ml
>
> -fieldsplit_1_ksp_type preonly
> -fieldsplit_1_pc_type jacobi
>
>
> 1) I always recommend starting from an exact solver and backing off in
> small steps for optimization. Thus I would start with LU on the upper
> block and GMRES/LU with tolerance 1e-10 on the Schur block. This
> should converge in 1 iterate.
>
> 2) I don't think you want preonly on the Schur system. You might want
> GMRES/Jacobi to invert the mass matrix.
>
> 3) You probably want to tighten the tolerance on the Schur solve, at
> least to start, and then slowly let it out. The tight tolerance will
> show you how effective the preconditioner is using that Schur
> operator. Then you can start to evaluate how effective the Schur
> linear solver is.
>
> Does this make sense?
>
> Thanks,
>
> Matt
>
>
> In my present case this setup gives rather slow convergence (it varies
> for different geometries between 200-500 iterations, or even several
> thousand!). I obtain better convergence with
> "-pc_fieldsplit_schur_precondition selfp" and using multigrid on S,
> with "-fieldsplit_1_pc_type ml" (I don't think this is optimal, though).
>
> I don't understand why the pressure mass matrix approach performs so
> poorly and wonder what I could try to improve the convergence.
> Until now
> I have been using ML and Hypre BoomerAMG mostly with default
> parameters.
> Surely they can be improved by tuning some parameters. Which could
> be a
> good starting point? Are there other options I should consider?
>
> With the above setup (jacobi) for a case that works better than
> others,
> the KSP terminates with
> 467 KSP unpreconditioned resid norm 2.072014323515e-09 true resid norm 2.072014322600e-09 ||r(i)||/||b|| 9.939098100674e-09
>
> You can find the output of -ksp_view below. Let me know if you
> need more
> details.
>
> Thanks in advance for your advice!
> Best wishes
> David
>
>
> KSP Object: 1 MPI processes
> type: fgmres
> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> GMRES: happy breakdown tolerance 1e-30
> maximum iterations=10000
> tolerances: relative=1e-08, absolute=1e-50, divergence=10000.
> right preconditioning
> diagonally scaled system
> using nonzero initial guess
> using UNPRECONDITIONED norm type for convergence test
> PC Object: 1 MPI processes
> type: fieldsplit
> FieldSplit with Schur preconditioner, factorization UPPER
> Preconditioner for the Schur complement formed from user provided matrix
> Split info:
> Split number 0 Defined by IS
> Split number 1 Defined by IS
> KSP solver for A00 block
> KSP Object: (fieldsplit_0_) 1 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (fieldsplit_0_) 1 MPI processes
> type: ml
> MG: type is MULTIPLICATIVE, levels=5 cycles=v
> Cycles per PCApply=1
> Using Galerkin computed coarse grid matrices
> Coarse grid solver -- level -------------------------------
> KSP Object: (fieldsplit_0_mg_coarse_) 1 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (fieldsplit_0_mg_coarse_) 1 MPI processes
> type: lu
> LU: out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
> matrix ordering: nd
> factor fill ratio given 5., needed 1.
> Factored matrix follows:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=3, cols=3
> package used to perform factorization: petsc
> total: nonzeros=3, allocated nonzeros=3
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=3, cols=3
> total: nonzeros=3, allocated nonzeros=3
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> Down solver (pre-smoother) on level 1 -------------------------------
> KSP Object: (fieldsplit_0_mg_levels_1_) 1 MPI processes
> type: richardson
> Richardson: damping factor=1.
> maximum iterations=2
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using nonzero initial guess
> using NONE norm type for convergence test
> PC Object: (fieldsplit_0_mg_levels_1_) 1 MPI processes
> type: sor
> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=15, cols=15
> total: nonzeros=69, allocated nonzeros=69
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> Up solver (post-smoother) same as down solver (pre-smoother)
> Down solver (pre-smoother) on level 2 -------------------------------
> KSP Object: (fieldsplit_0_mg_levels_2_) 1 MPI processes
> type: richardson
> Richardson: damping factor=1.
> maximum iterations=2
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using nonzero initial guess
> using NONE norm type for convergence test
> PC Object: (fieldsplit_0_mg_levels_2_) 1 MPI processes
> type: sor
> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=304, cols=304
> total: nonzeros=7354, allocated nonzeros=7354
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> Up solver (post-smoother) same as down solver (pre-smoother)
> Down solver (pre-smoother) on level 3 -------------------------------
> KSP Object: (fieldsplit_0_mg_levels_3_) 1 MPI processes
> type: richardson
> Richardson: damping factor=1.
> maximum iterations=2
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using nonzero initial guess
> using NONE norm type for convergence test
> PC Object: (fieldsplit_0_mg_levels_3_) 1 MPI processes
> type: sor
> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=30236, cols=30236
> total: nonzeros=2730644, allocated nonzeros=2730644
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> Up solver (post-smoother) same as down solver (pre-smoother)
> Down solver (pre-smoother) on level 4 -------------------------------
> KSP Object: (fieldsplit_0_mg_levels_4_) 1 MPI processes
> type: richardson
> Richardson: damping factor=1.
> maximum iterations=2
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using nonzero initial guess
> using NONE norm type for convergence test
> PC Object: (fieldsplit_0_mg_levels_4_) 1 MPI processes
> type: sor
> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
> linear system matrix = precond matrix:
> Mat Object: (fieldsplit_0_) 1 MPI processes
> type: seqaij
> rows=894132, cols=894132
> total: nonzeros=70684164, allocated nonzeros=70684164
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> Up solver (post-smoother) same as down solver (pre-smoother)
> linear system matrix = precond matrix:
> Mat Object: (fieldsplit_0_) 1 MPI processes
> type: seqaij
> rows=894132, cols=894132
> total: nonzeros=70684164, allocated nonzeros=70684164
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> KSP solver for S = A11 - A10 inv(A00) A01
> KSP Object: (fieldsplit_1_) 1 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (fieldsplit_1_) 1 MPI processes
> type: jacobi
> linear system matrix followed by preconditioner matrix:
> Mat Object: (fieldsplit_1_) 1 MPI processes
> type: schurcomplement
> rows=42025, cols=42025
> Schur complement A11 - A10 inv(A00) A01
> A11
> Mat Object: (fieldsplit_1_) 1 MPI processes
> type: seqaij
> rows=42025, cols=42025
> total: nonzeros=554063, allocated nonzeros=554063
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> A10
> Mat Object: 1 MPI processes
> type: seqaij
> rows=42025, cols=894132
> total: nonzeros=6850107, allocated nonzeros=6850107
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> KSP of A00
> KSP Object: (fieldsplit_0_) 1 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (fieldsplit_0_) 1 MPI processes
> type: ml
> MG: type is MULTIPLICATIVE, levels=5 cycles=v
> Cycles per PCApply=1
> Using Galerkin computed coarse grid matrices
> Coarse grid solver -- level -------------------------------
> KSP Object: (fieldsplit_0_mg_coarse_) 1 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (fieldsplit_0_mg_coarse_) 1 MPI processes
> type: lu
> LU: out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
> matrix ordering: nd
> factor fill ratio given 5., needed 1.
> Factored matrix follows:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=3, cols=3
> package used to perform factorization: petsc
> total: nonzeros=3, allocated nonzeros=3
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=3, cols=3
> total: nonzeros=3, allocated nonzeros=3
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> Down solver (pre-smoother) on level 1 -------------------------------
> KSP Object: (fieldsplit_0_mg_levels_1_) 1 MPI processes
> type: richardson
> Richardson: damping factor=1.
> maximum iterations=2
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using nonzero initial guess
> using NONE norm type for convergence test
> PC Object: (fieldsplit_0_mg_levels_1_) 1 MPI processes
> type: sor
> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=15, cols=15
> total: nonzeros=69, allocated nonzeros=69
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> Up solver (post-smoother) same as down solver (pre-smoother)
> Down solver (pre-smoother) on level 2 -------------------------------
> KSP Object: (fieldsplit_0_mg_levels_2_) 1 MPI processes
> type: richardson
> Richardson: damping factor=1.
> maximum iterations=2
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using nonzero initial guess
> using NONE norm type for convergence test
> PC Object: (fieldsplit_0_mg_levels_2_) 1 MPI processes
> type: sor
> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=304, cols=304
> total: nonzeros=7354, allocated nonzeros=7354
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> Up solver (post-smoother) same as down solver (pre-smoother)
> Down solver (pre-smoother) on level 3 -------------------------------
> KSP Object: (fieldsplit_0_mg_levels_3_) 1 MPI processes
> type: richardson
> Richardson: damping factor=1.
> maximum iterations=2
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using nonzero initial guess
> using NONE norm type for convergence test
> PC Object: (fieldsplit_0_mg_levels_3_) 1 MPI processes
> type: sor
> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=30236, cols=30236
> total: nonzeros=2730644, allocated nonzeros=2730644
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> Up solver (post-smoother) same as down solver (pre-smoother)
> Down solver (pre-smoother) on level 4 -------------------------------
> KSP Object: (fieldsplit_0_mg_levels_4_) 1 MPI processes
> type: richardson
> Richardson: damping factor=1.
> maximum iterations=2
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using nonzero initial guess
> using NONE norm type for convergence test
> PC Object: (fieldsplit_0_mg_levels_4_) 1 MPI processes
> type: sor
> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
> linear system matrix = precond matrix:
> Mat Object: (fieldsplit_0_) 1 MPI processes
> type: seqaij
> rows=894132, cols=894132
> total: nonzeros=70684164, allocated nonzeros=70684164
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> Up solver (post-smoother) same as down solver (pre-smoother)
> linear system matrix = precond matrix:
> Mat Object: (fieldsplit_0_) 1 MPI processes
> type: seqaij
> rows=894132, cols=894132
> total: nonzeros=70684164, allocated nonzeros=70684164
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> A01
> Mat Object: 1 MPI processes
> type: seqaij
> rows=894132, cols=42025
> total: nonzeros=6850107, allocated nonzeros=6850107
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> Mat Object: 1 MPI processes
> type: seqaij
> rows=42025, cols=42025
> total: nonzeros=554063, allocated nonzeros=554063
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=936157, cols=936157
> total: nonzeros=84938441, allocated nonzeros=84938441
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
>
>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener
>
> http://www.caam.rice.edu/~mk51/