[petsc-users] Advice on improving Stokes Schur preconditioners
David Nolte
dnolte at dim.uchile.cl
Sun Jun 11 23:06:14 CDT 2017
Thanks Matt, that makes sense to me!
I had skipped direct solvers at first because for these 'real'
configurations LU (mumps/superlu_dist) usually runs out of memory (I
have 32GB RAM). It would be reasonable to take one more step back and
play with synthetic examples.
I did manage to run one case, though, with 936k DOFs, using the options
below ("user" = pressure mass matrix); a sketch of how the mass matrix
is attached in the code follows the option list.
<...>
-pc_fieldsplit_schur_fact_type upper
-pc_fieldsplit_schur_precondition user
-fieldsplit_0_ksp_type preonly
-fieldsplit_0_pc_type lu
-fieldsplit_0_pc_factor_mat_solver_package mumps
-fieldsplit_1_ksp_type gmres
-fieldsplit_1_ksp_monitor_true_residual
-fieldsplit_1_ksp_rtol 1e-10
-fieldsplit_1_pc_type lu
-fieldsplit_1_pc_factor_mat_solver_package mumps
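
(For reference, this is roughly how the pressure mass matrix gets
attached as the user-provided Schur preconditioner in my code. This is
only a simplified sketch, not the exact code; A is the assembled Stokes
matrix, Sp the pressure mass matrix, is_u/is_p the velocity/pressure
index sets, b/x the right hand side and solution, all created elsewhere.)

    #include <petscksp.h>

    /* Sketch: Schur fieldsplit KSP with a user-provided matrix (the pressure
       mass matrix Sp) used to build the Schur complement preconditioner.
       A, Sp, is_u, is_p, b, x are assumed to be assembled elsewhere. */
    static PetscErrorCode SolveStokesSchur(Mat A, Mat Sp, IS is_u, IS is_p,
                                           Vec b, Vec x)
    {
      KSP            ksp;
      PC             pc;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
      ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCFIELDSPLIT);CHKERRQ(ierr);
      ierr = PCFieldSplitSetIS(pc, "0", is_u);CHKERRQ(ierr);  /* velocity split */
      ierr = PCFieldSplitSetIS(pc, "1", is_p);CHKERRQ(ierr);  /* pressure split */
      ierr = PCFieldSplitSetType(pc, PC_COMPOSITE_SCHUR);CHKERRQ(ierr);
      ierr = PCFieldSplitSetSchurFactType(pc, PC_FIELDSPLIT_SCHUR_FACT_UPPER);CHKERRQ(ierr);
      /* "user" Schur preconditioner: precondition S with the mass matrix Sp */
      ierr = PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_USER, Sp);CHKERRQ(ierr);
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);  /* picks up the -fieldsplit_* options above */
      ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
      ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }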
It takes 2 outer iterations, as expected. However, the fieldsplit_1
solve takes very long:
 0 KSP unpreconditioned resid norm 4.038466809302e-03 true resid norm 4.038466809302e-03 ||r(i)||/||b|| 1.000000000000e+00
   Residual norms for fieldsplit_1_ solve.
   0 KSP preconditioned resid norm 0.000000000000e+00 true resid norm 0.000000000000e+00 ||r(i)||/||b|| -nan
 Linear fieldsplit_1_ solve converged due to CONVERGED_ATOL iterations 0
 1 KSP unpreconditioned resid norm 4.860095964831e-06 true resid norm 4.860095964831e-06 ||r(i)||/||b|| 1.203450763452e-03
   Residual norms for fieldsplit_1_ solve.
   0 KSP preconditioned resid norm 2.965546249872e+08 true resid norm 1.000000000000e+00 ||r(i)||/||b|| 1.000000000000e+00
   1 KSP preconditioned resid norm 1.347596594634e+08 true resid norm 3.599678801575e-01 ||r(i)||/||b|| 3.599678801575e-01
   2 KSP preconditioned resid norm 5.913230136403e+07 true resid norm 2.364916760834e-01 ||r(i)||/||b|| 2.364916760834e-01
   3 KSP preconditioned resid norm 4.629700028930e+07 true resid norm 1.984444715595e-01 ||r(i)||/||b|| 1.984444715595e-01
   4 KSP preconditioned resid norm 3.804431276819e+07 true resid norm 1.747224559120e-01 ||r(i)||/||b|| 1.747224559120e-01
   5 KSP preconditioned resid norm 3.178769422140e+07 true resid norm 1.402254864444e-01 ||r(i)||/||b|| 1.402254864444e-01
   6 KSP preconditioned resid norm 2.648669043919e+07 true resid norm 1.191164310866e-01 ||r(i)||/||b|| 1.191164310866e-01
   7 KSP preconditioned resid norm 2.203522108614e+07 true resid norm 9.690500018007e-02 ||r(i)||/||b|| 9.690500018007e-02
<...>
   422 KSP preconditioned resid norm 2.984888715147e-02 true resid norm 8.598401046494e-11 ||r(i)||/||b|| 8.598401046494e-11
   423 KSP preconditioned resid norm 2.638419658982e-02 true resid norm 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11
 Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 423
 2 KSP unpreconditioned resid norm 3.539889585599e-16 true resid norm 3.542279617063e-16 ||r(i)||/||b|| 8.771347603759e-14
Linear solve converged due to CONVERGED_RTOL iterations 2
Does the slow convergence of the Schur block mean that my
preconditioning matrix Sp is a poor choice?
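
(To check this myself, I suppose I could look at the eigenvalue
estimates that GMRES builds for the preconditioned Schur complement,
e.g. with something along these lines, assuming the usual option
prefixing applies to the inner solver:)

    -fieldsplit_1_ksp_type gmres
    -fieldsplit_1_ksp_gmres_restart 200
    -fieldsplit_1_ksp_compute_eigenvalues

My understanding is that if those eigenvalue estimates are well
clustered, the mass matrix should be a good preconditioner for S and the
slow convergence would have to come from somewhere else, while a widely
spread spectrum would confirm that Sp is a poor approximation here.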
Thanks,
David
On 06/11/2017 08:53 AM, Matthew Knepley wrote:
> On Sat, Jun 10, 2017 at 8:25 PM, David Nolte <dnolte at dim.uchile.cl> wrote:
>
> Dear all,
>
> I am solving a Stokes problem in 3D aorta geometries, using a P2/P1
> finite elements discretization on tetrahedral meshes resulting in
> ~1-1.5M DOFs. Viscosity is uniform (can be adjusted arbitrarily), and
> the right hand side is a function of noisy measurement data.
>
> In other settings of "standard" Stokes flow problems I have obtained
> good convergence with an "upper" Schur complement preconditioner,
> using
> AMG (ML or Hypre) on the velocity block and approximating the Schur
> complement matrix by the diagonal of the pressure mass matrix:
>
> -ksp_converged_reason
> -ksp_monitor_true_residual
> -ksp_initial_guess_nonzero
> -ksp_diagonal_scale
> -ksp_diagonal_scale_fix
> -ksp_type fgmres
> -ksp_rtol 1.0e-8
>
> -pc_type fieldsplit
> -pc_fieldsplit_type schur
> -pc_fieldsplit_detect_saddle_point
> -pc_fieldsplit_schur_fact_type upper
> -pc_fieldsplit_schur_precondition user # <-- pressure mass matrix
>
> -fieldsplit_0_ksp_type preonly
> -fieldsplit_0_pc_type ml
>
> -fieldsplit_1_ksp_type preonly
> -fieldsplit_1_pc_type jacobi
>
>
> 1) I always recommend starting from an exact solver and backing off in
> small steps for optimization. Thus I would start with LU on the upper
> block and GMRES/LU with tolerance 1e-10 on the Schur block. This
> should converge in 1 iterate.
>
> 2) I don't think you want preonly on the Schur system. You might want
> GMRES/Jacobi to invert the mass matrix.
>
> 3) You probably want to tighten the tolerance on the Schur solve, at
> least to start, and then slowly let it out. The tight tolerance will
> show you how effective the preconditioner is using that Schur
> operator. Then you can start to evaluate how effective the Schur
> linear solver is.
>
> Does this make sense?
>
> Thanks,
>
> Matt
>
>
> In my present case this setup gives rather slow convergence (it varies
> for different geometries between 200-500 iterations, or even several
> thousand!). I obtain better convergence with
> "-pc_fieldsplit_schur_precondition selfp" and using multigrid on S,
> with "-fieldsplit_1_pc_type ml" (I don't think this is optimal, though).
>
> I don't understand why the pressure mass matrix approach performs so
> poorly and wonder what I could try to improve the convergence.
> Until now
> I have been using ML and Hypre BoomerAMG mostly with default
> parameters.
> Surely they can be improved by tuning some parameters. Which could
> be a
> good starting point? Are there other options I should consider?
>
> With the above setup (jacobi) for a case that works better than
> others,
> the KSP terminates with
> 467 KSP unpreconditioned resid norm 2.072014323515e-09 true resid norm 2.072014322600e-09 ||r(i)||/||b|| 9.939098100674e-09
>
> You can find the output of -ksp_view below. Let me know if you
> need more
> details.
>
> Thanks in advance for your advice!
> Best wishes
> David
>
>
> KSP Object: 1 MPI processes
> type: fgmres
> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> GMRES: happy breakdown tolerance 1e-30
> maximum iterations=10000
> tolerances: relative=1e-08, absolute=1e-50, divergence=10000.
> right preconditioning
> diagonally scaled system
> using nonzero initial guess
> using UNPRECONDITIONED norm type for convergence test
> PC Object: 1 MPI processes
> type: fieldsplit
> FieldSplit with Schur preconditioner, factorization UPPER
> Preconditioner for the Schur complement formed from user provided matrix
> Split info:
> Split number 0 Defined by IS
> Split number 1 Defined by IS
> KSP solver for A00 block
> KSP Object: (fieldsplit_0_) 1 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (fieldsplit_0_) 1 MPI processes
> type: ml
> MG: type is MULTIPLICATIVE, levels=5 cycles=v
> Cycles per PCApply=1
> Using Galerkin computed coarse grid matrices
> Coarse grid solver -- level -------------------------------
> KSP Object: (fieldsplit_0_mg_coarse_) 1 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (fieldsplit_0_mg_coarse_) 1 MPI processes
> type: lu
> LU: out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
> matrix ordering: nd
> factor fill ratio given 5., needed 1.
> Factored matrix follows:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=3, cols=3
> package used to perform factorization: petsc
> total: nonzeros=3, allocated nonzeros=3
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=3, cols=3
> total: nonzeros=3, allocated nonzeros=3
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> Down solver (pre-smoother) on level 1 -------------------------------
> KSP Object: (fieldsplit_0_mg_levels_1_) 1 MPI processes
> type: richardson
> Richardson: damping factor=1.
> maximum iterations=2
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using nonzero initial guess
> using NONE norm type for convergence test
> PC Object: (fieldsplit_0_mg_levels_1_) 1 MPI processes
> type: sor
> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=15, cols=15
> total: nonzeros=69, allocated nonzeros=69
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> Up solver (post-smoother) same as down solver (pre-smoother)
> Down solver (pre-smoother) on level 2 -------------------------------
> KSP Object: (fieldsplit_0_mg_levels_2_) 1 MPI processes
> type: richardson
> Richardson: damping factor=1.
> maximum iterations=2
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using nonzero initial guess
> using NONE norm type for convergence test
> PC Object: (fieldsplit_0_mg_levels_2_) 1 MPI processes
> type: sor
> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=304, cols=304
> total: nonzeros=7354, allocated nonzeros=7354
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> Up solver (post-smoother) same as down solver (pre-smoother)
> Down solver (pre-smoother) on level 3 -------------------------------
> KSP Object: (fieldsplit_0_mg_levels_3_) 1 MPI processes
> type: richardson
> Richardson: damping factor=1.
> maximum iterations=2
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using nonzero initial guess
> using NONE norm type for convergence test
> PC Object: (fieldsplit_0_mg_levels_3_) 1 MPI processes
> type: sor
> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=30236, cols=30236
> total: nonzeros=2730644, allocated nonzeros=2730644
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> Up solver (post-smoother) same as down solver (pre-smoother)
> Down solver (pre-smoother) on level 4 -------------------------------
> KSP Object: (fieldsplit_0_mg_levels_4_) 1 MPI processes
> type: richardson
> Richardson: damping factor=1.
> maximum iterations=2
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using nonzero initial guess
> using NONE norm type for convergence test
> PC Object: (fieldsplit_0_mg_levels_4_) 1 MPI processes
> type: sor
> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
> linear system matrix = precond matrix:
> Mat Object: (fieldsplit_0_) 1 MPI processes
> type: seqaij
> rows=894132, cols=894132
> total: nonzeros=70684164, allocated nonzeros=70684164
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> Up solver (post-smoother) same as down solver (pre-smoother)
> linear system matrix = precond matrix:
> Mat Object: (fieldsplit_0_) 1 MPI processes
> type: seqaij
> rows=894132, cols=894132
> total: nonzeros=70684164, allocated nonzeros=70684164
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> KSP solver for S = A11 - A10 inv(A00) A01
> KSP Object: (fieldsplit_1_) 1 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (fieldsplit_1_) 1 MPI processes
> type: jacobi
> linear system matrix followed by preconditioner matrix:
> Mat Object: (fieldsplit_1_) 1 MPI processes
> type: schurcomplement
> rows=42025, cols=42025
> Schur complement A11 - A10 inv(A00) A01
> A11
> Mat Object: (fieldsplit_1_) 1 MPI processes
> type: seqaij
> rows=42025, cols=42025
> total: nonzeros=554063, allocated nonzeros=554063
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> A10
> Mat Object: 1 MPI processes
> type: seqaij
> rows=42025, cols=894132
> total: nonzeros=6850107, allocated nonzeros=6850107
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> KSP of A00
> KSP Object: (fieldsplit_0_) 1 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (fieldsplit_0_) 1 MPI processes
> type: ml
> MG: type is MULTIPLICATIVE, levels=5 cycles=v
> Cycles per PCApply=1
> Using Galerkin computed coarse grid matrices
> Coarse grid solver -- level -------------------------------
> KSP Object: (fieldsplit_0_mg_coarse_) 1 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (fieldsplit_0_mg_coarse_) 1 MPI processes
> type: lu
> LU: out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
> matrix ordering: nd
> factor fill ratio given 5., needed 1.
> Factored matrix follows:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=3, cols=3
> package used to perform factorization: petsc
> total: nonzeros=3, allocated nonzeros=3
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=3, cols=3
> total: nonzeros=3, allocated nonzeros=3
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> Down solver (pre-smoother) on level 1 -------------------------------
> KSP Object: (fieldsplit_0_mg_levels_1_) 1 MPI processes
> type: richardson
> Richardson: damping factor=1.
> maximum iterations=2
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using nonzero initial guess
> using NONE norm type for convergence test
> PC Object: (fieldsplit_0_mg_levels_1_) 1 MPI processes
> type: sor
> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=15, cols=15
> total: nonzeros=69, allocated nonzeros=69
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> Up solver (post-smoother) same as down solver (pre-smoother)
> Down solver (pre-smoother) on level 2 -------------------------------
> KSP Object: (fieldsplit_0_mg_levels_2_) 1 MPI processes
> type: richardson
> Richardson: damping factor=1.
> maximum iterations=2
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using nonzero initial guess
> using NONE norm type for convergence test
> PC Object: (fieldsplit_0_mg_levels_2_) 1 MPI processes
> type: sor
> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=304, cols=304
> total: nonzeros=7354, allocated nonzeros=7354
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> Up solver (post-smoother) same as down solver (pre-smoother)
> Down solver (pre-smoother) on level 3 -------------------------------
> KSP Object: (fieldsplit_0_mg_levels_3_) 1 MPI processes
> type: richardson
> Richardson: damping factor=1.
> maximum iterations=2
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using nonzero initial guess
> using NONE norm type for convergence test
> PC Object: (fieldsplit_0_mg_levels_3_) 1 MPI processes
> type: sor
> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=30236, cols=30236
> total: nonzeros=2730644, allocated nonzeros=2730644
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> Up solver (post-smoother) same as down solver (pre-smoother)
> Down solver (pre-smoother) on level 4 -------------------------------
> KSP Object: (fieldsplit_0_mg_levels_4_) 1 MPI processes
> type: richardson
> Richardson: damping factor=1.
> maximum iterations=2
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using nonzero initial guess
> using NONE norm type for convergence test
> PC Object: (fieldsplit_0_mg_levels_4_) 1 MPI processes
> type: sor
> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
> linear system matrix = precond matrix:
> Mat Object: (fieldsplit_0_) 1 MPI processes
> type: seqaij
> rows=894132, cols=894132
> total: nonzeros=70684164, allocated nonzeros=70684164
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> Up solver (post-smoother) same as down solver (pre-smoother)
> linear system matrix = precond matrix:
> Mat Object: (fieldsplit_0_) 1 MPI processes
> type: seqaij
> rows=894132, cols=894132
> total: nonzeros=70684164, allocated nonzeros=70684164
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> A01
> Mat Object: 1 MPI processes
> type: seqaij
> rows=894132, cols=42025
> total: nonzeros=6850107, allocated nonzeros=6850107
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> Mat Object: 1 MPI processes
> type: seqaij
> rows=42025, cols=42025
> total: nonzeros=554063, allocated nonzeros=554063
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=936157, cols=936157
> total: nonzeros=84938441, allocated nonzeros=84938441
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
>
>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener
>
> http://www.caam.rice.edu/~mk51/