[petsc-users] Advice on improving Stokes Schur preconditioners
Matthew Knepley
knepley at gmail.com
Mon Jun 12 06:50:13 CDT 2017
On Sun, Jun 11, 2017 at 11:06 PM, David Nolte <dnolte at dim.uchile.cl> wrote:
> Thanks Matt, makes sense to me!
>
> I skipped direct solvers at first because for these 'real' configurations
> LU (mumps/superlu_dist) usally goes out of memory (got 32GB RAM). It would
> be reasonable to take one more step back and play with synthetic examples.
> I managed to run one case though with 936k dofs using: ("user" =pressure
> mass matrix)
>
> <...>
> -pc_fieldsplit_schur_fact_type upper
> -pc_fieldsplit_schur_precondition user
> -fieldsplit_0_ksp_type preonly
> -fieldsplit_0_pc_type lu
> -fieldsplit_0_pc_factor_mat_solver_package mumps
>
> -fieldsplit_1_ksp_type gmres
> -fieldsplit_1_ksp_monitor_true_residuals
> -fieldsplit_1_ksp_rtol 1e-10
> -fieldsplit_1_pc_type lu
> -fieldsplit_1_pc_factor_mat_solver_package mumps
>
> It takes 2 outer iterations, as expected. However the fieldsplit_1 solve
> takes very long.
>
1) It should take 1 outer iterate, not two. The problem is that your Schur
tolerance is way too high. Use
-fieldsplit_1_ksp_rtol 1e-10
or something like that. Then it will take 1 iterate.
2) There is a problem with the Schur solve. Now from the iterates
423 KSP preconditioned resid norm 2.638419658982e-02 true resid norm
7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11
it is clear that the preconditioner is really screwing stuff up. For
testing, you can use
-pc_fieldsplit_schur_precondition full
and your same setup here. It should take one iterate. I think there is
something wrong with your
mass matrix.
Thanks,
Matt
> 0 KSP unpreconditioned resid norm 4.038466809302e-03 true resid norm
> 4.038466809302e-03 ||r(i)||/||b|| 1.000000000000e+00
> Residual norms for fieldsplit_1_ solve.
> 0 KSP preconditioned resid norm 0.000000000000e+00 true resid norm
> 0.000000000000e+00 ||r(i)||/||b|| -nan
> Linear fieldsplit_1_ solve converged due to CONVERGED_ATOL iterations 0
> 1 KSP unpreconditioned resid norm 4.860095964831e-06 true resid norm
> 4.860095964831e-06 ||r(i)||/||b|| 1.203450763452e-03
> Residual norms for fieldsplit_1_ solve.
> 0 KSP preconditioned resid norm 2.965546249872e+08 true resid norm
> 1.000000000000e+00 ||r(i)||/||b|| 1.000000000000e+00
> 1 KSP preconditioned resid norm 1.347596594634e+08 true resid norm
> 3.599678801575e-01 ||r(i)||/||b|| 3.599678801575e-01
> 2 KSP preconditioned resid norm 5.913230136403e+07 true resid norm
> 2.364916760834e-01 ||r(i)||/||b|| 2.364916760834e-01
> 3 KSP preconditioned resid norm 4.629700028930e+07 true resid norm
> 1.984444715595e-01 ||r(i)||/||b|| 1.984444715595e-01
> 4 KSP preconditioned resid norm 3.804431276819e+07 true resid norm
> 1.747224559120e-01 ||r(i)||/||b|| 1.747224559120e-01
> 5 KSP preconditioned resid norm 3.178769422140e+07 true resid norm
> 1.402254864444e-01 ||r(i)||/||b|| 1.402254864444e-01
> 6 KSP preconditioned resid norm 2.648669043919e+07 true resid norm
> 1.191164310866e-01 ||r(i)||/||b|| 1.191164310866e-01
> 7 KSP preconditioned resid norm 2.203522108614e+07 true resid norm
> 9.690500018007e-02 ||r(i)||/||b|| 9.690500018007e-02
> <...>
> 422 KSP preconditioned resid norm 2.984888715147e-02 true resid norm
> 8.598401046494e-11 ||r(i)||/||b|| 8.598401046494e-11
> 423 KSP preconditioned resid norm 2.638419658982e-02 true resid norm
> 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11
> Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 423
> 2 KSP unpreconditioned resid norm 3.539889585599e-16 true resid norm
> 3.542279617063e-16 ||r(i)||/||b|| 8.771347603759e-14
> Linear solve converged due to CONVERGED_RTOL iterations 2
>
>
> Does the slow convergence of the Schur block mean that my preconditioning
> matrix Sp is a poor choice?
>
> Thanks,
> David
>
>
> On 06/11/2017 08:53 AM, Matthew Knepley wrote:
>
> On Sat, Jun 10, 2017 at 8:25 PM, David Nolte <dnolte at dim.uchile.cl> wrote:
>
>> Dear all,
>>
>> I am solving a Stokes problem in 3D aorta geometries, using a P2/P1
>> finite elements discretization on tetrahedral meshes resulting in
>> ~1-1.5M DOFs. Viscosity is uniform (can be adjusted arbitrarily), and
>> the right hand side is a function of noisy measurement data.
>>
>> In other settings of "standard" Stokes flow problems I have obtained
>> good convergence with an "upper" Schur complement preconditioner, using
>> AMG (ML or Hypre) on the velocity block and approximating the Schur
>> complement matrix by the diagonal of the pressure mass matrix:
>>
>> -ksp_converged_reason
>> -ksp_monitor_true_residual
>> -ksp_initial_guess_nonzero
>> -ksp_diagonal_scale
>> -ksp_diagonal_scale_fix
>> -ksp_type fgmres
>> -ksp_rtol 1.0e-8
>>
>> -pc_type fieldsplit
>> -pc_fieldsplit_type schur
>> -pc_fieldsplit_detect_saddle_point
>> -pc_fieldsplit_schur_fact_type upper
>> -pc_fieldsplit_schur_precondition user # <-- pressure mass matrix
>>
>> -fieldsplit_0_ksp_type preonly
>> -fieldsplit_0_pc_type ml
>>
>> -fieldsplit_1_ksp_type preonly
>> -fieldsplit_1_pc_type jacobi
>>
>
> 1) I always recommend starting from an exact solver and backing off in
> small steps for optimization. Thus
> I would start with LU on the upper block and GMRES/LU with toelrance
> 1e-10 on the Schur block.
> This should converge in 1 iterate.
>
> 2) I don't think you want preonly on the Schur system. You might want
> GMRES/Jacobi to invert the mass matrix.
>
> 3) You probably want to tighten the tolerance on the Schur solve, at least
> to start, and then slowly let it out. The
> tight tolerance will show you how effective the preconditioner is
> using that Schur operator. Then you can start
> to evaluate how effective the Schur linear sovler is.
>
> Does this make sense?
>
> Thanks,
>
> Matt
>
>
>> In my present case this setup gives rather slow convergence (varies for
>> different geometries between 200-500 or several thousands!). I obtain
>> better convergence with "-pc_fieldsplit_schur_precondition selfp"and
>> using multigrid on S, with "-fieldsplit_1_pc_type ml" (I don't think
>> this is optimal, though).
>>
>> I don't understand why the pressure mass matrix approach performs so
>> poorly and wonder what I could try to improve the convergence. Until now
>> I have been using ML and Hypre BoomerAMG mostly with default parameters.
>> Surely they can be improved by tuning some parameters. Which could be a
>> good starting point? Are there other options I should consider?
>>
>> With the above setup (jacobi) for a case that works better than others,
>> the KSP terminates with
>> 467 KSP unpreconditioned resid norm 2.072014323515e-09 true resid norm
>> 2.072014322600e-09 ||r(i)||/||b|| 9.939098100674e-09
>>
>> You can find the output of -ksp_view below. Let me know if you need more
>> details.
>>
>> Thanks in advance for your advice!
>> Best wishes
>> David
>>
>>
>> KSP Object: 1 MPI processes
>> type: fgmres
>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
>> Orthogonalization with no iterative refinement
>> GMRES: happy breakdown tolerance 1e-30
>> maximum iterations=10000
>> tolerances: relative=1e-08, absolute=1e-50, divergence=10000.
>> right preconditioning
>> diagonally scaled system
>> using nonzero initial guess
>> using UNPRECONDITIONED norm type for convergence test
>> PC Object: 1 MPI processes
>> type: fieldsplit
>> FieldSplit with Schur preconditioner, factorization UPPER
>> Preconditioner for the Schur complement formed from user provided
>> matrix
>> Split info:
>> Split number 0 Defined by IS
>> Split number 1 Defined by IS
>> KSP solver for A00 block
>> KSP Object: (fieldsplit_0_) 1 MPI processes
>> type: preonly
>> maximum iterations=10000, initial guess is zero
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> left preconditioning
>> using NONE norm type for convergence test
>> PC Object: (fieldsplit_0_) 1 MPI processes
>> type: ml
>> MG: type is MULTIPLICATIVE, levels=5 cycles=v
>> Cycles per PCApply=1
>> Using Galerkin computed coarse grid matrices
>> Coarse grid solver -- level -------------------------------
>> KSP Object: (fieldsplit_0_mg_coarse_) 1 MPI
>> processes
>> type: preonly
>> maximum iterations=10000, initial guess is zero
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> left preconditioning
>> using NONE norm type for convergence test
>> PC Object: (fieldsplit_0_mg_coarse_) 1 MPI
>> processes
>> type: lu
>> LU: out-of-place factorization
>> tolerance for zero pivot 2.22045e-14
>> using diagonal shift on blocks to prevent zero pivot
>> [INBLOCKS]
>> matrix ordering: nd
>> factor fill ratio given 5., needed 1.
>> Factored matrix follows:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=3, cols=3
>> package used to perform factorization: petsc
>> total: nonzeros=3, allocated nonzeros=3
>> total number of mallocs used during MatSetValues
>> calls =0
>> not using I-node routines
>> linear system matrix = precond matrix:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=3, cols=3
>> total: nonzeros=3, allocated nonzeros=3
>> total number of mallocs used during MatSetValues calls =0
>> not using I-node routines
>> Down solver (pre-smoother) on level 1
>> -------------------------------
>> KSP Object: (fieldsplit_0_mg_levels_1_) 1
>> MPI processes
>> type: richardson
>> Richardson: damping factor=1.
>> maximum iterations=2
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> left preconditioning
>> using nonzero initial guess
>> using NONE norm type for convergence test
>> PC Object: (fieldsplit_0_mg_levels_1_) 1
>> MPI processes
>> type: sor
>> SOR: type = local_symmetric, iterations = 1, local
>> iterations = 1, omega = 1.
>> linear system matrix = precond matrix:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=15, cols=15
>> total: nonzeros=69, allocated nonzeros=69
>> total number of mallocs used during MatSetValues calls =0
>> not using I-node routines
>> Up solver (post-smoother) same as down solver (pre-smoother)
>> Down solver (pre-smoother) on level 2
>> -------------------------------
>> KSP Object: (fieldsplit_0_mg_levels_2_) 1
>> MPI processes
>> type: richardson
>> Richardson: damping factor=1.
>> maximum iterations=2
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> left preconditioning
>> using nonzero initial guess
>> using NONE norm type for convergence test
>> PC Object: (fieldsplit_0_mg_levels_2_) 1
>> MPI processes
>> type: sor
>> SOR: type = local_symmetric, iterations = 1, local
>> iterations = 1, omega = 1.
>> linear system matrix = precond matrix:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=304, cols=304
>> total: nonzeros=7354, allocated nonzeros=7354
>> total number of mallocs used during MatSetValues calls =0
>> not using I-node routines
>> Up solver (post-smoother) same as down solver (pre-smoother)
>> Down solver (pre-smoother) on level 3
>> -------------------------------
>> KSP Object: (fieldsplit_0_mg_levels_3_) 1
>> MPI processes
>> type: richardson
>> Richardson: damping factor=1.
>> maximum iterations=2
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> left preconditioning
>> using nonzero initial guess
>> using NONE norm type for convergence test
>> PC Object: (fieldsplit_0_mg_levels_3_) 1
>> MPI processes
>> type: sor
>> SOR: type = local_symmetric, iterations = 1, local
>> iterations = 1, omega = 1.
>> linear system matrix = precond matrix:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=30236, cols=30236
>> total: nonzeros=2730644, allocated nonzeros=2730644
>> total number of mallocs used during MatSetValues calls =0
>> not using I-node routines
>> Up solver (post-smoother) same as down solver (pre-smoother)
>> Down solver (pre-smoother) on level 4
>> -------------------------------
>> KSP Object: (fieldsplit_0_mg_levels_4_) 1
>> MPI processes
>> type: richardson
>> Richardson: damping factor=1.
>> maximum iterations=2
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> left preconditioning
>> using nonzero initial guess
>> using NONE norm type for convergence test
>> PC Object: (fieldsplit_0_mg_levels_4_) 1
>> MPI processes
>> type: sor
>> SOR: type = local_symmetric, iterations = 1, local
>> iterations = 1, omega = 1.
>> linear system matrix = precond matrix:
>> Mat Object: (fieldsplit_0_) 1 MPI
>> processes
>> type: seqaij
>> rows=894132, cols=894132
>> total: nonzeros=70684164, allocated nonzeros=70684164
>> total number of mallocs used during MatSetValues calls =0
>> not using I-node routines
>> Up solver (post-smoother) same as down solver (pre-smoother)
>> linear system matrix = precond matrix:
>> Mat Object: (fieldsplit_0_) 1 MPI processes
>> type: seqaij
>> rows=894132, cols=894132
>> total: nonzeros=70684164, allocated nonzeros=70684164
>> total number of mallocs used during MatSetValues calls =0
>> not using I-node routines
>> KSP solver for S = A11 - A10 inv(A00) A01
>> KSP Object: (fieldsplit_1_) 1 MPI processes
>> type: preonly
>> maximum iterations=10000, initial guess is zero
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> left preconditioning
>> using NONE norm type for convergence test
>> PC Object: (fieldsplit_1_) 1 MPI processes
>> type: jacobi
>> linear system matrix followed by preconditioner matrix:
>> Mat Object: (fieldsplit_1_) 1 MPI processes
>> type: schurcomplement
>> rows=42025, cols=42025
>> Schur complement A11 - A10 inv(A00) A01
>> A11
>> Mat Object: (fieldsplit_1_) 1
>> MPI processes
>> type: seqaij
>> rows=42025, cols=42025
>> total: nonzeros=554063, allocated nonzeros=554063
>> total number of mallocs used during MatSetValues calls =0
>> not using I-node routines
>> A10
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=42025, cols=894132
>> total: nonzeros=6850107, allocated nonzeros=6850107
>> total number of mallocs used during MatSetValues calls =0
>> not using I-node routines
>> KSP of A00
>> KSP Object: (fieldsplit_0_) 1
>> MPI processes
>> type: preonly
>> maximum iterations=10000, initial guess is zero
>> tolerances: relative=1e-05, absolute=1e-50,
>> divergence=10000.
>> left preconditioning
>> using NONE norm type for convergence test
>> PC Object: (fieldsplit_0_) 1
>> MPI processes
>> type: ml
>> MG: type is MULTIPLICATIVE, levels=5 cycles=v
>> Cycles per PCApply=1
>> Using Galerkin computed coarse grid matrices
>> Coarse grid solver -- level ------------------------------
>> -
>> KSP Object:
>> (fieldsplit_0_mg_coarse_) 1 MPI processes
>> type: preonly
>> maximum iterations=10000, initial guess is zero
>> tolerances: relative=1e-05, absolute=1e-50,
>> divergence=10000.
>> left preconditioning
>> using NONE norm type for convergence test
>> PC Object:
>> (fieldsplit_0_mg_coarse_) 1 MPI processes
>> type: lu
>> LU: out-of-place factorization
>> tolerance for zero pivot 2.22045e-14
>> using diagonal shift on blocks to prevent zero
>> pivot [INBLOCKS]
>> matrix ordering: nd
>> factor fill ratio given 5., needed 1.
>> Factored matrix follows:
>> Mat Object: 1 MPI
>> processes
>> type: seqaij
>> rows=3, cols=3
>> package used to perform factorization: petsc
>> total: nonzeros=3, allocated nonzeros=3
>> total number of mallocs used during
>> MatSetValues calls =0
>> not using I-node routines
>> linear system matrix = precond matrix:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=3, cols=3
>> total: nonzeros=3, allocated nonzeros=3
>> total number of mallocs used during MatSetValues
>> calls =0
>> not using I-node routines
>> Down solver (pre-smoother) on level 1
>> -------------------------------
>> KSP Object:
>> (fieldsplit_0_mg_levels_1_) 1 MPI processes
>> type: richardson
>> Richardson: damping factor=1.
>> maximum iterations=2
>> tolerances: relative=1e-05, absolute=1e-50,
>> divergence=10000.
>> left preconditioning
>> using nonzero initial guess
>> using NONE norm type for convergence test
>> PC Object:
>> (fieldsplit_0_mg_levels_1_) 1 MPI processes
>> type: sor
>> SOR: type = local_symmetric, iterations = 1, local
>> iterations = 1, omega = 1.
>> linear system matrix = precond matrix:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=15, cols=15
>> total: nonzeros=69, allocated nonzeros=69
>> total number of mallocs used during MatSetValues
>> calls =0
>> not using I-node routines
>> Up solver (post-smoother) same as down solver
>> (pre-smoother)
>> Down solver (pre-smoother) on level 2
>> -------------------------------
>> KSP Object:
>> (fieldsplit_0_mg_levels_2_) 1 MPI processes
>> type: richardson
>> Richardson: damping factor=1.
>> maximum iterations=2
>> tolerances: relative=1e-05, absolute=1e-50,
>> divergence=10000.
>> left preconditioning
>> using nonzero initial guess
>> using NONE norm type for convergence test
>> PC Object:
>> (fieldsplit_0_mg_levels_2_) 1 MPI processes
>> type: sor
>> SOR: type = local_symmetric, iterations = 1, local
>> iterations = 1, omega = 1.
>> linear system matrix = precond matrix:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=304, cols=304
>> total: nonzeros=7354, allocated nonzeros=7354
>> total number of mallocs used during MatSetValues
>> calls =0
>> not using I-node routines
>> Up solver (post-smoother) same as down solver
>> (pre-smoother)
>> Down solver (pre-smoother) on level 3
>> -------------------------------
>> KSP Object:
>> (fieldsplit_0_mg_levels_3_) 1 MPI processes
>> type: richardson
>> Richardson: damping factor=1.
>> maximum iterations=2
>> tolerances: relative=1e-05, absolute=1e-50,
>> divergence=10000.
>> left preconditioning
>> using nonzero initial guess
>> using NONE norm type for convergence test
>> PC Object:
>> (fieldsplit_0_mg_levels_3_) 1 MPI processes
>> type: sor
>> SOR: type = local_symmetric, iterations = 1, local
>> iterations = 1, omega = 1.
>> linear system matrix = precond matrix:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=30236, cols=30236
>> total: nonzeros=2730644, allocated nonzeros=2730644
>> total number of mallocs used during MatSetValues
>> calls =0
>> not using I-node routines
>> Up solver (post-smoother) same as down solver
>> (pre-smoother)
>> Down solver (pre-smoother) on level 4
>> -------------------------------
>> KSP Object:
>> (fieldsplit_0_mg_levels_4_) 1 MPI processes
>> type: richardson
>> Richardson: damping factor=1.
>> maximum iterations=2
>> tolerances: relative=1e-05, absolute=1e-50,
>> divergence=10000.
>> left preconditioning
>> using nonzero initial guess
>> using NONE norm type for convergence test
>> PC Object:
>> (fieldsplit_0_mg_levels_4_) 1 MPI processes
>> type: sor
>> SOR: type = local_symmetric, iterations = 1, local
>> iterations = 1, omega = 1.
>> linear system matrix = precond matrix:
>> Mat Object:
>> (fieldsplit_0_) 1 MPI processes
>> type: seqaij
>> rows=894132, cols=894132
>> total: nonzeros=70684164, allocated
>> nonzeros=70684164
>> total number of mallocs used during MatSetValues
>> calls =0
>> not using I-node routines
>> Up solver (post-smoother) same as down solver
>> (pre-smoother)
>> linear system matrix = precond matrix:
>> Mat Object:
>> (fieldsplit_0_) 1 MPI processes
>> type: seqaij
>> rows=894132, cols=894132
>> total: nonzeros=70684164, allocated nonzeros=70684164
>> total number of mallocs used during MatSetValues calls
>> =0
>> not using I-node routines
>> A01
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=894132, cols=42025
>> total: nonzeros=6850107, allocated nonzeros=6850107
>> total number of mallocs used during MatSetValues calls =0
>> not using I-node routines
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=42025, cols=42025
>> total: nonzeros=554063, allocated nonzeros=554063
>> total number of mallocs used during MatSetValues calls =0
>> not using I-node routines
>> linear system matrix = precond matrix:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=936157, cols=936157
>> total: nonzeros=84938441, allocated nonzeros=84938441
>> total number of mallocs used during MatSetValues calls =0
>> not using I-node routines
>>
>>
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> http://www.caam.rice.edu/~mk51/
>
>
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
http://www.caam.rice.edu/~mk51/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20170612/a844e2ef/attachment-0001.html>
More information about the petsc-users
mailing list