[petsc-users] Advice on improving Stokes Schur preconditioners

Mon Jun 12 06:50:13 CDT 2017

On Sun, Jun 11, 2017 at 11:06 PM, David Nolte <dnolte at dim.uchile.cl> wrote:

> Thanks Matt, makes sense to me!
>
> I skipped direct solvers at first because for these 'real' configurations
> LU (mumps/superlu_dist) usally goes out of memory (got 32GB RAM). It would
> be reasonable to take one more step back and play with synthetic examples.
> I managed to run one case though with 936k dofs using: ("user" =pressure
> mass matrix)
>
> <...>
> -pc_fieldsplit_schur_fact_type upper
> -pc_fieldsplit_schur_precondition user
> -fieldsplit_0_ksp_type preonly
> -fieldsplit_0_pc_type lu
> -fieldsplit_0_pc_factor_mat_solver_package mumps
>
> -fieldsplit_1_ksp_type gmres
> -fieldsplit_1_ksp_monitor_true_residuals
> -fieldsplit_1_ksp_rtol 1e-10
> -fieldsplit_1_pc_type lu
> -fieldsplit_1_pc_factor_mat_solver_package mumps
>
> It takes 2 outer iterations, as expected. However the fieldsplit_1 solve
> takes very long.
>

1) It should take 1 outer iterate, not two. The problem is that your Schur
tolerance is way too high. Use

  -fieldsplit_1_ksp_rtol 1e-10

or something like that. Then it will take 1 iterate.

2) There is a problem with the Schur solve. Now from the iterates

423 KSP preconditioned resid norm 2.638419658982e-02 true resid norm
7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11

it is clear that the preconditioner is really screwing stuff up. For
testing, you can use

  -pc_fieldsplit_schur_precondition full

and your same setup here. It should take one iterate. I think there is
something wrong with your
mass matrix.

  Thanks,

    Matt

>   0 KSP unpreconditioned resid norm 4.038466809302e-03 true resid norm
> 4.038466809302e-03 ||r(i)||/||b|| 1.000000000000e+00
>     Residual norms for fieldsplit_1_ solve.
>     0 KSP preconditioned resid norm 0.000000000000e+00 true resid norm
> 0.000000000000e+00 ||r(i)||/||b||           -nan
>   Linear fieldsplit_1_ solve converged due to CONVERGED_ATOL iterations 0
>   1 KSP unpreconditioned resid norm 4.860095964831e-06 true resid norm
> 4.860095964831e-06 ||r(i)||/||b|| 1.203450763452e-03
>     Residual norms for fieldsplit_1_ solve.
>     0 KSP preconditioned resid norm 2.965546249872e+08 true resid norm
> 1.000000000000e+00 ||r(i)||/||b|| 1.000000000000e+00
>     1 KSP preconditioned resid norm 1.347596594634e+08 true resid norm
> 3.599678801575e-01 ||r(i)||/||b|| 3.599678801575e-01
>     2 KSP preconditioned resid norm 5.913230136403e+07 true resid norm
> 2.364916760834e-01 ||r(i)||/||b|| 2.364916760834e-01
>     3 KSP preconditioned resid norm 4.629700028930e+07 true resid norm
> 1.984444715595e-01 ||r(i)||/||b|| 1.984444715595e-01
>     4 KSP preconditioned resid norm 3.804431276819e+07 true resid norm
> 1.747224559120e-01 ||r(i)||/||b|| 1.747224559120e-01
>     5 KSP preconditioned resid norm 3.178769422140e+07 true resid norm
> 1.402254864444e-01 ||r(i)||/||b|| 1.402254864444e-01
>     6 KSP preconditioned resid norm 2.648669043919e+07 true resid norm
> 1.191164310866e-01 ||r(i)||/||b|| 1.191164310866e-01
>     7 KSP preconditioned resid norm 2.203522108614e+07 true resid norm
> 9.690500018007e-02 ||r(i)||/||b|| 9.690500018007e-02
>     <...>
>     422 KSP preconditioned resid norm 2.984888715147e-02 true resid norm
> 8.598401046494e-11 ||r(i)||/||b|| 8.598401046494e-11
>     423 KSP preconditioned resid norm 2.638419658982e-02 true resid norm
> 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11
>   Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 423
>   2 KSP unpreconditioned resid norm 3.539889585599e-16 true resid norm
> 3.542279617063e-16 ||r(i)||/||b|| 8.771347603759e-14
> Linear solve converged due to CONVERGED_RTOL iterations 2
>
>
> Does the slow convergence of the Schur block mean that my preconditioning
> matrix Sp is a poor choice?
>
> Thanks,
> David
>
>
> On 06/11/2017 08:53 AM, Matthew Knepley wrote:
>
> On Sat, Jun 10, 2017 at 8:25 PM, David Nolte <dnolte at dim.uchile.cl> wrote:
>
>> Dear all,
>>
>> I am solving a Stokes problem in 3D aorta geometries, using a P2/P1
>> finite elements discretization on tetrahedral meshes resulting in
>> ~1-1.5M DOFs. Viscosity is uniform (can be adjusted arbitrarily), and
>> the right hand side is a function of noisy measurement data.
>>
>> In other settings of "standard" Stokes flow problems I have obtained
>> good convergence with an "upper" Schur complement preconditioner, using
>> AMG (ML or Hypre) on the velocity block and approximating the Schur
>> complement matrix by the diagonal of the pressure mass matrix:
>>
>>     -ksp_converged_reason
>>     -ksp_monitor_true_residual
>>     -ksp_initial_guess_nonzero
>>     -ksp_diagonal_scale
>>     -ksp_diagonal_scale_fix
>>     -ksp_type fgmres
>>     -ksp_rtol 1.0e-8
>>
>>     -pc_type fieldsplit
>>     -pc_fieldsplit_type schur
>>     -pc_fieldsplit_detect_saddle_point
>>     -pc_fieldsplit_schur_fact_type upper
>>     -pc_fieldsplit_schur_precondition user    # <-- pressure mass matrix
>>
>>     -fieldsplit_0_ksp_type preonly
>>     -fieldsplit_0_pc_type ml
>>
>>     -fieldsplit_1_ksp_type preonly
>>     -fieldsplit_1_pc_type jacobi
>>
>
> 1) I always recommend starting from an exact solver and backing off in
> small steps for optimization. Thus
>     I would start with LU on the upper block and GMRES/LU with toelrance
> 1e-10 on the Schur block.
>     This should converge in 1 iterate.
>
> 2) I don't think you want preonly on the Schur system. You might want
> GMRES/Jacobi to invert the mass matrix.
>
> 3) You probably want to tighten the tolerance on the Schur solve, at least
> to start, and then slowly let it out. The
>     tight tolerance will show you how effective the preconditioner is
> using that Schur operator. Then you can start
>     to evaluate how effective the Schur linear sovler is.
>
> Does this make sense?
>
>   Thanks,
>
>      Matt
>
>
>> In my present case this setup gives rather slow convergence (varies for
>> different geometries between 200-500 or several thousands!). I obtain
>> better convergence with "-pc_fieldsplit_schur_precondition selfp"and
>> using multigrid on S, with "-fieldsplit_1_pc_type ml" (I don't think
>> this is optimal, though).
>>
>> I don't understand why the pressure mass matrix approach performs so
>> poorly and wonder what I could try to improve the convergence. Until now
>> I have been using ML and Hypre BoomerAMG mostly with default parameters.
>> Surely they can be improved by tuning some parameters. Which could be a
>> good starting point? Are there other options I should consider?
>>
>> With the above setup (jacobi) for a case that works better than others,
>> the KSP terminates with
>> 467 KSP unpreconditioned resid norm 2.072014323515e-09 true resid norm
>> 2.072014322600e-09 ||r(i)||/||b|| 9.939098100674e-09
>>
>> You can find the output of -ksp_view below. Let me know if you need more
>> details.
>>
>> Thanks in advance for your advice!
>> Best wishes
>> David
>>
>>
>> KSP Object: 1 MPI processes
>>   type: fgmres
>>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
>> Orthogonalization with no iterative refinement
>>     GMRES: happy breakdown tolerance 1e-30
>>   maximum iterations=10000
>>   tolerances:  relative=1e-08, absolute=1e-50, divergence=10000.
>>   right preconditioning
>>   diagonally scaled system
>>   using nonzero initial guess
>>   using UNPRECONDITIONED norm type for convergence test
>> PC Object: 1 MPI processes
>>   type: fieldsplit
>>     FieldSplit with Schur preconditioner, factorization UPPER
>>     Preconditioner for the Schur complement formed from user provided
>> matrix
>>     Split info:
>>     Split number 0 Defined by IS
>>     Split number 1 Defined by IS
>>     KSP solver for A00 block
>>       KSP Object:      (fieldsplit_0_)       1 MPI processes
>>         type: preonly
>>         maximum iterations=10000, initial guess is zero
>>         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>         left preconditioning
>>         using NONE norm type for convergence test
>>       PC Object:      (fieldsplit_0_)       1 MPI processes
>>         type: ml
>>           MG: type is MULTIPLICATIVE, levels=5 cycles=v
>>             Cycles per PCApply=1
>>             Using Galerkin computed coarse grid matrices
>>         Coarse grid solver -- level -------------------------------
>>           KSP Object:          (fieldsplit_0_mg_coarse_)           1 MPI
>> processes
>>             type: preonly
>>             maximum iterations=10000, initial guess is zero
>>             tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>             left preconditioning
>>             using NONE norm type for convergence test
>>           PC Object:          (fieldsplit_0_mg_coarse_)           1 MPI
>> processes
>>             type: lu
>>               LU: out-of-place factorization
>>               tolerance for zero pivot 2.22045e-14
>>               using diagonal shift on blocks to prevent zero pivot
>> [INBLOCKS]
>>               matrix ordering: nd
>>               factor fill ratio given 5., needed 1.
>>                 Factored matrix follows:
>>                   Mat Object:                   1 MPI processes
>>                     type: seqaij
>>                     rows=3, cols=3
>>                     package used to perform factorization: petsc
>>                     total: nonzeros=3, allocated nonzeros=3
>>                     total number of mallocs used during MatSetValues
>> calls =0
>>                       not using I-node routines
>>             linear system matrix = precond matrix:
>>             Mat Object:             1 MPI processes
>>               type: seqaij
>>               rows=3, cols=3
>>               total: nonzeros=3, allocated nonzeros=3
>>               total number of mallocs used during MatSetValues calls =0
>>                 not using I-node routines
>>         Down solver (pre-smoother) on level 1
>> -------------------------------
>>           KSP Object:          (fieldsplit_0_mg_levels_1_)           1
>> MPI processes
>>             type: richardson
>>               Richardson: damping factor=1.
>>             maximum iterations=2
>>             tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>             left preconditioning
>>             using nonzero initial guess
>>             using NONE norm type for convergence test
>>           PC Object:          (fieldsplit_0_mg_levels_1_)           1
>> MPI processes
>>             type: sor
>>               SOR: type = local_symmetric, iterations = 1, local
>> iterations = 1, omega = 1.
>>             linear system matrix = precond matrix:
>>             Mat Object:             1 MPI processes
>>               type: seqaij
>>               rows=15, cols=15
>>               total: nonzeros=69, allocated nonzeros=69
>>               total number of mallocs used during MatSetValues calls =0
>>                 not using I-node routines
>>         Up solver (post-smoother) same as down solver (pre-smoother)
>>         Down solver (pre-smoother) on level 2
>> -------------------------------
>>           KSP Object:          (fieldsplit_0_mg_levels_2_)           1
>> MPI processes
>>             type: richardson
>>               Richardson: damping factor=1.
>>             maximum iterations=2
>>             tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>             left preconditioning
>>             using nonzero initial guess
>>             using NONE norm type for convergence test
>>           PC Object:          (fieldsplit_0_mg_levels_2_)           1
>> MPI processes
>>             type: sor
>>               SOR: type = local_symmetric, iterations = 1, local
>> iterations = 1, omega = 1.
>>             linear system matrix = precond matrix:
>>             Mat Object:             1 MPI processes
>>               type: seqaij
>>               rows=304, cols=304
>>               total: nonzeros=7354, allocated nonzeros=7354
>>               total number of mallocs used during MatSetValues calls =0
>>                 not using I-node routines
>>         Up solver (post-smoother) same as down solver (pre-smoother)
>>         Down solver (pre-smoother) on level 3
>> -------------------------------
>>           KSP Object:          (fieldsplit_0_mg_levels_3_)           1
>> MPI processes
>>             type: richardson
>>               Richardson: damping factor=1.
>>             maximum iterations=2
>>             tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>             left preconditioning
>>             using nonzero initial guess
>>             using NONE norm type for convergence test
>>           PC Object:          (fieldsplit_0_mg_levels_3_)           1
>> MPI processes
>>             type: sor
>>               SOR: type = local_symmetric, iterations = 1, local
>> iterations = 1, omega = 1.
>>             linear system matrix = precond matrix:
>>             Mat Object:             1 MPI processes
>>               type: seqaij
>>               rows=30236, cols=30236
>>               total: nonzeros=2730644, allocated nonzeros=2730644
>>               total number of mallocs used during MatSetValues calls =0
>>                 not using I-node routines
>>         Up solver (post-smoother) same as down solver (pre-smoother)
>>         Down solver (pre-smoother) on level 4
>> -------------------------------
>>           KSP Object:          (fieldsplit_0_mg_levels_4_)           1
>> MPI processes
>>             type: richardson
>>               Richardson: damping factor=1.
>>             maximum iterations=2
>>             tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>             left preconditioning
>>             using nonzero initial guess
>>             using NONE norm type for convergence test
>>           PC Object:          (fieldsplit_0_mg_levels_4_)           1
>> MPI processes
>>             type: sor
>>               SOR: type = local_symmetric, iterations = 1, local
>> iterations = 1, omega = 1.
>>             linear system matrix = precond matrix:
>>             Mat Object:            (fieldsplit_0_)             1 MPI
>> processes
>>               type: seqaij
>>               rows=894132, cols=894132
>>               total: nonzeros=70684164, allocated nonzeros=70684164
>>               total number of mallocs used during MatSetValues calls =0
>>                 not using I-node routines
>>         Up solver (post-smoother) same as down solver (pre-smoother)
>>         linear system matrix = precond matrix:
>>         Mat Object:        (fieldsplit_0_)         1 MPI processes
>>           type: seqaij
>>           rows=894132, cols=894132
>>           total: nonzeros=70684164, allocated nonzeros=70684164
>>           total number of mallocs used during MatSetValues calls =0
>>             not using I-node routines
>>     KSP solver for S = A11 - A10 inv(A00) A01
>>       KSP Object:      (fieldsplit_1_)       1 MPI processes
>>         type: preonly
>>         maximum iterations=10000, initial guess is zero
>>         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>         left preconditioning
>>         using NONE norm type for convergence test
>>       PC Object:      (fieldsplit_1_)       1 MPI processes
>>         type: jacobi
>>         linear system matrix followed by preconditioner matrix:
>>         Mat Object:        (fieldsplit_1_)         1 MPI processes
>>           type: schurcomplement
>>           rows=42025, cols=42025
>>             Schur complement A11 - A10 inv(A00) A01
>>             A11
>>               Mat Object:              (fieldsplit_1_)               1
>> MPI processes
>>                 type: seqaij
>>                 rows=42025, cols=42025
>>                 total: nonzeros=554063, allocated nonzeros=554063
>>                 total number of mallocs used during MatSetValues calls =0
>>                   not using I-node routines
>>             A10
>>               Mat Object:               1 MPI processes
>>                 type: seqaij
>>                 rows=42025, cols=894132
>>                 total: nonzeros=6850107, allocated nonzeros=6850107
>>                 total number of mallocs used during MatSetValues calls =0
>>                   not using I-node routines
>>             KSP of A00
>>               KSP Object:              (fieldsplit_0_)               1
>> MPI processes
>>                 type: preonly
>>                 maximum iterations=10000, initial guess is zero
>>                 tolerances:  relative=1e-05, absolute=1e-50,
>> divergence=10000.
>>                 left preconditioning
>>                 using NONE norm type for convergence test
>>               PC Object:              (fieldsplit_0_)               1
>> MPI processes
>>                 type: ml
>>                   MG: type is MULTIPLICATIVE, levels=5 cycles=v
>>                     Cycles per PCApply=1
>>                     Using Galerkin computed coarse grid matrices
>>                 Coarse grid solver -- level ------------------------------
>> -
>>                   KSP Object:
>> (fieldsplit_0_mg_coarse_)                   1 MPI processes
>>                     type: preonly
>>                     maximum iterations=10000, initial guess is zero
>>                     tolerances:  relative=1e-05, absolute=1e-50,
>> divergence=10000.
>>                     left preconditioning
>>                     using NONE norm type for convergence test
>>                   PC Object:
>> (fieldsplit_0_mg_coarse_)                   1 MPI processes
>>                     type: lu
>>                       LU: out-of-place factorization
>>                       tolerance for zero pivot 2.22045e-14
>>                       using diagonal shift on blocks to prevent zero
>> pivot [INBLOCKS]
>>                       matrix ordering: nd
>>                       factor fill ratio given 5., needed 1.
>>                         Factored matrix follows:
>>                           Mat Object:                           1 MPI
>> processes
>>                             type: seqaij
>>                             rows=3, cols=3
>>                             package used to perform factorization: petsc
>>                             total: nonzeros=3, allocated nonzeros=3
>>                             total number of mallocs used during
>> MatSetValues calls =0
>>                               not using I-node routines
>>                     linear system matrix = precond matrix:
>>                     Mat Object:                     1 MPI processes
>>                       type: seqaij
>>                       rows=3, cols=3
>>                       total: nonzeros=3, allocated nonzeros=3
>>                       total number of mallocs used during MatSetValues
>> calls =0
>>                         not using I-node routines
>>                 Down solver (pre-smoother) on level 1
>> -------------------------------
>>                   KSP Object:
>> (fieldsplit_0_mg_levels_1_)                   1 MPI processes
>>                     type: richardson
>>                       Richardson: damping factor=1.
>>                     maximum iterations=2
>>                     tolerances:  relative=1e-05, absolute=1e-50,
>> divergence=10000.
>>                     left preconditioning
>>                     using nonzero initial guess
>>                     using NONE norm type for convergence test
>>                   PC Object:
>> (fieldsplit_0_mg_levels_1_)                   1 MPI processes
>>                     type: sor
>>                       SOR: type = local_symmetric, iterations = 1, local
>> iterations = 1, omega = 1.
>>                     linear system matrix = precond matrix:
>>                     Mat Object:                     1 MPI processes
>>                       type: seqaij
>>                       rows=15, cols=15
>>                       total: nonzeros=69, allocated nonzeros=69
>>                       total number of mallocs used during MatSetValues
>> calls =0
>>                         not using I-node routines
>>                 Up solver (post-smoother) same as down solver
>> (pre-smoother)
>>                 Down solver (pre-smoother) on level 2
>> -------------------------------
>>                   KSP Object:
>> (fieldsplit_0_mg_levels_2_)                   1 MPI processes
>>                     type: richardson
>>                       Richardson: damping factor=1.
>>                     maximum iterations=2
>>                     tolerances:  relative=1e-05, absolute=1e-50,
>> divergence=10000.
>>                     left preconditioning
>>                     using nonzero initial guess
>>                     using NONE norm type for convergence test
>>                   PC Object:
>> (fieldsplit_0_mg_levels_2_)                   1 MPI processes
>>                     type: sor
>>                       SOR: type = local_symmetric, iterations = 1, local
>> iterations = 1, omega = 1.
>>                     linear system matrix = precond matrix:
>>                     Mat Object:                     1 MPI processes
>>                       type: seqaij
>>                       rows=304, cols=304
>>                       total: nonzeros=7354, allocated nonzeros=7354
>>                       total number of mallocs used during MatSetValues
>> calls =0
>>                         not using I-node routines
>>                 Up solver (post-smoother) same as down solver
>> (pre-smoother)
>>                 Down solver (pre-smoother) on level 3
>> -------------------------------
>>                   KSP Object:
>> (fieldsplit_0_mg_levels_3_)                   1 MPI processes
>>                     type: richardson
>>                       Richardson: damping factor=1.
>>                     maximum iterations=2
>>                     tolerances:  relative=1e-05, absolute=1e-50,
>> divergence=10000.
>>                     left preconditioning
>>                     using nonzero initial guess
>>                     using NONE norm type for convergence test
>>                   PC Object:
>> (fieldsplit_0_mg_levels_3_)                   1 MPI processes
>>                     type: sor
>>                       SOR: type = local_symmetric, iterations = 1, local
>> iterations = 1, omega = 1.
>>                     linear system matrix = precond matrix:
>>                     Mat Object:                     1 MPI processes
>>                       type: seqaij
>>                       rows=30236, cols=30236
>>                       total: nonzeros=2730644, allocated nonzeros=2730644
>>                       total number of mallocs used during MatSetValues
>> calls =0
>>                         not using I-node routines
>>                 Up solver (post-smoother) same as down solver
>> (pre-smoother)
>>                 Down solver (pre-smoother) on level 4
>> -------------------------------
>>                   KSP Object:
>> (fieldsplit_0_mg_levels_4_)                   1 MPI processes
>>                     type: richardson
>>                       Richardson: damping factor=1.
>>                     maximum iterations=2
>>                     tolerances:  relative=1e-05, absolute=1e-50,
>> divergence=10000.
>>                     left preconditioning
>>                     using nonzero initial guess
>>                     using NONE norm type for convergence test
>>                   PC Object:
>> (fieldsplit_0_mg_levels_4_)                   1 MPI processes
>>                     type: sor
>>                       SOR: type = local_symmetric, iterations = 1, local
>> iterations = 1, omega = 1.
>>                     linear system matrix = precond matrix:
>>                     Mat Object:
>> (fieldsplit_0_)                     1 MPI processes
>>                       type: seqaij
>>                       rows=894132, cols=894132
>>                       total: nonzeros=70684164, allocated
>> nonzeros=70684164
>>                       total number of mallocs used during MatSetValues
>> calls =0
>>                         not using I-node routines
>>                 Up solver (post-smoother) same as down solver
>> (pre-smoother)
>>                 linear system matrix = precond matrix:
>>                 Mat Object:
>> (fieldsplit_0_)                 1 MPI processes
>>                   type: seqaij
>>                   rows=894132, cols=894132
>>                   total: nonzeros=70684164, allocated nonzeros=70684164
>>                   total number of mallocs used during MatSetValues calls
>> =0
>>                     not using I-node routines
>>             A01
>>               Mat Object:               1 MPI processes
>>                 type: seqaij
>>                 rows=894132, cols=42025
>>                 total: nonzeros=6850107, allocated nonzeros=6850107
>>                 total number of mallocs used during MatSetValues calls =0
>>                   not using I-node routines
>>         Mat Object:         1 MPI processes
>>           type: seqaij
>>           rows=42025, cols=42025
>>           total: nonzeros=554063, allocated nonzeros=554063
>>           total number of mallocs used during MatSetValues calls =0
>>             not using I-node routines
>>   linear system matrix = precond matrix:
>>   Mat Object:   1 MPI processes
>>     type: seqaij
>>     rows=936157, cols=936157
>>     total: nonzeros=84938441, allocated nonzeros=84938441
>>     total number of mallocs used during MatSetValues calls =0
>>       not using I-node routines
>>
>>
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> http://www.caam.rice.edu/~mk51/
>
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

http://www.caam.rice.edu/~mk51/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20170612/a844e2ef/attachment-0001.html>