<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<br>
<div class="moz-cite-prefix">On 06/14/2017 02:26 PM, Dave May wrote:<br>
</div>
<blockquote
cite="mid:CAJ98EDptgOtc6aSEKdbvDe=mS-9ijpCZ8mMgg=z+wD8--RyGGg@mail.gmail.com"
type="cite">
<div><br>
<div class="gmail_quote">
<div>On Wed, 14 Jun 2017 at 19:42, David Nolte <<a
moz-do-not-send="true" href="mailto:dnolte@dim.uchile.cl">dnolte@dim.uchile.cl</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Dave, thanks a lot
for your great answer and for sharing your experience. I
have a much clearer picture now. :)<br>
<br>
The experiments from 3/ give the desired results for examples
of cavity flow. The (1/mu-scaled) mass matrix seems OK.<br>
<br>
I followed your and Matt's recommendations, used a FULL
Schur factorization, LU in the 0th split, and gradually
relaxed the tolerance of GMRES/Jacobi in split 1 (observed
the gradual increase in outer iterations). Then I replaced
the split_0 LU with AMG (further increase of outer
iterations and iterations on the Schur complement). <br>
Doing so I converged to using hypre boomeramg (smooth_type
Euclid, strong_threshold 0.75) and 3 iterations of
GMRES/Jacobi on the Schur block, which gave the best
time-to-solution in my particular setup and convergence to
rtol=1e-8 within 60 outer iterations.<br>
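<br>
Spelled out as options, that configuration is roughly the
following (a sketch only; split 1 is preconditioned by the
user-provided pressure mass matrix, and the outer Krylov
method is flexible, e.g. FGMRES):<br>
<tt>-ksp_type fgmres<br>
-pc_fieldsplit_schur_fact_type full<br>
-pc_fieldsplit_schur_precondition user<br>
-fieldsplit_0_ksp_type preonly<br>
-fieldsplit_0_pc_type hypre<br>
-fieldsplit_0_pc_hypre_type boomeramg<br>
-fieldsplit_0_pc_hypre_boomeramg_smooth_type Euclid<br>
-fieldsplit_0_pc_hypre_boomeramg_strong_threshold 0.75<br>
-fieldsplit_1_ksp_type gmres<br>
-fieldsplit_1_ksp_max_it 3<br>
-fieldsplit_1_pc_type jacobi</tt><br>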
In my cases, using GMRES in the 0th split (with rtol 1e-1
or 1e-2) instead of "preonly" did not help convergence (on
the contrary).<br>
<br>
I also repeated the experiments with
"-pc_fieldsplit_schur_precondition selfp", with hypre(ilu)
in split 0 and hypre in split 1, just to check, and
somewhat disappointingly ( ;-) ) the wall time is less
than half of what it is when using gmres/Jac and Sp = mass matrix.<br>
I am aware that this says nothing about scaling and
robustness with respect to h-refinement...</div>
</blockquote>
<div><br>
</div>
<div>- selfp defines the Schur PC as A10 inv(diag(A00)) A01.
This operator is not spectrally equivalent to S.</div>
<div><br>
</div>
<div>- For split 0 did you use preonly-hypre(ilu)?</div>
<div><br>
</div>
<div>- For split 1 did you also use hypre(ilu) (you just wrote
hypre)?</div>
<div><br>
</div>
<div>- What was the iteration count for the saddle point
problem with hypre and selfp? Iterates will increase if you
refine the mesh, and a crossover will occur at some
(unknown) resolution beyond which the mass matrix variant
will be faster.</div>
</div>
</div>
</blockquote>
<br>
Ok, this makes sense.<br>
<br>
Split 1 uses hypre with the default smoother (Schwarz-smoothers); the
setup is:<br>
<br>
<tt> -pc_type fieldsplit</tt><tt><br>
</tt><tt> -pc_fieldsplit_type schur</tt><tt><br>
</tt><tt> -pc_fieldsplit_detect_saddle_point</tt><tt><br>
</tt><tt> -pc_fieldsplit_schur_fact_type full</tt><tt><br>
</tt><tt> -pc_fieldsplit_schur_precondition selfp</tt><tt><br>
</tt><tt><br>
</tt><tt> -fieldsplit_0_ksp_type richardson</tt><tt><br>
</tt><tt> -fieldsplit_0_ksp_max_it 1</tt><tt><br>
</tt><tt> -fieldsplit_0_pc_type hypre</tt><tt><br>
</tt><tt> -fieldsplit_0_pc_hypre_type boomeramg</tt><tt><br>
</tt><tt> -fieldsplit_0_pc_hypre_boomeramg_strong_threshold 0.75</tt><tt><br>
</tt><tt> -fieldsplit_0_pc_hypre_boomeramg_smooth_type Euclid</tt><tt><br>
</tt><tt> -fieldsplit_0_pc_hypre_boomeramg_eu_bj</tt><tt><br>
</tt><tt><br>
</tt><tt> -fieldsplit_1_ksp_type richardson</tt><tt><br>
</tt><tt> -fieldsplit_1_ksp_max_it 1</tt><tt><br>
</tt><tt> -fieldsplit_1_pc_type hypre</tt><tt><br>
</tt><tt> -fieldsplit_1_pc_hypre_type boomeramg</tt><tt><br>
</tt><tt> -fieldsplit_1_pc_hypre_boomeramg_strong_threshold 0.75</tt><br>
<br>
The iteration counts in two different cases were 90 and 113, while the
mass matrix variant (GMRES/Jacobi iterations on the Schur
complement) took 56 and 59.<br>
<br>
<br>
<blockquote
cite="mid:CAJ98EDptgOtc6aSEKdbvDe=mS-9ijpCZ8mMgg=z+wD8--RyGGg@mail.gmail.com"
type="cite">
<div>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Would you agree that
these configurations "make sense"?</div>
</blockquote>
<div><br>
</div>
<div>If you want to weak scale, the configuration with the
mass matrix makes the most sense.</div>
<div><br>
</div>
<div>If you are only interested in solving many problems on
one mesh, then do whatever you can to make the solve time
as fast as possible - including using preconditioners
defined with non-spectrally equivalent operators :D</div>
<div><br>
</div>
</div>
</div>
</blockquote>
<br>
I see. That's exactly my case, many problems on one mesh (they are
generated from medical images with fixed resolution). The
hypre/selfp variant is 2-3x faster, so I'll just stick with that for
the moment and try tuning the hypre parameters.<br>
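<br>
For reference, the BoomerAMG knobs I plan to experiment with first
are along these lines (starting guesses for a 3D problem, not
recommendations; these are the standard -pc_hypre_boomeramg_* options
with the fieldsplit_0_ prefix):<br>
<br>
<tt>-fieldsplit_0_pc_hypre_boomeramg_strong_threshold 0.75<br>
-fieldsplit_0_pc_hypre_boomeramg_coarsen_type HMIS<br>
-fieldsplit_0_pc_hypre_boomeramg_interp_type ext+i<br>
-fieldsplit_0_pc_hypre_boomeramg_agg_nl 2<br>
-fieldsplit_0_pc_hypre_boomeramg_relax_type_all symmetric-SOR/Jacobi</tt><br>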
<br>
Thanks again!<br>
<br>
<br>
<br>
<blockquote
cite="mid:CAJ98EDptgOtc6aSEKdbvDe=mS-9ijpCZ8mMgg=z+wD8--RyGGg@mail.gmail.com"
type="cite">
<div>
<div class="gmail_quote">
<div>Thanks,</div>
<div> Dave</div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><br>
Furthermore, maybe someone has a hint on where to start tuning
multigrid? So far hypre has worked better than ML, but I have
not experimented much with the parameters.</div>
</blockquote>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><br>
<br>
<br>
Thanks again for your help!<br>
<br>
Best wishes,<br>
David</div>
<div bgcolor="#FFFFFF" text="#000000"><br>
<br>
<br>
<br>
<div class="m_-7814857284138451345moz-cite-prefix">On
06/12/2017 04:52 PM, Dave May wrote:<br>
</div>
<blockquote type="cite">
<div>I've been following the discussion and have a
couple of comments:
<div><br>
</div>
<div>1/ For the preconditioners that you are using
(Schur factorisation LDU, or upper block triangular
DU), the convergence properties (e.g. 1 iterate for
LDU and 2 iterates for DU) come from analysis
involving exact inverses of A_00 and S</div>
<div><br>
</div>
<div>Once you switch from using exact inverses of A_00
and S, you have to rely on spectral equivalence of
operators. That is fine, but the spectral
equivalence does not tell you how many iterates LDU
or DU will require to converge. What it does tell
you is that if you have a spectrally
equivalent operator for A_00 and S (the Schur
complement), then under mesh refinement your
iteration count (whatever it was prior to
refinement) will not increase.<br>
</div>
<div><br>
</div>
<div>2/ Looking at your first set of options, I see
you have opted to use -fieldsplit_ksp_type preonly
(for both split 0 and 1). That is nice as it creates
a linear operator, so you don't need something like
FGMRES or GCR applied to the saddle point problem. </div>
<div><br>
</div>
<div>Your choice for Schur is fine in the sense that
the diagonal of M is spectrally equivalent to M, and
M is spectrally equivalent to S. Whether it is
"fine" in terms of the iteration count for Schur
systems, we cannot say apriori (since the spectral
equivalence doesn't give us direct info about the
iterations we should expect). </div>
<div><br>
</div>
<div>Your preconditioner for A_00 relies on AMG
producing a spectrally equivalent operator with
bounds which are tight enough to ensure convergence
of the saddle point problem. I'll try to explain this.</div>
<div><br>
</div>
<div>In my experience, for many problems (unstructured
FE with variable coefficients, structured FE meshes
with variable coefficients) AMG and preonly is not a
robust choice. To control the approximation (the
spectral equiv bounds), I typically run a stationary
or Krylov method on split 0 (e.g.
-fieldsplit_0_ksp_type xxx -fieldsplit_0_ksp_rtol
yyy). Since the AMG preconditioner generated is
spectrally equivalent (usually!), these solves will
converge to a chosen rtol in a constant number of
iterates under h-refinement. In practice, if I don't
enforce that I hit something like rtol=1.0e-1 (or
1.0e-2) on the 0th split, saddle point iterates will
typically increase for "hard" problems under mesh
refinement (1e4-1e7 coefficient variation), and may
not even converge at all when just using
-fieldsplit_0_ksp_type preonly. Failure ultimately
depends on how "strong" the preconditioner for A_00
block is (consider re-discretized geometric
multigrid versus AMG). Running an iterative solve on
the 0th split lets you control and recover from
weak/poor, but spectrally equivalent preconditioners
for A_00. Note that people hate this approach as it
invariably nests Krylov methods, and subsequently
adds more global reductions. However, it is
scalable, optimal, tuneable and converges faster
than the case which didn't converge at all :D</div>
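<div><br>
</div>
<div>As a concrete sketch of what I mean (the rtol value is only a
placeholder to be tuned; with a Krylov method inside split 0 the
outer solver must then be a flexible method such as FGMRES or GCR):</div>
<div><br>
</div>
<div><tt>-ksp_type fgmres<br>
-fieldsplit_0_ksp_type gmres<br>
-fieldsplit_0_ksp_rtol 1.0e-2<br>
-fieldsplit_0_pc_type hypre # or ml</tt></div>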
<div><br>
</div>
<div>3/ I agree with Matt's comments, but I'd do a
couple of other things first.</div>
<div><br>
</div>
<div>* I'd first check the discretization is
implemented correctly. Your P2/P1 element is inf-sup
stable - thus the condition number of S
(unpreconditioned) should be independent of the mesh
resolution (h). An easy way to verify this is to run
either LDU (schur_fact_type full) or DU
(schur_fact_type upper) and monitor the iterations
required for those S solves. Use
-fieldsplit_1_pc_type none -fieldsplit_1_ksp_rtol
1.0e-8 -fieldsplit_1_ksp_monitor_true_residual
-fieldsplit_1_ksp_pc_side right -fieldsplit_1_ksp_type
gmres -fieldsplit_0_pc_type lu</div>
<div><br>
</div>
<div>Then refine the mesh (ideally via sub-division)
and repeat the experiment.</div>
<div>If the S iterates don't asymptote, but instead
grow with each refinement - you likely have a
problem with the discretisation.</div>
<div><br>
</div>
<div>* Do the same experiment, but this time use your
mass matrix as the preconditioner for S and use
-fieldsplit_1_pc_type lu. Compare the iterates with
the previous experiment (without a Schur PC): if the
S iterates were bounded before, but go up once S is
preconditioned with the mass matrix, then your mass
matrix is not defined correctly.</div>
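<div><br>
</div>
<div>Consolidated, the two diagnostic runs I have in mind would
look something like the following (a sketch; it assumes your
code registers the pressure mass matrix as the user-provided
Schur preconditioning matrix):</div>
<div><br>
</div>
<div><tt># run 1: no Schur PC<br>
-pc_fieldsplit_schur_fact_type full<br>
-fieldsplit_0_ksp_type preonly<br>
-fieldsplit_0_pc_type lu<br>
-fieldsplit_1_ksp_type gmres<br>
-fieldsplit_1_ksp_pc_side right<br>
-fieldsplit_1_ksp_rtol 1.0e-8<br>
-fieldsplit_1_ksp_monitor_true_residual<br>
-fieldsplit_1_pc_type none<br>
<br>
# run 2: mass matrix as the Schur PC, everything else as above<br>
-pc_fieldsplit_schur_precondition user<br>
-fieldsplit_1_pc_type lu</tt></div>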
<div><br>
</div>
<div>4/ Now, to finally get to your question of
whether +400 iterates for solving the
Schur system seems "reasonable" and what is "normal
behaviour": </div>
<div><br>
</div>
<div>It seems "high" to me. However the specifics of
your discretisation, mesh topology, element quality,
boundary conditions render it almost impossible to
say what should be expected. When I use a Q2-P2*
discretisation on a structured mesh with a
non-constant viscosity I'd expect something like
50-60 for 1.0e-10 with a mass matrix scaled by the
inverse (local) viscosity. For constant viscosity
maybe 30 iterates. I think this kind of statement is
not particularly useful or helpful though.</div>
<div><br>
</div>
<div>
<div>Given you use an unstructured tet mesh, it is
possible that some elements have very bad quality
(high aspect ratio (AR), highly skewed). I am
certain that P2/P1 has an inf-sup constant which
is sensitive to the element aspect ratio (I don't
recall the exact scaling wrt AR). From experience
I know that using the mass matrix as a
preconditioner for Schur is not robust as AR
increases (e.g. iterations for the S solve grow).
Hence, with a couple of "bad" element in your
mesh, I could imagine that you could end up having
to perform +400 iterations </div>
</div>
<div><br>
</div>
<div>5/ Finally, definitely don't impose a single Dirichlet
BC on the pressure to make the pressure unique. This
really screws up all the nice properties of your
matrices. Just enforce the constant null space for
p. And as you noticed, GMRES magically does it
automatically if the RHS of your original system was
consistent.</div>
<div> </div>
<div>Thanks,</div>
<div> Dave</div>
<div><br>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On 12 June 2017 at 20:20,
David Nolte <span><<a moz-do-not-send="true"
href="mailto:dnolte@dim.uchile.cl"
target="_blank">dnolte@dim.uchile.cl</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px
0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF"> Ok. With <tt>"-pc_fieldsplit_schur_fact_type
full" </tt>the outer iteration converges in
1 step. The problem remain the Schur
iterations.<br>
<br>
I was not sure if the problem was maybe the
singular pressure or the pressure Dirichlet
BC. I tested the solver with a standard Stokes
flow in a pipe with a constriction (zero
Neumann BC for the pressure at the outlet) and
in a 3D cavity (enclosed flow, no pressure BC
or fixed at one point). I am not sure if I
need to attach the constant pressure nullspace
to the matrix for GMRES. Not doing so does not
alter the convergence of GMRES in the Schur
solver (nor the pressure solution); using a
pressure Dirichlet BC, however, slows down
convergence (I suppose because of the scaling
of the matrix).<br>
<br>
I also checked the pressure mass matrix that I
give PETSc, it looks correct.<br>
<br>
In all these cases, the solver behaves just as
before. With LU in fieldsplit_0 and GMRES/LU
with rtol 1e-10 in fieldsplit_1, it converges
after 1 outer iteration, but the inner Schur
solver converges slowly. <br>
<br>
How should GMRES/LU on the
Schur complement *normally* converge?<br>
<br>
Thanks again!<span
class="m_-7814857284138451345gmail-m_2691972541491180255gmail-m_1522616294910952114HOEnZb"><font
color="#888888"><br>
David</font></span>
<div>
<div
class="m_-7814857284138451345gmail-m_2691972541491180255gmail-m_1522616294910952114h5"><br>
<br>
<br>
<br>
<div
class="m_-7814857284138451345gmail-m_2691972541491180255gmail-m_1522616294910952114m_-1125133874872333755moz-cite-prefix">On
06/12/2017 12:41 PM, Matthew Knepley
wrote:<br>
</div>
<blockquote type="cite">
<div>
<div class="gmail_extra">
<div class="gmail_quote">On Mon, Jun
12, 2017 at 10:36 AM, David Nolte
<span><<a
moz-do-not-send="true"
href="mailto:dnolte@dim.uchile.cl"
target="_blank">dnolte@dim.uchile.cl</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF"> <br>
<div
class="m_-7814857284138451345gmail-m_2691972541491180255gmail-m_1522616294910952114m_-1125133874872333755m_4366232618162032171moz-cite-prefix">On
06/12/2017 07:50 AM, Matthew
Knepley wrote:<br>
</div>
<blockquote type="cite">
<div>
<div class="gmail_extra">
<div class="gmail_quote">On
Sun, Jun 11, 2017 at
11:06 PM, David Nolte
<span><<a
moz-do-not-send="true"
href="mailto:dnolte@dim.uchile.cl" target="_blank">dnolte@dim.uchile.cl</a>></span>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0px
0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div
bgcolor="#FFFFFF">
Thanks Matt, makes
sense to me!<br>
<br>
I skipped direct
solvers at first
because for these
'real'
configurations LU
(mumps/superlu_dist) usually runs out of memory (I have 32GB RAM). It would
be reasonable to
take one more step
back and play with
synthetic
examples.<br>
I managed to run
one case though
with 936k dofs
using: ("user"
=pressure mass
matrix)<br>
<br>
<tt><...><br>
-pc_fieldsplit_schur_fact_type upper</tt><tt><br>
</tt><tt>-pc_fieldsplit_schur_precondition
user</tt><tt><br>
</tt><tt>-fieldsplit_0_ksp_type
preonly </tt><tt><br>
</tt><tt>-fieldsplit_0_pc_type
lu</tt><tt><br>
</tt><tt>-fieldsplit_0_pc_factor_mat_solver_package
mumps</tt><tt><br>
</tt><tt><br>
</tt><tt>
-fieldsplit_1_ksp_type
gmres<br>
-fieldsplit_1_ksp_monitor_true_residual<br>
-fieldsplit_1_ksp_rtol 1e-10<br>
</tt><tt>-fieldsplit_1_pc_type
lu</tt><tt><br>
</tt><tt>
-fieldsplit_1_pc_factor_mat_solver_package
mumps</tt><tt><br>
</tt><br>
It takes 2 outer
iterations, as
expected. However,
the fieldsplit_1
solve takes very
long.<br>
</div>
</blockquote>
<div><br>
</div>
<div>1) It should take
1 outer iterate, not
two. The problem is
that your Schur
tolerance is way too
high. Use</div>
<div><br>
</div>
<div>
-fieldsplit_1_ksp_rtol
1e-10</div>
<div><br>
</div>
<div>or something like
that. Then it will
take 1 iterate.</div>
</div>
</div>
</div>
</blockquote>
<br>
Shouldn't it take 2 with a
triangular Schur factorization
and exact preconditioners, and
1 with a full factorization?
(cf. Benzi et al 2005, p.66, <a
moz-do-not-send="true"
class="m_-7814857284138451345gmail-m_2691972541491180255gmail-m_1522616294910952114m_-1125133874872333755m_4366232618162032171moz-txt-link-freetext"
href="http://www.mathcs.emory.edu/%7Ebenzi/Web_papers/bgl05.pdf"
target="_blank">http://www.mathcs.emory.edu/~benzi/Web_papers/bgl05.pdf</a>)<br>
<br>
That's exactly what I set: <tt>
-fieldsplit_1_ksp_rtol 1e-10
</tt>and the Schur solver residual does
drop below rtol = 1e-10.<br>
</div>
</blockquote>
<div><br>
</div>
<div>Oh, yes. Take away the upper
until things are worked out.</div>
<div><br>
</div>
<div> Thanks,</div>
<div><br>
</div>
<div> Matt</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF">
<blockquote type="cite">
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>2) There is a
problem with the
Schur solve. Now
from the iterates</div>
<div><br>
</div>
<div><span
style="font-family:monospace">423
KSP preconditioned
resid norm
2.638419658982e-02
true resid norm
7.229653211635e-11
||r(i)||/||b||
7.229653211635e-11</span><br>
</div>
<div><br>
</div>
<div>it is clear that
the preconditioner
is really screwing
stuff up. For
testing, you can use</div>
<div><br>
</div>
<div>
-pc_fieldsplit_schur_precondition
full</div>
<div><br>
</div>
<div>and your same
setup here. It
should take one
iterate. I think
there is something
wrong with your</div>
<div>mass matrix.</div>
</div>
</div>
</div>
</blockquote>
<br>
I agree. I forgot to mention
that I am considering an
"enclosed flow" problem, with
u=0 on the entire boundary and a
Dirichlet condition for the
pressure at one point to fix
the constant pressure.
Maybe the preconditioner is
not consistent with this
setup; I need to check this.<br>
<br>
Thanks a lot<br>
<br>
<br>
<blockquote type="cite">
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div> Thanks,</div>
<div><br>
</div>
<div> Matt</div>
<div><br>
</div>
<blockquote
class="gmail_quote"
style="margin:0px
0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div
bgcolor="#FFFFFF">
<br>
<tt> 0 KSP
unpreconditioned
resid norm
4.038466809302e-03
true resid norm
4.038466809302e-03 ||r(i)||/||b|| 1.000000000000e+00</tt><tt><br>
</tt><tt>
Residual norms
for
fieldsplit_1_
solve.</tt><tt><br>
</tt><tt> 0 KSP
preconditioned
resid norm
0.000000000000e+00
true resid norm
0.000000000000e+00 ||r(i)||/||b|| -nan</tt><tt><br>
</tt><tt> Linear
fieldsplit_1_
solve converged
due to
CONVERGED_ATOL
iterations 0</tt><tt><br>
</tt><tt> 1 KSP
unpreconditioned
resid norm
4.860095964831e-06
true resid norm
4.860095964831e-06 ||r(i)||/||b|| 1.203450763452e-03</tt><tt><br>
</tt><tt>
Residual norms
for
fieldsplit_1_
solve.</tt><tt><br>
</tt><tt> 0 KSP
preconditioned
resid norm
2.965546249872e+08
true resid norm
1.000000000000e+00 ||r(i)||/||b|| 1.000000000000e+00</tt><tt><br>
</tt><tt> 1 KSP
preconditioned
resid norm
1.347596594634e+08
true resid norm
3.599678801575e-01 ||r(i)||/||b|| 3.599678801575e-01</tt><tt><br>
</tt><tt> 2 KSP
preconditioned
resid norm
5.913230136403e+07
true resid norm
2.364916760834e-01 ||r(i)||/||b|| 2.364916760834e-01</tt><tt><br>
</tt><tt> 3 KSP
preconditioned
resid norm
4.629700028930e+07
true resid norm
1.984444715595e-01 ||r(i)||/||b|| 1.984444715595e-01</tt><tt><br>
</tt><tt> 4 KSP
preconditioned
resid norm
3.804431276819e+07
true resid norm
1.747224559120e-01 ||r(i)||/||b|| 1.747224559120e-01</tt><tt><br>
</tt><tt> 5 KSP
preconditioned
resid norm
3.178769422140e+07
true resid norm
1.402254864444e-01 ||r(i)||/||b|| 1.402254864444e-01</tt><tt><br>
</tt><tt> 6 KSP
preconditioned
resid norm
2.648669043919e+07
true resid norm
1.191164310866e-01 ||r(i)||/||b|| 1.191164310866e-01</tt><tt><br>
</tt><tt> 7 KSP
preconditioned
resid norm
2.203522108614e+07
true resid norm
9.690500018007e-02 ||r(i)||/||b|| 9.690500018007e-02</tt><tt><br>
<...><br>
422 KSP
preconditioned
resid norm
2.984888715147e-02
true resid norm
8.598401046494e-11 ||r(i)||/||b|| 8.598401046494e-11<br>
423 KSP
preconditioned
resid norm
2.638419658982e-02
true resid norm
7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11<br>
Linear
fieldsplit_1_
solve converged
due to
CONVERGED_RTOL
iterations 423<br>
2 KSP
unpreconditioned
resid norm
3.539889585599e-16
true resid norm
3.542279617063e-16 ||r(i)||/||b|| 8.771347603759e-14<br>
Linear solve
converged due to
CONVERGED_RTOL
iterations 2<br>
</tt><tt><br>
</tt><br>
Does the slow
convergence of the
Schur block mean
that my
preconditioning
matrix Sp is a
poor choice?<br>
<br>
Thanks,<br>
David<br>
<br>
<br>
<div
class="m_-7814857284138451345gmail-m_2691972541491180255gmail-m_1522616294910952114m_-1125133874872333755m_4366232618162032171gmail-m_5328507656823621836moz-cite-prefix">On
06/11/2017 08:53
AM, Matthew
Knepley wrote:<br>
</div>
<blockquote
type="cite">
<div>
<div
class="gmail_extra">
<div
class="gmail_quote">On
Sat, Jun 10,
2017 at 8:25
PM, David
Nolte <span><<a
moz-do-not-send="true" href="mailto:dnolte@dim.uchile.cl"
target="_blank">dnolte@dim.uchile.cl</a>></span>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">Dear
all,<br>
<br>
I am solving a
Stokes problem
in 3D aorta
geometries,
using a P2/P1<br>
finite
element
discretization
on tetrahedral
meshes
resulting in<br>
~1-1.5M DOFs.
Viscosity is
uniform (can
be adjusted
arbitrarily),
and<br>
the right hand
side is a
function of
noisy
measurement
data.<br>
<br>
In other
settings of
"standard"
Stokes flow
problems I
have obtained<br>
good
convergence
with an
"upper" Schur
complement
preconditioner,
using<br>
AMG (ML or
Hypre) on the
velocity block
and
approximating
the Schur<br>
complement
matrix by the
diagonal of
the pressure
mass matrix:<br>
<br>
-ksp_converged_reason<br>
-ksp_monitor_true_residual<br>
-ksp_initial_guess_nonzero<br>
-ksp_diagonal_scale<br>
-ksp_diagonal_scale_fix<br>
-ksp_type
fgmres<br>
-ksp_rtol
1.0e-8<br>
<br>
-pc_type
fieldsplit<br>
-pc_fieldsplit_type
schur<br>
-pc_fieldsplit_detect_saddle_point<br>
-pc_fieldsplit_schur_fact_type
upper<br>
-pc_fieldsplit_schur_precondition
user #
<--
pressure mass
matrix<br>
<br>
-fieldsplit_0_ksp_type
preonly<br>
-fieldsplit_0_pc_type
ml<br>
<br>
-fieldsplit_1_ksp_type
preonly<br>
-fieldsplit_1_pc_type
jacobi<br>
</blockquote>
<div><br>
</div>
<div>1) I
always
recommend
starting from
an exact
solver and
backing off in
small steps
for
optimization.
Thus</div>
<div> I
would start
with LU on the
upper block
and GMRES/LU
with tolerance
1e-10 on the
Schur block.</div>
<div> This
should
converge in 1
iterate.</div>
<div><br>
</div>
<div>2) I
don't think
you want
preonly on the
Schur system.
You might want
GMRES/Jacobi
to invert the
mass matrix.</div>
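<div><br>
</div>
<div>For example, something like (with a tight tolerance to start):</div>
<div><br>
</div>
<div><tt>-fieldsplit_1_ksp_type gmres<br>
-fieldsplit_1_ksp_rtol 1e-10<br>
-fieldsplit_1_pc_type jacobi</tt></div>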
<div><br>
</div>
<div>3) You
probably want
to tighten the
tolerance on
the Schur
solve, at
least to
start, and
then slowly
let it out.
The</div>
<div> tight
tolerance will
show you how
effective the
preconditioner
is using that
Schur
operator. Then
you can start</div>
<div> to
evaluate how
effective the
Schur linear
solver is.</div>
<div><br>
</div>
<div>Does this
make sense?</div>
<div><br>
</div>
<div> Thanks,</div>
<div><br>
</div>
<div> Matt</div>
<div> </div>
<blockquote
class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
In my present
case this
setup gives
rather slow
convergence
(varies for<br>
different
geometries
between
200-500 or
several
thousand!). I
obtain<br>
better
convergence
with
"-pc_fieldsplit_schur_precondition
selfp"and<br>
using
multigrid on
S, with
"-fieldsplit_1_pc_type
ml" (I don't
think<br>
this is
optimal,
though).<br>
<br>
I don't
understand why
the pressure
mass matrix
approach
performs so<br>
poorly and
wonder what I
could try to
improve the
convergence.
Until now<br>
I have been
using ML and
Hypre
BoomerAMG
mostly with
default
parameters.<br>
Surely they
can be
improved by
tuning some
parameters.
Which could be
a<br>
good starting
point? Are
there other
options I
should
consider?<br>
<br>
With the above
setup (jacobi)
for a case
that works
better than
others,<br>
the KSP
terminates
with<br>
467 KSP
unpreconditioned
resid norm
2.072014323515e-09
true resid
norm<br>
2.072014322600e-09 ||r(i)||/||b|| 9.939098100674e-09<br>
<br>
You can find
the output of
-ksp_view
below. Let me
know if you
need more<br>
details.<br>
<br>
Thanks in
advance for
your advice!<br>
Best wishes<br>
David<br>
<br>
<br>
KSP Object: 1
MPI processes<br>
type: fgmres<br>
GMRES:
restart=30,
using
Classical
(unmodified)
Gram-Schmidt<br>
Orthogonalization with no iterative refinement<br>
GMRES:
happy
breakdown
tolerance
1e-30<br>
maximum
iterations=10000<br>
tolerances:
relative=1e-08, absolute=1e-50, divergence=10000.<br>
right
preconditioning<br>
diagonally
scaled system<br>
using
nonzero
initial guess<br>
using
UNPRECONDITIONED
norm type for
convergence
test<br>
PC Object: 1
MPI processes<br>
type:
fieldsplit<br>
FieldSplit
with Schur
preconditioner,
factorization
UPPER<br>
Preconditioner
for the Schur
complement
formed from
user provided
matrix<br>
Split
info:<br>
Split
number 0
Defined by IS<br>
Split
number 1
Defined by IS<br>
KSP solver
for A00 block<br>
KSP
Object:
(fieldsplit_0_)
1 MPI
processes<br>
type:
preonly<br>
maximum
iterations=10000,
initial guess
is zero<br>
tolerances:
relative=1e-05,
absolute=1e-50, divergence=10000.<br>
left
preconditioning<br>
using
NONE norm type
for
convergence
test<br>
PC
Object:
(fieldsplit_0_)
1 MPI
processes<br>
type:
ml<br>
MG:
type is
MULTIPLICATIVE,
levels=5
cycles=v<br>
Cycles per
PCApply=1<br>
Using Galerkin
computed
coarse grid
matrices<br>
Coarse
grid solver --
level
-------------------------------<br>
KSP
Object:
(fieldsplit_0_mg_coarse_)
1 MPI<br>
processes<br>
type: preonly<br>
maximum
iterations=10000,
initial guess
is zero<br>
tolerances:
relative=1e-05,
absolute=1e-50, divergence=10000.<br>
left
preconditioning<br>
using NONE
norm type for
convergence
test<br>
PC
Object:
(fieldsplit_0_mg_coarse_)
1 MPI<br>
processes<br>
type: lu<br>
LU:
out-of-place
factorization<br>
tolerance for
zero pivot
2.22045e-14<br>
using diagonal
shift on
blocks to
prevent zero
pivot<br>
[INBLOCKS]<br>
matrix
ordering: nd<br>
factor fill
ratio given
5., needed 1.<br>
Factored
matrix
follows:<br>
Mat
Object:
1
MPI processes<br>
type:
seqaij<br>
rows=3,
cols=3<br>
package
used to
perform
factorization:
petsc<br>
total:
nonzeros=3,
allocated
nonzeros=3<br>
total
number of
mallocs used
during
MatSetValues<br>
calls =0<br>
not
using I-node
routines<br>
linear system
matrix =
precond
matrix:<br>
Mat Object:
1 MPI
processes<br>
type: seqaij<br>
rows=3, cols=3<br>
total:
nonzeros=3,
allocated
nonzeros=3<br>
total number
of mallocs
used during
MatSetValues
calls =0<br>
not using
I-node
routines<br>
Down
solver
(pre-smoother)
on level 1<br>
-------------------------------<br>
KSP
Object:
(fieldsplit_0_mg_levels_1_)
1<br>
MPI processes<br>
type:
richardson<br>
Richardson:
damping
factor=1.<br>
maximum
iterations=2<br>
tolerances:
relative=1e-05,
absolute=1e-50, divergence=10000.<br>
left
preconditioning<br>
using nonzero
initial guess<br>
using NONE
norm type for
convergence
test<br>
PC
Object:
(fieldsplit_0_mg_levels_1_)
1<br>
MPI processes<br>
type: sor<br>
SOR: type =
local_symmetric,
iterations =
1, local<br>
iterations =
1, omega = 1.<br>
linear system
matrix =
precond
matrix:<br>
Mat Object:
1 MPI
processes<br>
type: seqaij<br>
rows=15,
cols=15<br>
total:
nonzeros=69,
allocated
nonzeros=69<br>
total number
of mallocs
used during
MatSetValues
calls =0<br>
not using
I-node
routines<br>
Up
solver
(post-smoother)
same as down
solver
(pre-smoother)<br>
Down
solver
(pre-smoother)
on level 2<br>
-------------------------------<br>
KSP
Object:
(fieldsplit_0_mg_levels_2_)
1<br>
MPI processes<br>
type:
richardson<br>
Richardson:
damping
factor=1.<br>
maximum
iterations=2<br>
tolerances:
relative=1e-05,
absolute=1e-50, divergence=10000.<br>
left
preconditioning<br>
using nonzero
initial guess<br>
using NONE
norm type for
convergence
test<br>
PC
Object:
(fieldsplit_0_mg_levels_2_)
1<br>
MPI processes<br>
type: sor<br>
SOR: type =
local_symmetric,
iterations =
1, local<br>
iterations =
1, omega = 1.<br>
linear system
matrix =
precond
matrix:<br>
Mat Object:
1 MPI
processes<br>
type: seqaij<br>
rows=304,
cols=304<br>
total:
nonzeros=7354,
allocated
nonzeros=7354<br>
total number
of mallocs
used during
MatSetValues
calls =0<br>
not using
I-node
routines<br>
Up
solver
(post-smoother)
same as down
solver
(pre-smoother)<br>
Down
solver
(pre-smoother)
on level 3<br>
-------------------------------<br>
KSP
Object:
(fieldsplit_0_mg_levels_3_)
1<br>
MPI processes<br>
type:
richardson<br>
Richardson:
damping
factor=1.<br>
maximum
iterations=2<br>
tolerances:
relative=1e-05,
absolute=1e-50, divergence=10000.<br>
left
preconditioning<br>
using nonzero
initial guess<br>
using NONE
norm type for
convergence
test<br>
PC
Object:
(fieldsplit_0_mg_levels_3_)
1<br>
MPI processes<br>
type: sor<br>
SOR: type =
local_symmetric,
iterations =
1, local<br>
iterations =
1, omega = 1.<br>
linear system
matrix =
precond
matrix:<br>
Mat Object:
1 MPI
processes<br>
type: seqaij<br>
rows=30236,
cols=30236<br>
total:
nonzeros=2730644,
allocated
nonzeros=2730644<br>
total number
of mallocs
used during
MatSetValues
calls =0<br>
not using
I-node
routines<br>
Up
solver
(post-smoother)
same as down
solver
(pre-smoother)<br>
Down
solver
(pre-smoother)
on level 4<br>
-------------------------------<br>
KSP
Object:
(fieldsplit_0_mg_levels_4_)
1<br>
MPI processes<br>
type:
richardson<br>
Richardson:
damping
factor=1.<br>
maximum
iterations=2<br>
tolerances:
relative=1e-05,
absolute=1e-50, divergence=10000.<br>
left
preconditioning<br>
using nonzero
initial guess<br>
using NONE
norm type for
convergence
test<br>
PC
Object:
(fieldsplit_0_mg_levels_4_)
1<br>
MPI processes<br>
type: sor<br>
SOR: type =
local_symmetric,
iterations =
1, local<br>
iterations =
1, omega = 1.<br>
linear system
matrix =
precond
matrix:<br>
Mat Object:
(fieldsplit_0_)
1
MPI<br>
processes<br>
type: seqaij<br>
rows=894132,
cols=894132<br>
total:
nonzeros=70684164,
allocated
nonzeros=70684164<br>
total number
of mallocs
used during
MatSetValues
calls =0<br>
not using
I-node
routines<br>
Up
solver
(post-smoother)
same as down
solver
(pre-smoother)<br>
linear
system matrix
= precond
matrix:<br>
Mat
Object:
(fieldsplit_0_) 1 MPI processes<br>
type: seqaij<br>
rows=894132,
cols=894132<br>
total:
nonzeros=70684164,
allocated
nonzeros=70684164<br>
total number
of mallocs
used during
MatSetValues
calls =0<br>
not using
I-node
routines<br>
KSP solver
for S = A11 -
A10 inv(A00)
A01<br>
KSP
Object:
(fieldsplit_1_)
1 MPI
processes<br>
type:
preonly<br>
maximum
iterations=10000,
initial guess
is zero<br>
tolerances:
relative=1e-05,
absolute=1e-50, divergence=10000.<br>
left
preconditioning<br>
using
NONE norm type
for
convergence
test<br>
PC
Object:
(fieldsplit_1_)
1 MPI
processes<br>
type:
jacobi<br>
linear
system matrix
followed by
preconditioner
matrix:<br>
Mat
Object:
(fieldsplit_1_) 1 MPI processes<br>
type:
schurcomplement<br>
rows=42025,
cols=42025<br>
Schur
complement A11
- A10 inv(A00)
A01<br>
A11<br>
Mat Object:
(fieldsplit_1_)
1<br>
MPI processes<br>
type: seqaij<br>
rows=42025,
cols=42025<br>
total:
nonzeros=554063,
allocated
nonzeros=554063<br>
total number
of mallocs
used during
MatSetValues
calls =0<br>
not using
I-node
routines<br>
A10<br>
Mat Object:
1
MPI processes<br>
type: seqaij<br>
rows=42025,
cols=894132<br>
total:
nonzeros=6850107,
allocated
nonzeros=6850107<br>
total number
of mallocs
used during
MatSetValues
calls =0<br>
not using
I-node
routines<br>
KSP of A00<br>
KSP Object:
(fieldsplit_0_)
1<br>
MPI processes<br>
type:
preonly<br>
maximum
iterations=10000,
initial guess
is zero<br>
tolerances:
relative=1e-05, absolute=1e-50,<br>
divergence=10000.<br>
left
preconditioning<br>
using NONE
norm type for
convergence
test<br>
PC Object:
(fieldsplit_0_)
1<br>
MPI processes<br>
type: ml<br>
MG: type
is
MULTIPLICATIVE,
levels=5
cycles=v<br>
Cycles
per PCApply=1<br>
Using
Galerkin
computed
coarse grid
matrices<br>
Coarse grid
solver --
level
-------------------------------<br>
KSP
Object:<br>
(fieldsplit_0_mg_coarse_) 1 MPI processes<br>
type:
preonly<br>
maximum
iterations=10000, initial guess is zero<br>
tolerances:
relative=1e-05,
absolute=1e-50,<br>
divergence=10000.<br>
left
preconditioning<br>
using
NONE norm type
for
convergence
test<br>
PC Object:<br>
(fieldsplit_0_mg_coarse_) 1 MPI processes<br>
type: lu<br>
LU:
out-of-place
factorization<br>
tolerance for
zero pivot
2.22045e-14<br>
using
diagonal shift
on blocks to
prevent zero<br>
pivot
[INBLOCKS]<br>
matrix
ordering: nd<br>
factor
fill ratio
given 5.,
needed 1.<br>
Factored
matrix
follows:<br>
Mat Object:
1 MPI<br>
processes<br>
type: seqaij<br>
rows=3, cols=3<br>
package used
to perform
factorization:
petsc<br>
total:
nonzeros=3,
allocated
nonzeros=3<br>
total number
of mallocs
used during<br>
MatSetValues
calls =0<br>
not using
I-node
routines<br>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</blockquote>
<br>
</body>
</html>