<div><br><div class="gmail_quote"><div>On Wed, 14 Jun 2017 at 19:42, David Nolte <<a href="mailto:dnolte@dim.uchile.cl">dnolte@dim.uchile.cl</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
Dave, thanks a lot for your great answer and for sharing your
experience. I have a much clearer picture now. :)<br>
<br>
The experiments from 3/ give the desired results for the cavity flow examples. The (1/mu scaled) mass matrix seems OK.<br>
<br>
I followed your and Matt's recommendations: I used a FULL Schur factorization, LU in the 0th split, and gradually relaxed the tolerance of GMRES/Jacobi in split 1 (and observed the gradual increase in outer iterations). Then I replaced the split_0 LU with AMG (which further increased the outer iterations and the iterations on the Schur complement). <br>
Doing so, I converged on using hypre BoomerAMG (smooth_type Euclid, strong_threshold 0.75) and 3 iterations of GMRES/Jacobi on the Schur block, which gave the best time-to-solution in my particular setup and convergence to rtol=1e-8 within 60 outer iterations.<br>
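<br>
For reference, spelled out as PETSc options this configuration looks roughly like the following (a sketch only; the BoomerAMG option names are those of PETSc's hypre interface and are worth double-checking with -help):<br>
<tt>-pc_type fieldsplit<br>
-pc_fieldsplit_type schur<br>
-pc_fieldsplit_schur_fact_type full<br>
-pc_fieldsplit_schur_precondition user<br>
-fieldsplit_0_ksp_type preonly<br>
-fieldsplit_0_pc_type hypre<br>
-fieldsplit_0_pc_hypre_type boomeramg<br>
-fieldsplit_0_pc_hypre_boomeramg_smooth_type Euclid<br>
-fieldsplit_0_pc_hypre_boomeramg_strong_threshold 0.75<br>
-fieldsplit_1_ksp_type gmres<br>
-fieldsplit_1_ksp_max_it 3<br>
-fieldsplit_1_pc_type jacobi</tt><br>
<br>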
In my cases, using GMRES in the 0th split (with rtol 1e-1 or 1e-2)
instead of "preonly" did not help convergence (on the contrary).<br>
<br>
I also repeated the experiments with
"-pc_fieldsplit_schur_precondition selfp", with hypre(ilu) in split
0 and hypre in split 1, just to check, and somewhat disappointingly
( ;-) ) the wall time is less than half of what it is when using GMRES/Jacobi and Sp = mass matrix.<br>
I am aware that this says nothing about scaling and robustness with
respect to h-refinement...</div></blockquote><div><br></div><div>- selfp defines the Schur PC as A11 - A10 inv(diag(A00)) A01. This operator is not spectrally equivalent to S.</div><div><br></div><div>- For split 0 did you use preonly-hypre(ilu)?</div><div><br></div><div>- For split 1 did you also use hypre(ilu) (you just wrote hypre)?</div><div><br></div><div>- What was the iteration count for the saddle point problem with hypre and selfp? Iterates will increase if you refine the mesh, and a crossover will occur at some (unknown) resolution beyond which the mass matrix variant will be faster.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><br>
<br>
Would you agree that these configurations "make sense"?</div></blockquote><div><br></div><div>If you want to weak scale, the configuration with the mass matrix makes the most sense.</div><div><br></div><div>If you are only interested in solving many problems on one mesh, then do whatever you can to make the solve time as fast as possible - including using preconditioners defined with non-spectrally equivalent operators :D</div><div><br></div><div>Thanks,</div><div> Dave</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><br>
Furthermore, does anyone have a hint on where to start tuning multigrid? So far hypre has worked better than ML, but I have not experimented much with the parameters.</div></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><br>
<br>
<br>
Thanks again for your help!<br>
<br>
Best wishes,<br>
David</div><div bgcolor="#FFFFFF" text="#000000"><br>
<br>
<br>
<br>
<div class="m_-7814857284138451345moz-cite-prefix">On 06/12/2017 04:52 PM, Dave May wrote:<br>
</div>
<blockquote type="cite">
<div>I've been following the discussion and have a
couple of comments:
<div><br>
</div>
<div>1/ For the preconditioners that you are using (Schur
factorisation LDU, or upper block triangular DU), the
convergence properties (e.g. 1 iterate for LDU and 2 iterates
for DU) come from analysis involving exact inverses of A_00
and S</div>
<div><br>
</div>
<div>Once you switch from using exact inverses of A_00 and S,
you have to rely on spectral equivalence of operators. That is
fine, but the spectral equivalence does not tell you how many
iterates LDU or DU will require to converge. What it does
inform you about is that if you have a spectrally equivalent
operator for A_00 and S (Schur complement), then under mesh
refinement, your iteration count (whatever it was prior to
refinement) will not increase.<br>
</div>
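<div><br>
</div>
<div>(For reference: for symmetric positive definite A and P, "spectrally equivalent" means there exist mesh-independent constants c1, c2 &gt; 0 such that <tt>c1 x^T P x &lt;= x^T A x &lt;= c2 x^T P x</tt> for all x. The constants bound the preconditioned spectrum, and hence give h-independence, but they do not translate directly into an iteration count.)</div>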
<div><br>
</div>
<div>2/ Looking at your first set of options, I see you have opted to use -fieldsplit_ksp_type preonly (for both split 0 and 1). That is nice, as it keeps the preconditioner a linear operator, so you don't need something like FGMRES or GCR applied to the saddle point problem. </div>
<div><br>
</div>
<div>Your choice for Schur is fine in the sense that the
diagonal of M is spectrally equivalent to M, and M is
spectrally equivalent to S. Whether it is "fine" in terms of
the iteration count for Schur systems, we cannot say a priori
(since the spectral equivalence doesn't give us direct info
about the iterations we should expect). </div>
<div><br>
</div>
<div>Your preconditioner for A_00 relies on AMG producing a
spectrally equivalent operator with bounds which are tight
enough to ensure convergence of the saddle point problem. I'll try to explain this.</div>
<div><br>
</div>
<div>In my experience, for many problems (unstructured FE with
variable coefficients, structured FE meshes with variable
coefficients) AMG and preonly is not a robust choice. To
control the approximation (the spectral equiv bounds), I
typically run a stationary or Krylov method on split 0 (e.g.
-fieldsplit_0_ksp_type xxx -fieldsplit_0_ksp_rtol yyy). Since
the AMG preconditioner generated is spectrally equivalent
(usually!), these solves will converge to a chosen rtol in a
constant number of iterates under h-refinement. In practice,
if I don't enforce that I hit something like rtol=1.0e-1 (or
1.0e-2) on the 0th split, saddle point iterates will typically
increase for "hard" problems under mesh refinement (1e4-1e7
coefficient variation), and may not even converge at all when
just using -fieldsplit_0_ksp_type preonly. Failure ultimately
depends on how "strong" the preconditioner for the A_00 block is
(consider re-discretized geometric multigrid versus AMG).
Running an iterative solve on the 0th split lets you control
and recover from weak/poor, but spectrally equivalent
preconditioners for A_00. Note that people hate this approach
as it invariably nests Krylov methods, and subsequently adds
more global reductions. However, it is scalable, optimal,
tuneable and converges faster than the case which didn't
converge at all :D</div>
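<div><br>
</div>
<div>As a concrete illustration of this approach (the values below are illustrative, not a recommendation for any particular problem): run a Krylov method with a loose relative tolerance on split 0, and, since the inner Krylov solve makes the preconditioner nonlinear, use FGMRES or GCR as the outer method, e.g.</div>
<div><tt>-ksp_type fgmres<br>
-fieldsplit_0_ksp_type gmres<br>
-fieldsplit_0_ksp_rtol 1.0e-2<br>
-fieldsplit_0_pc_type ml</tt></div>
<div>(hypre or gamg in place of ml work the same way).</div>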
<div><br>
</div>
<div>3/ I agree with Matt's comments, but I'd do a couple of
other things first.</div>
<div><br>
</div>
<div>* I'd first check the discretization is implemented
correctly. Your P2/P1 element is inf-sup stable - thus the
condition number of S (unpreconditioned) should be independent
of the mesh resolution (h). An easy way to verify this is to
run either LDU (schur_fact_type full) or DU (schur_fact_type
upper) and monitor the iterations required for those S solves.
Use -fieldsplit_1_pc_type none -fieldsplit_1_ksp_rtol 1.0e-8
-fieldsplit_1_ksp_monitor_true_residual
-fieldsplit_1_ksp_pc_side right -fieldsplit_1_ksp_type gmres
-fieldsplit_0_pc_type lu</div>
<div><br>
</div>
<div>Then refine the mesh (ideally via sub-division) and repeat
the experiment.</div>
<div>If the S iterates don't asymptote, but instead grow with
each refinement - you likely have a problem with the
discretisation.</div>
<div><br>
</div>
<div>* Do the same experiment, but this time use your mass
matrix as the preconditioner for S and use
-fieldsplit_1_pc_type lu. If the iterates, compared with the previous experiments (without a Schur PC), have gone up, your mass matrix is not defined correctly. If in the previous
experiment (without a Schur PC) iterates on the S solves were
bounded, but now when preconditioned with the mass matrix the
iterates go up, then your mass matrix is definitely not
correct.</div>
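<div><br>
</div>
<div>In options form this second experiment is roughly (assuming the code hands PETSc the pressure mass matrix as the user-provided Schur preconditioning matrix):</div>
<div><tt>-pc_fieldsplit_schur_fact_type full<br>
-pc_fieldsplit_schur_precondition user<br>
-fieldsplit_0_pc_type lu<br>
-fieldsplit_1_ksp_type gmres<br>
-fieldsplit_1_ksp_rtol 1.0e-8<br>
-fieldsplit_1_ksp_monitor_true_residual<br>
-fieldsplit_1_pc_type lu</tt></div>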
<div><br>
</div>
<div>4/ To finally get to your question: do 400+ iterates for solving the Schur complement seem "reasonable", and what is "normal behaviour"? </div>
<div><br>
</div>
<div>It seems "high" to me. However the specifics of your
discretisation, mesh topology, element quality, boundary
conditions render it almost impossible to say what should be
expected. When I use a Q2-P2* discretisation on a structured
mesh with a non-constant viscosity I'd expect something like
50-60 for 1.0e-10 with a mass matrix scaled by the inverse
(local) viscosity. For constant viscosity maybe 30 iterates. I
think this kind of statement is not particularly useful or
helpful though.</div>
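<div><br>
</div>
<div>(For concreteness: the viscosity-scaled mass matrix referred to here is the pressure mass matrix assembled with a 1/viscosity weight, i.e. <tt>M_ij = int_Omega (1/mu) q_i q_j dx</tt> for pressure basis functions q_i. With constant viscosity this reduces to the usual pressure mass matrix scaled by 1/mu, which is what the "(1/mu scaled) mass matrix" at the top of this thread refers to.)</div>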
<div><br>
</div>
<div>
<div>Given you use an unstructured tet mesh, it is possible
that some elements have very bad quality (high aspect ratio
(AR), highly skewed). I am certain that P2/P1 has an inf-sup
constant which is sensitive to the element aspect ratio (I
don't recall the exact scaling wrt AR). From experience I
know that using the mass matrix as a preconditioner for
Schur is not robust as AR increases (e.g. iterations for the
S solve grow). Hence, with a couple of "bad" elements in your mesh, I could imagine that you could end up having to perform 400+ iterations. </div>
</div>
<div><br>
</div>
<div>5/ Lastly, definitely don't impose one Dirichlet BC on
pressure to make the pressure unique. This really screws up
all the nice properties of your matrices. Just enforce the
constant null space for p. And as you noticed, GMRES magically
just does it automatically if the RHS of your original system
was consistent.</div>
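<div><br>
</div>
<div>A minimal sketch (in C, error checking omitted) of attaching that constant-pressure null space, assuming <tt>A</tt> is the assembled saddle point matrix living on PETSC_COMM_WORLD and <tt>is_p</tt> is the pressure index set already used for the fieldsplit:</div>
<div><tt>Vec nvec;<br>
MatNullSpace nsp;<br>
<br>
/* null space vector: 1 on pressure dofs, 0 on velocity dofs */<br>
MatCreateVecs(A, &amp;nvec, NULL);<br>
VecSet(nvec, 0.0);<br>
VecISSet(nvec, is_p, 1.0);<br>
VecNormalize(nvec, NULL); /* null space vectors must have unit norm */<br>
MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_FALSE, 1, &amp;nvec, &amp;nsp);<br>
MatSetNullSpace(A, nsp); /* the KSP then projects this out */<br>
MatNullSpaceDestroy(&amp;nsp);<br>
VecDestroy(&amp;nvec);</tt></div>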
<div> </div>
<div>Thanks,</div>
<div> Dave</div>
<div><br>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On 12 June 2017 at 20:20, David Nolte
<span><<a href="mailto:dnolte@dim.uchile.cl" target="_blank">dnolte@dim.uchile.cl</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF"> Ok. With <tt>"-pc_fieldsplit_schur_fact_type
full" </tt>the outer iteration converges in 1 step.
The problem remains the Schur iterations.<br>
<br>
I was not sure if the problem was maybe the singular
pressure or the pressure Dirichlet BC. I tested the
solver with a standard Stokes flow in a pipe with a
constriction (zero Neumann BC for the pressure at the
outlet) and in a 3D cavity (enclosed flow, no pressure
BC or fixed at one point). I am not sure if I need to
attach the constant pressure nullspace to the matrix for
GMRES. Not doing so does not alter the convergence of GMRES in the Schur solver (nor the pressure solution); using a pressure Dirichlet BC, however, slows down convergence (I suppose because of the scaling of the matrix).<br>
<br>
I also checked the pressure mass matrix that I give PETSc; it looks correct.<br>
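<br>
(For context: the usual way a user-supplied pressure mass matrix reaches PETSc, assuming <tt>pc</tt> is the fieldsplit PC and <tt>Mp</tt> is the assembled pressure mass matrix, is <tt>PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_USER, Mp);</tt>, which is what "-pc_fieldsplit_schur_precondition user" then picks up.)<br>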
<br>
In all these cases, the solver behaves just as before.
With LU in fieldsplit_0 and GMRES/LU with rtol 1e-10 in
fieldsplit_1, it converges after 1 outer iteration, but
the inner Schur solver converges slowly. <br>
<br>
How should the convergence of GMRES/LU of the Schur
complement *normally* behave?<br>
<br>
Thanks again!<span class="m_-7814857284138451345gmail-m_2691972541491180255gmail-m_1522616294910952114HOEnZb"><font color="#888888"><br>
David</font></span>
<div>
<div class="m_-7814857284138451345gmail-m_2691972541491180255gmail-m_1522616294910952114h5"><br>
<br>
<br>
<br>
<div class="m_-7814857284138451345gmail-m_2691972541491180255gmail-m_1522616294910952114m_-1125133874872333755moz-cite-prefix">On
06/12/2017 12:41 PM, Matthew Knepley wrote:<br>
</div>
<blockquote type="cite">
<div>
<div class="gmail_extra">
<div class="gmail_quote">On Mon, Jun 12, 2017
at 10:36 AM, David Nolte <span><<a href="mailto:dnolte@dim.uchile.cl" target="_blank">dnolte@dim.uchile.cl</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF"> <br>
<div class="m_-7814857284138451345gmail-m_2691972541491180255gmail-m_1522616294910952114m_-1125133874872333755m_4366232618162032171moz-cite-prefix">On
06/12/2017 07:50 AM, Matthew Knepley
wrote:<br>
</div>
<blockquote type="cite">
<div>
<div class="gmail_extra">
<div class="gmail_quote">On Sun,
Jun 11, 2017 at 11:06 PM, David
Nolte <span><<a href="mailto:dnolte@dim.uchile.cl" target="_blank">dnolte@dim.uchile.cl</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"> Thanks
Matt, makes sense to me!<br>
<br>
I skipped direct solvers at first because for these 'real' configurations LU (mumps/superlu_dist) usually goes out of memory (I have 32GB RAM). It would be reasonable to take one more step back and play with synthetic examples.<br>
I managed to run one case, though, with 936k dofs using: ("user" = pressure mass matrix)<br>
<br>
<tt><...><br>
-pc_fieldsplit_schur_fact_type upper</tt><tt><br>
</tt><tt>-pc_fieldsplit_schur_precondition
user</tt><tt><br>
</tt><tt>-fieldsplit_0_ksp_type
preonly </tt><tt><br>
</tt><tt>-fieldsplit_0_pc_type
lu</tt><tt><br>
</tt><tt>-fieldsplit_0_pc_factor_mat_solver_package
mumps</tt><tt><br>
</tt><tt><br>
</tt><tt>
-fieldsplit_1_ksp_type
gmres<br>
-fieldsplit_1_ksp_monitor_true_residual<br>
-fieldsplit_1_ksp_rtol
1e-10<br>
</tt><tt>-fieldsplit_1_pc_type
lu</tt><tt><br>
</tt><tt>
-fieldsplit_1_pc_factor_mat_solver_package
mumps</tt><tt><br>
</tt><br>
It takes 2 outer iterations, as expected. However, the fieldsplit_1 solve takes very long.<br>
</div>
</blockquote>
<div><br>
</div>
<div>1) It should take 1 outer
iterate, not two. The problem
is that your Schur tolerance
is way too high. Use</div>
<div><br>
</div>
<div> -fieldsplit_1_ksp_rtol
1e-10</div>
<div><br>
</div>
<div>or something like that.
Then it will take 1 iterate.</div>
</div>
</div>
</div>
</blockquote>
<br>
Shouldn't it take 2 with a triangular
Schur factorization and exact
preconditioners, and 1 with a full
factorization? (cf. Benzi et al 2005,
p.66, <a class="m_-7814857284138451345gmail-m_2691972541491180255gmail-m_1522616294910952114m_-1125133874872333755m_4366232618162032171moz-txt-link-freetext" href="http://www.mathcs.emory.edu/%7Ebenzi/Web_papers/bgl05.pdf" target="_blank">http://www.mathcs.emory.edu/~benzi/Web_papers/bgl05.pdf</a>)<br>
<br>
That's exactly what I set: <tt> -fieldsplit_1_ksp_rtol 1e-10 </tt>and the Schur solver does drop below rtol = 1e-10.<br>
</div>
</blockquote>
<div><br>
</div>
<div>Oh, yes. Take away the upper until
things are worked out.</div>
<div><br>
</div>
<div> Thanks,</div>
<div><br>
</div>
<div> Matt</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF">
<blockquote type="cite">
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>2) There is a problem with
the Schur solve. Now from the
iterates</div>
<div><br>
</div>
<div><span style="font-family:monospace">423 KSP preconditioned resid norm 2.638419658982e-02 true resid norm 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11</span><br>
</div>
<div><br>
</div>
<div>it is clear that the
preconditioner is really
screwing stuff up. For
testing, you can use</div>
<div><br>
</div>
<div>
-pc_fieldsplit_schur_precondition
full</div>
<div><br>
</div>
<div>and your same setup here.
It should take one iterate. I
think there is something wrong
with your</div>
<div>mass matrix.</div>
</div>
</div>
</div>
</blockquote>
<br>
I agree. I forgot to mention that I am considering an "enclosed flow" problem, with u=0 on the whole boundary and a Dirichlet condition for the pressure at one point to fix the constant pressure. Maybe the preconditioner is not consistent with this setup; I need to check this.<br>
<br>
Thanks a lot<br>
<br>
<br>
<blockquote type="cite">
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div> Thanks,</div>
<div><br>
</div>
<div> Matt</div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"> <br>
<tt>  0 KSP unpreconditioned resid norm 4.038466809302e-03 true resid norm 4.038466809302e-03 ||r(i)||/||b|| 1.000000000000e+00<br>
    Residual norms for fieldsplit_1_ solve.<br>
    0 KSP preconditioned resid norm 0.000000000000e+00 true resid norm 0.000000000000e+00 ||r(i)||/||b|| -nan<br>
  Linear fieldsplit_1_ solve converged due to CONVERGED_ATOL iterations 0<br>
  1 KSP unpreconditioned resid norm 4.860095964831e-06 true resid norm 4.860095964831e-06 ||r(i)||/||b|| 1.203450763452e-03<br>
    Residual norms for fieldsplit_1_ solve.<br>
    0 KSP preconditioned resid norm 2.965546249872e+08 true resid norm 1.000000000000e+00 ||r(i)||/||b|| 1.000000000000e+00<br>
    1 KSP preconditioned resid norm 1.347596594634e+08 true resid norm 3.599678801575e-01 ||r(i)||/||b|| 3.599678801575e-01<br>
    2 KSP preconditioned resid norm 5.913230136403e+07 true resid norm 2.364916760834e-01 ||r(i)||/||b|| 2.364916760834e-01<br>
    3 KSP preconditioned resid norm 4.629700028930e+07 true resid norm 1.984444715595e-01 ||r(i)||/||b|| 1.984444715595e-01<br>
    4 KSP preconditioned resid norm 3.804431276819e+07 true resid norm 1.747224559120e-01 ||r(i)||/||b|| 1.747224559120e-01<br>
    5 KSP preconditioned resid norm 3.178769422140e+07 true resid norm 1.402254864444e-01 ||r(i)||/||b|| 1.402254864444e-01<br>
    6 KSP preconditioned resid norm 2.648669043919e+07 true resid norm 1.191164310866e-01 ||r(i)||/||b|| 1.191164310866e-01<br>
    7 KSP preconditioned resid norm 2.203522108614e+07 true resid norm 9.690500018007e-02 ||r(i)||/||b|| 9.690500018007e-02<br>
<...><br>
    422 KSP preconditioned resid norm 2.984888715147e-02 true resid norm 8.598401046494e-11 ||r(i)||/||b|| 8.598401046494e-11<br>
    423 KSP preconditioned resid norm 2.638419658982e-02 true resid norm 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11<br>
  Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 423<br>
  2 KSP unpreconditioned resid norm 3.539889585599e-16 true resid norm 3.542279617063e-16 ||r(i)||/||b|| 8.771347603759e-14<br>
Linear solve converged due to CONVERGED_RTOL iterations 2<br>
</tt><br>
Does the slow convergence of
the Schur block mean that my
preconditioning matrix Sp is
a poor choice?<br>
<br>
Thanks,<br>
David<br>
<br>
<br>
<div class="m_-7814857284138451345gmail-m_2691972541491180255gmail-m_1522616294910952114m_-1125133874872333755m_4366232618162032171gmail-m_5328507656823621836moz-cite-prefix">On
06/11/2017 08:53 AM,
Matthew Knepley wrote:<br>
</div>
<blockquote type="cite">
<div>
<div class="gmail_extra">
<div class="gmail_quote">On
Sat, Jun 10, 2017 at
8:25 PM, David Nolte
<span><<a href="mailto:dnolte@dim.uchile.cl" target="_blank">dnolte@dim.uchile.cl</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">Dear
all,<br>
<br>
I am solving a
Stokes problem in
3D aorta
geometries, using
a P2/P1<br>
finite elements
discretization on
tetrahedral meshes
resulting in<br>
~1-1.5M DOFs.
Viscosity is
uniform (can be
adjusted
arbitrarily), and<br>
the right hand
side is a function
of noisy
measurement data.<br>
<br>
In other settings
of "standard"
Stokes flow
problems I have
obtained<br>
good convergence
with an "upper"
Schur complement
preconditioner,
using<br>
AMG (ML or Hypre)
on the velocity
block and
approximating the
Schur<br>
complement matrix
by the diagonal of
the pressure mass
matrix:<br>
<br>
-ksp_converged_reason<br>
-ksp_monitor_true_residual<br>
-ksp_initial_guess_nonzero<br>
-ksp_diagonal_scale<br>
-ksp_diagonal_scale_fix<br>
-ksp_type
fgmres<br>
-ksp_rtol
1.0e-8<br>
<br>
-pc_type
fieldsplit<br>
-pc_fieldsplit_type
schur<br>
-pc_fieldsplit_detect_saddle_point<br>
-pc_fieldsplit_schur_fact_type
upper<br>
-pc_fieldsplit_schur_precondition
user # <--
pressure mass
matrix<br>
<br>
-fieldsplit_0_ksp_type
preonly<br>
-fieldsplit_0_pc_type
ml<br>
<br>
-fieldsplit_1_ksp_type
preonly<br>
-fieldsplit_1_pc_type
jacobi<br>
</blockquote>
<div><br>
</div>
<div>1) I always
recommend starting
from an exact
solver and backing
off in small steps
for optimization.
Thus</div>
<div> I would start with LU on the upper block and GMRES/LU with tolerance 1e-10 on the Schur block.</div>
<div> This should
converge in 1
iterate.</div>
<div><br>
</div>
<div>2) I don't
think you want
preonly on the
Schur system. You
might want
GMRES/Jacobi to
invert the mass
matrix.</div>
<div><br>
</div>
<div>3) You probably
want to tighten
the tolerance on
the Schur solve,
at least to start,
and then slowly
let it out. The</div>
<div> tight
tolerance will
show you how
effective the
preconditioner is
using that Schur
operator. Then you
can start</div>
<div> to evaluate how effective the Schur linear solver is.</div>
<div><br>
</div>
<div>Does this make
sense?</div>
<div><br>
</div>
<div> Thanks,</div>
<div><br>
</div>
<div> Matt</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
In my present case
this setup gives
rather slow
convergence
(varies for<br>
different
geometries between
200-500 or several
thousand!). I
obtain<br>
better convergence
with
"-pc_fieldsplit_schur_precondition
selfp"and<br>
using multigrid on
S, with
"-fieldsplit_1_pc_type
ml" (I don't think<br>
this is optimal,
though).<br>
<br>
I don't understand
why the pressure
mass matrix
approach performs
so<br>
poorly and wonder
what I could try
to improve the
convergence. Until
now<br>
I have been using
ML and Hypre
BoomerAMG mostly
with default
parameters.<br>
Surely they can be
improved by tuning
some parameters.
Which could be a<br>
good starting
point? Are there
other options I
should consider?<br>
<br>
With the above
setup (jacobi) for
a case that works
better than
others,<br>
the KSP terminates
with<br>
467 KSP unpreconditioned resid norm 2.072014323515e-09 true resid norm 2.072014322600e-09 ||r(i)||/||b|| 9.939098100674e-09<br>
<br>
You can find the
output of
-ksp_view below.
Let me know if you
need more<br>
details.<br>
<br>
Thanks in advance
for your advice!<br>
Best wishes<br>
David<br>
<br>
<br>
KSP Object: 1 MPI processes<br>
  type: fgmres<br>
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement<br>
    GMRES: happy breakdown tolerance 1e-30<br>
  maximum iterations=10000<br>
  tolerances: relative=1e-08, absolute=1e-50, divergence=10000.<br>
  right preconditioning<br>
  diagonally scaled system<br>
  using nonzero initial guess<br>
  using UNPRECONDITIONED norm type for convergence test<br>
PC Object: 1 MPI processes<br>
  type: fieldsplit<br>
    FieldSplit with Schur preconditioner, factorization UPPER<br>
    Preconditioner for the Schur complement formed from user provided matrix<br>
    Split info:<br>
    Split number 0 Defined by IS<br>
    Split number 1 Defined by IS<br>
    KSP solver for A00 block<br>
      KSP Object: (fieldsplit_0_) 1 MPI processes<br>
        type: preonly<br>
        maximum iterations=10000, initial guess is zero<br>
        tolerances: relative=1e-05, absolute=1e-50, divergence=10000.<br>
        left preconditioning<br>
        using NONE norm type for convergence test<br>
      PC Object: (fieldsplit_0_) 1 MPI processes<br>
        type: ml<br>
          MG: type is MULTIPLICATIVE, levels=5 cycles=v<br>
            Cycles per PCApply=1<br>
            Using Galerkin computed coarse grid matrices<br>
        Coarse grid solver -- level -------------------------------<br>
          KSP Object: (fieldsplit_0_mg_coarse_) 1 MPI processes<br>
            type: preonly<br>
            maximum iterations=10000, initial guess is zero<br>
            tolerances: relative=1e-05, absolute=1e-50, divergence=10000.<br>
            left preconditioning<br>
            using NONE norm type for convergence test<br>
          PC Object: (fieldsplit_0_mg_coarse_) 1 MPI processes<br>
            type: lu<br>
              LU: out-of-place factorization<br>
              tolerance for zero pivot 2.22045e-14<br>
              using diagonal shift on blocks to prevent zero pivot [INBLOCKS]<br>
              matrix ordering: nd<br>
              factor fill ratio given 5., needed 1.<br>
                Factored matrix follows:<br>
                  Mat Object: 1 MPI processes<br>
                    type: seqaij<br>
                    rows=3, cols=3<br>
                    package used to perform factorization: petsc<br>
                    total: nonzeros=3, allocated nonzeros=3<br>
                    total number of mallocs used during MatSetValues calls =0<br>
                      not using I-node routines<br>
            linear system matrix = precond matrix:<br>
            Mat Object: 1 MPI processes<br>
              type: seqaij<br>
              rows=3, cols=3<br>
              total: nonzeros=3, allocated nonzeros=3<br>
              total number of mallocs used during MatSetValues calls =0<br>
                not using I-node routines<br>
        Down solver (pre-smoother) on level 1 -------------------------------<br>
          KSP Object: (fieldsplit_0_mg_levels_1_) 1 MPI processes<br>
            type: richardson<br>
              Richardson: damping factor=1.<br>
            maximum iterations=2<br>
            tolerances: relative=1e-05, absolute=1e-50, divergence=10000.<br>
            left preconditioning<br>
            using nonzero initial guess<br>
            using NONE norm type for convergence test<br>
          PC Object: (fieldsplit_0_mg_levels_1_) 1 MPI processes<br>
            type: sor<br>
              SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.<br>
            linear system matrix = precond matrix:<br>
            Mat Object: 1 MPI processes<br>
              type: seqaij<br>
              rows=15, cols=15<br>
              total: nonzeros=69, allocated nonzeros=69<br>
              total number of mallocs used during MatSetValues calls =0<br>
                not using I-node routines<br>
        Up solver (post-smoother) same as down solver (pre-smoother)<br>
        Down solver (pre-smoother) on level 2 -------------------------------<br>
          KSP Object: (fieldsplit_0_mg_levels_2_) 1 MPI processes<br>
            type: richardson<br>
              Richardson: damping factor=1.<br>
            maximum iterations=2<br>
            tolerances: relative=1e-05, absolute=1e-50, divergence=10000.<br>
            left preconditioning<br>
            using nonzero initial guess<br>
            using NONE norm type for convergence test<br>
          PC Object: (fieldsplit_0_mg_levels_2_) 1 MPI processes<br>
            type: sor<br>
              SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.<br>
            linear system matrix = precond matrix:<br>
            Mat Object: 1 MPI processes<br>
              type: seqaij<br>
              rows=304, cols=304<br>
              total: nonzeros=7354, allocated nonzeros=7354<br>
              total number of mallocs used during MatSetValues calls =0<br>
                not using I-node routines<br>
        Up solver (post-smoother) same as down solver (pre-smoother)<br>
        Down solver (pre-smoother) on level 3 -------------------------------<br>
          KSP Object: (fieldsplit_0_mg_levels_3_) 1 MPI processes<br>
            type: richardson<br>
              Richardson: damping factor=1.<br>
            maximum iterations=2<br>
            tolerances: relative=1e-05, absolute=1e-50, divergence=10000.<br>
            left preconditioning<br>
            using nonzero initial guess<br>
            using NONE norm type for convergence test<br>
          PC Object: (fieldsplit_0_mg_levels_3_) 1 MPI processes<br>
            type: sor<br>
              SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.<br>
            linear system matrix = precond matrix:<br>
            Mat Object: 1 MPI processes<br>
              type: seqaij<br>
              rows=30236, cols=30236<br>
              total: nonzeros=2730644, allocated nonzeros=2730644<br>
              total number of mallocs used during MatSetValues calls =0<br>
                not using I-node routines<br>
        Up solver (post-smoother) same as down solver (pre-smoother)<br>
        Down solver (pre-smoother) on level 4 -------------------------------<br>
          KSP Object: (fieldsplit_0_mg_levels_4_) 1 MPI processes<br>
            type: richardson<br>
              Richardson: damping factor=1.<br>
            maximum iterations=2<br>
            tolerances: relative=1e-05, absolute=1e-50, divergence=10000.<br>
            left preconditioning<br>
            using nonzero initial guess<br>
            using NONE norm type for convergence test<br>
          PC Object: (fieldsplit_0_mg_levels_4_) 1 MPI processes<br>
            type: sor<br>
              SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.<br>
            linear system matrix = precond matrix:<br>
            Mat Object: (fieldsplit_0_) 1 MPI processes<br>
              type: seqaij<br>
              rows=894132, cols=894132<br>
              total: nonzeros=70684164, allocated nonzeros=70684164<br>
              total number of mallocs used during MatSetValues calls =0<br>
                not using I-node routines<br>
        Up solver (post-smoother) same as down solver (pre-smoother)<br>
        linear system matrix = precond matrix:<br>
        Mat Object: (fieldsplit_0_) 1 MPI processes<br>
          type: seqaij<br>
          rows=894132, cols=894132<br>
          total: nonzeros=70684164, allocated nonzeros=70684164<br>
          total number of mallocs used during MatSetValues calls =0<br>
            not using I-node routines<br>
    KSP solver for S = A11 - A10 inv(A00) A01<br>
      KSP Object: (fieldsplit_1_) 1 MPI processes<br>
        type: preonly<br>
        maximum iterations=10000, initial guess is zero<br>
        tolerances: relative=1e-05, absolute=1e-50, divergence=10000.<br>
        left preconditioning<br>
        using NONE norm type for convergence test<br>
      PC Object: (fieldsplit_1_) 1 MPI processes<br>
        type: jacobi<br>
        linear system matrix followed by preconditioner matrix:<br>
        Mat Object: (fieldsplit_1_) 1 MPI processes<br>
          type: schurcomplement<br>
          rows=42025, cols=42025<br>
            Schur complement A11 - A10 inv(A00) A01<br>
            A11<br>
              Mat Object: (fieldsplit_1_) 1 MPI processes<br>
                type: seqaij<br>
                rows=42025, cols=42025<br>
                total: nonzeros=554063, allocated nonzeros=554063<br>
                total number of mallocs used during MatSetValues calls =0<br>
                  not using I-node routines<br>
            A10<br>
              Mat Object: 1 MPI processes<br>
                type: seqaij<br>
                rows=42025, cols=894132<br>
                total: nonzeros=6850107, allocated nonzeros=6850107<br>
                total number of mallocs used during MatSetValues calls =0<br>
                  not using I-node routines<br>
            KSP of A00<br>
              KSP Object: (fieldsplit_0_) 1 MPI processes<br>
                type: preonly<br>
                maximum iterations=10000, initial guess is zero<br>
                tolerances: relative=1e-05, absolute=1e-50, divergence=10000.<br>
                left preconditioning<br>
                using NONE norm type for convergence test<br>
              PC Object: (fieldsplit_0_) 1 MPI processes<br>
                type: ml<br>
                  MG: type is MULTIPLICATIVE, levels=5 cycles=v<br>
                    Cycles per PCApply=1<br>
                    Using Galerkin computed coarse grid matrices<br>
                Coarse grid solver -- level -------------------------------<br>
                  KSP Object: (fieldsplit_0_mg_coarse_) 1 MPI processes<br>
                    type: preonly<br>
                    maximum iterations=10000, initial guess is zero<br>
                    tolerances: relative=1e-05, absolute=1e-50, divergence=10000.<br>
                    left preconditioning<br>
                    using NONE norm type for convergence test<br>
                  PC Object: (fieldsplit_0_mg_coarse_) 1 MPI processes<br>
                    type: lu<br>
                      LU: out-of-place factorization<br>
                      tolerance for zero pivot 2.22045e-14<br>
                      using diagonal shift on blocks to prevent zero pivot [INBLOCKS]<br>
                      matrix ordering: nd<br>
                      factor fill ratio given 5., needed 1.<br>
                        Factored matrix follows:<br>
                          Mat Object: 1 MPI processes<br>
                            type: seqaij<br>
                            rows=3, cols=3<br>
                            package used to perform factorization: petsc<br>
                            total: nonzeros=3, allocated nonzeros=3<br>
                            total number of mallocs used during MatSetValues calls =0<br>
                              not using I-node routines<br>
</blockquote></div></div></div></blockquote></div></blockquote></div></div></div></blockquote></div></blockquote></div></div></div></blockquote></div></div></div></blockquote></div></div></div></blockquote></div></blockquote></div></div>