[Cc'ing petsc-dev]<br><br><div class="gmail_quote">On Thu, Oct 6, 2011 at 05:14, Jakub Sistek <span dir="ltr"><<a href="mailto:sistek@math.cas.cz" target="_blank">sistek@math.cas.cz</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<u></u>
<div text="#000000" bgcolor="#ffffff">
Dear Jed,<br>
<br>
thank you for you explanations. In the meantime, I have checked out
the petsc-dev to look a bit at the components you mention. The
support of distributed meshes in PETSc is one of the many things I
do not know and I have never used.<br></div></blockquote><div><br></div><div>The unstructured mesh component of PETSc, "Sieve", is not widely used at present, so we should not rely on it. Most users have their own unstructured mesh component, so we want to boil that down into common algebraic interfaces instead of depending on any particular mesh.</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#ffffff">
<br>
After reading your email, I think that your idea of writing
something like PCBDDC is quite possible. But before I start
individual comments, let me recall my idea of the core for the
coarse space in BDDC, which we should discuss in detail.<br>
As you know, the coarse space in BDDC is given by a small set of
constraints on continuity of the approximate solution among
subdomains. These encapsulate the subdomain matrices into the saddle
point problem, which is (hopefully) regular and solving these
problems substitutes the Neumann solves in standard DD methods. This
block of constraints on subdomain i is usually called C_i in
literature. Now the constraints could be divided into two groups: <br>
(i) point-wise continuity - these translate to a single entry of 1
in the corresponding row of C and are usually related to so called
corners<br>
(ii) generalized constrains - these are typically equivalence of
some averages between subdomains. These averages might be
arithmetic, or some weighted averages. These translate into full
rows of C comprised of weights of the averages.<br>
<br>
It is sometimes beneficial to distinguish between the two in
implementations. Group (ii) can be to some extent automated - faces,
edges, and vertices of subdomains can be recognised automatically in
the preconditioner by counting the presence of variables in
subdomains.<br>
Group (i) is somehow more open. Identifying corners with verices is
usually not sufficient. Algorithms for better selection depend on
the problem type and are somewhat problematic to automate.<br>
If there are enough constraints of type (ii), group (i) is not
necessary. On the other hand, some implementations rely on the fact,
that group (i) is already sufficient to make the subdomain matrix
regular and on the fact that these constraints can be implemented as
Dirichlet boundary conditions to the subdomain problem. This then
allows using standard (even inexact) direct solvers for
factorization of subdomain matrix. Constraints (ii) my then be
enforced via lagrange multipliers or some other way.<br>
I have used this in the past when the aim was to use frontal
out-of-core solver (element by element loading) for subdomain
problems. <br>
I have to admit that I haven't implemented the method with inexact
solvers. In a more recent implementation, I have this implemented by
forming the entire saddle point problem with C with both (i) and
(ii) and using an indefinite direct solver (MUMPS) even for SPD
problems. I think this is not possible for inexact local solves, but
the previous approach should be. <br></div></blockquote><div><br></div><div>Enforcing the general constraints is an important issue. Klawonn & Widlund 2006 "Dual-primal FETI methods for linear elasticity" has a section describing ways to perform the change of basis. This is probably just different language, but they change of basis keeps the subdomain problems definite. Even when Lagrange multipliers are left in, they describe how the unknowns can be ordered so that subdomain direct solves can always be done without pivoting. I would usually prefer to remove the constraints because iterative solvers for saddle point problems are more technical.</div>
<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#ffffff"><div>
<br>
<blockquote type="cite">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204, 204, 204);padding-left:1ex">
<div text="#000000" bgcolor="#ffffff"> One thing I
particularly enjoy on PETSc is the quick interchangeability
of preconditioners and Krylov methods within the KSP object.
But I can see this possible through strictly algebraic
nature of the approach, where only matrix object is passed.
<br>
</div>
</blockquote>
<div><br>
</div>
<div>The KSP and PC objects have two "slots", the Krylov
operator A and the "preconditioning matrix" B. I take a very
liberal view of what B is. I consider it to be a container
into which any problem/state-dependent information needed by
the preconditioner should be placed. Topological and geometric
information needed by the preconditioner does not change
between nonlinear iterations/time steps/etc, so it can be
given to the PC directly (e.g.
PCBDDCSetCoarseSpaceCandidates() or something like that,
though this could also be attached to the Mat).</div>
</div>
</blockquote></div>
This would make some sense to me with respect to group (i) of
constraints, for example as index set of unknows to fix on the
subdomain.<div><br>
<blockquote type="cite">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204, 204, 204);padding-left:1ex">
<div text="#000000" bgcolor="#ffffff"> On the other hand, all
of the FETI-DP and BDDC implementations I have heard of are
related to FEM computations and make the mesh somewhat
accessible to the solver. Although I do not like this, also
my third generation of implementation of the BDDC method
still needs some limited information on geometry. Not really
for construction of the coarse basis functions (this is
algebraic in BDDC), but rather indirectly for the selection
of coarse degrees of freedom. I am not aware of any existing
approach to selection of coarse DOFs at the moment, that
would not require some information on geometry for robust
selection on unstructured 3D meshes. I could imagine that
the required information could be limited to positions of
unknowns and some information of the problem which is solved
(the nullspace size), the topology of the mesh is not really
necessary.<br>
</div>
</blockquote>
<div><br>
</div>
<div>We recently introduced MatSetNearNullSpace() which is also
needed by smoothed aggregation algebraic multigrid. (We
decided that this belonged on the Mat because there are
problems for which the near null space could change depending
on the nonlinear regime, thus needing updating within a
nonlinear iteration. For multiphysics problems, it is fragile
to depend on access to the PC used for a particular "block"
(if it exists), so I prefer to put information that may
eventually need to be composed with or interact with other
"blocks" into the Mat.)</div>
</div>
</blockquote></div>
May I know what is the difference between nullspace and
nearnullspace? I am not sure if this could be useful here...</div></blockquote><div><br></div><div>NullSpace is specifically for singular operators (e.g. the global problem is floating). NearNullSpace is just describing the low-energy modes (which need to be suitably represented in the coarse space to have a scalable method).</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#ffffff"><div><br>
<blockquote type="cite">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204, 204, 204);padding-left:1ex">
<div text="#000000" bgcolor="#ffffff"> For this difficulty, I
do not see it simple to write something like PCBDDC
preconditioner that would simply interchange with PCASM and
others. The situation would be simpler for BDDC if the
preconditioner could use also some kind of mesh description.<br>
</div>
</blockquote>
<div><br>
</div>
<div>I agree that it may always be necessary to provide extra
information in order to use PCBDDC. The goal would not be to
have a solver that only needs a (partially) assembled sparse
matrix, but rather to have a purely algebraic interface by
which that information can be provided.</div>
<div><br>
</div>
<div>Another way for the PC to access grid information is
through PCSetDM(). From the perspective of the solver, the DM
is just an interface for providing grid- and
discretization-dependent algebraic ingredients to the solver.
This enables users of DM to have preconditioners automatically
set up.</div>
</div>
</blockquote></div>
This could be one option. If one would let the user to provide set
of indices to constraint (essentially corners), then the mesh would
probably not be necessary.<div><br>
<blockquote type="cite">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204, 204, 204);padding-left:1ex">
<div text="#000000" bgcolor="#ffffff"> <br>
The other issue I can see to be a bit conflicting with the
KSP approach of PETSc might be the fact, that BDDC
implementations introduce some coupling between
preconditioner and Krylov method, which is in fact run only
for the Schur complement problem at the interface among
subdomains. Multiplication by the system matrix in Krylov
method is performed by Dirichlet solves on each subdomain,
which corresponds to passing a special matrix-vector
multiplying routine to the Krylov method - at least, this is
the approach I follow in my last implementation of BDDC, in
the BDDCML code, where essentially the preconditioner
provides the A*x function to the Krylov method. <br>
I have seen this circumvented in PCNN by resolving the
vectors to the original size after each application of the
preconditioner, but in my opinion, this approach then loses
some of the efficiency of running Krylov method on the Schur
complement problem instead of the original problem, which
usually has a great effect on convergence by itself.<br>
</div>
</blockquote>
<div><br>
</div>
<div>There are tradeoffs both ways because iterating in the full
space can accommodate inexact subdomain solves. There are a
bunch of algorithms that use the same ingredients and are
essentially equivalent when direct solvers are used, but
different when inexact solvers are used:</div>
<div><br>
</div>
<div>BDDC: iterate in interface space, needs exact subdomain and
coarse solves</div>
<div>BDDC/primal: iterate in interface space plus coarse primal
dofs, tolerant of inexact coarse level solve</div>
<div>BDDC/full: iterate in full space, tolerant of inexact
subdomain and coarse solves</div>
<div>FETI-DP: iterate in space of Lagrange multipliers, much
like BDDC above</div>
<div>iFETI-DP: iterate in space of subdomains with duplicate
interface dofs, coarse primal dofs, and Lagrange multipliers.
tolerant of inexact subdomain and coarse level solves</div>
<div>irFETI-DP: iterate in space of Lagrange multipliers and
coarse dofs, tolerant of inexact coarse solves</div>
<div><br>
</div>
<div>One advantage of iterating in the full space is that the
method can naturally be used to precondition a somewhat
different matrix (e.g. a higher order discretization of the
same physics on the same mesh) which can be applied
matrix-free. Any method that iterates in a reduced space
simply contains another KSP for that purpose.</div>
</div>
</blockquote></div>
I agree, that in the context of PETSc, one should probably go for
the full space after each application of the preconditioner. <br></div></blockquote><div><br></div><div>The main disadvantage being that the Krylov space requires more storage.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#ffffff"><div>
<blockquote type="cite">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204, 204, 204);padding-left:1ex">
<div text="#000000" bgcolor="#ffffff"> <br>
Regarding problem types, I have little experience with using
BDDC beyond Poisson problems and elasticity. Recently, I
have done some tests with Stokes problems and incompressible
Navier-Stokes problems, using "brute force" rather than any
delicacy you may have in mind. The initial experience with
Stokes problem using Taylor-Hood elements is quite good,
things get worse for Navier-Stokes where the convergence,
with the current simple coarse problem, deteriorates quickly
with increasing Reynolds number. However, all these things
should be better tested and, as you probably know, are
rather recent topic of research and no clear conclusions
have been really achieved.<br>
</div>
</blockquote>
<div><br>
</div>
<div>I'm curious about what sort of problems you tested with
Stokes. In particular, I'm interested in problems containing
thin structures with large jumps in coefficients (e.g. 10^8).
(In this application, I'm only interested in the Stokes
problem, Re=1e-20 for these problems.)</div>
<div><br>
</div>
</div>
</blockquote></div>
Well, I have tested the solver only on various modifications of the
3D lid driven cavity problem (cubic geometry) for Stokes. My set-up
were Taylor-Hood elements (serendipity or full). The problems were
divided into regular or irregular subdomains. I have tested the
results up to some 512, resp. 1024 CPUs and ~28M dof. For all these,
the method almost did not increased number of iterations when H/h
was kept fixed and the overall solver also scaled quite well. This
was a bit surprising for me and required some development - now I
use arithmetic averages for each velocity component and pressure as
constraints. Without pressure averages, it did not work so nicely.
Also, as far as I know, these cases are not covered by the theory
for Stokes, which usually assumes elements with discontinuous
pressure.<br>
I haven't tested any contrast in coefficients for Stokes, but I have
done a few of such tests for elasticity and the experience says that
BDDC (with exact subdomain solvers) behaves well with respect to
coefficient jumps inside subdomains, or aligned to subdomain
boundaries. There are problems with jumps going through faces of
subdomains.<br></div></blockquote><div><br></div><div>How bad do you consider these problems to be? I've heard rather mixed reports from different groups.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#ffffff">
Some of the Stokes results were included in my recent talk
(<a href="http://www.math.cas.cz/~sistek/talks/Sistek-2011-ENUMATH-talk.pdf" target="_blank">http://www.math.cas.cz/~sistek/talks/Sistek-2011-ENUMATH-talk.pdf</a>)
if you were interested. But I was more focused there about timings
of the solver when using more coarse levels. There, the largest
problems are not mentioned simply because in those tables, I was
always missing some case I did not compute. <br>
<br>
I am wondering how you would like to proceed with the BDDC
implementation to PETSc, especially how users should specify the
coarse problem.<br></div></blockquote><div><br></div><div>I'm inclined to start by having the user specify the coarse space with the matrix C as shown above. Once the basic method is working, we can add more automatic support for defining C.</div>
<div><br></div><div>I periodically entertain plans of making a partially assembled format so that the user could assemble directly into the hierarchical format that has done a change of basis to make all coarse primal dofs explicit.</div>
<div><br></div><div>One concern I have is how to apply the method recursively when rigid body modes are represented explicitly in coarse spaces. That is, the coarse levels start having a different number of dofs at different "nodes" (e.g. 6 dofs at a face, but only 3 at a node).</div>
</div>