[petsc-dev] Re: [petsc-users] user experience with PCNN

Jakub Sistek sistek at math.cas.cz
Sun Oct 9 15:50:47 CDT 2011


Hello,

On 10/07/2011 05:00 AM, Jed Brown wrote:
> [Cc'ing petsc-dev]
>
> On Thu, Oct 6, 2011 at 05:14, Jakub Sistek <sistek at math.cas.cz> wrote:
>
>     Dear Jed,
>
>     thank you for your explanations. In the meantime, I have checked
>     out petsc-dev to look a bit at the components you mention. The
>     support for distributed meshes is one of the many parts of PETSc
>     I do not know and have never used.
>
>
> The unstructured mesh component of PETSc, "Sieve", is not widely used 
> at present, so we should not rely on it. Most users have their own 
> unstructured mesh component, so we want to boil that down into common 
> algebraic interfaces instead of depending on any particular mesh.
>
>
>     After reading your email, I think that your idea of writing
>     something like PCBDDC is quite possible. But before I start with
>     individual comments, let me recall my idea of the core of the
>     coarse space in BDDC, which we should discuss in detail.
>     As you know, the coarse space in BDDC is given by a small set of
>     constraints on continuity of the approximate solution among
>     subdomains. These encapsulate the subdomain matrices into a
>     saddle point problem, which is (hopefully) regular, and solving
>     these problems substitutes for the Neumann solves in standard DD
>     methods. This block of constraints on subdomain i is usually
>     called C_i in the literature. Now the constraints can be divided
>     into two groups:
>     (i) point-wise continuity - these translate to a single entry of 1
>     in the corresponding row of C and are usually related to so-called
>     corners
>     (ii) generalized constraints - these typically enforce equality of
>     some averages between subdomains. These averages might be
>     arithmetic, or some weighted averages. They translate into full
>     rows of C comprised of the weights of the averages.
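>
>     For concreteness (in the standard notation), the subdomain saddle
>     point problem I mean couples K_i with the constraint block C_i as
>
>         [ K_i  C_i^T ] [ u_i  ]   [ f_i ]
>         [ C_i    0   ] [ mu_i ] = [  0  ]
>
>     where mu_i are Lagrange multipliers; the rows C_i u_i = 0 are
>     exactly what can make the block system regular even when K_i
>     itself is singular.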
>
>     It is sometimes beneficial to distinguish between the two in
>     implementations. Group (ii) can be automated to some extent -
>     faces, edges, and vertices of subdomains can be recognised
>     automatically in the preconditioner by counting the presence of
>     variables in subdomains.
>     Group (i) is somewhat more open. Identifying corners with vertices
>     is usually not sufficient. Algorithms for better selection depend
>     on the problem type and are somewhat problematic to automate.
>     If there are enough constraints of type (ii), group (i) is not
>     necessary. On the other hand, some implementations rely on the
>     fact that group (i) is already sufficient to make the subdomain
>     matrix regular, and on the fact that these constraints can be
>     implemented as Dirichlet boundary conditions on the subdomain
>     problem. This then allows using standard (even inexact) direct
>     solvers for the factorization of the subdomain matrix. Constraints
>     (ii) may then be enforced via Lagrange multipliers or some other
>     way. I have used this in the past when the aim was to use a
>     frontal out-of-core solver (element-by-element loading) for
>     subdomain problems.
>     I have to admit that I haven't implemented the method with inexact
>     solvers. In a more recent implementation, I form the entire saddle
>     point problem with C containing both (i) and (ii) and use an
>     indefinite direct solver (MUMPS) even for SPD problems. I think
>     this is not possible for inexact local solves, but the previous
>     approach should be.
>
>
> Enforcing the general constraints is an important issue. Klawonn & 
> Widlund 2006 "Dual-primal FETI methods for linear elasticity" has a 
> section describing ways to perform the change of basis. This is 
> probably just different language, but the change of basis keeps the 
> subdomain problems definite. Even when Lagrange multipliers are left 
> in, they describe how the unknowns can be ordered so that subdomain 
> direct solves can always be done without pivoting. I would usually 
> prefer to remove the constraints because iterative solvers for saddle 
> point problems are more technical.
Well, let me share my experience here. Mathematically, they are just a 
different language, but not from the point of view of implementation. In 
this respect, I like the work of C. Dohrmann (2003, 2007), which is to 
my eye more implementation-aware. I have already tried to implement 
three ways to solve these saddle point problems, the change of basis 
among them (A). The other two were splitting the constraints into (i) 
and (ii), enforcing group (i) by modifying the system as for Dirichlet 
boundary conditions and group (ii) by Lagrange multipliers (B), and 
solving the saddle point system as such by a direct method with suitable 
pivoting (C). Let me summarize (up to my understanding of the 
approaches) my current (and very personal) feeling about (A), (B) and (C):

(A)
(+) allows implementing the preconditioner by solving a system with one 
large distributed matrix (assembled within subdomains and at coarse dofs)
(-) the change of basis produces a lot of fill in the matrix, which 
becomes very expensive especially for faces among subdomains, making 
direct solves after the change considerably more expensive
(-) does not provide an explicit coarse problem, which is the basis for 
the multilevel method
(-) not simple to generalize to arbitrary weighted averages on 
edges/faces (this may be my limited understanding)

(B)
(+) averages on faces do not present a problem
(+) allows using a variety of solvers for the block K (inexact, 
iterative, ...)
(-) requires two solves with K in each application of the preconditioner 
(one when constructing the RHS for the Lagrange multiplier problem, 
another for the actual solve with the corrected RHS - see the sketch 
below)
(-) requires enough point constraints to regularize K for direct solvers

(C)
(+) averages on faces do not present a problem
(+) simplicity of implementation - no distinction between groups (i) and 
(ii) needed
(-) quite restrictive for subdomain solvers, which need to be robust for 
saddle-point-type systems

Since inexact solvers now seem quite important to me, I would be 
inclined to try (B) the next time I implement this. This is in my 
opinion pretty much the approach of Dohrmann (2007) in "An approximate 
BDDC preconditioner".
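
To make the two solves in (B) concrete, here is a sketch of one
application, with K already regularized by the group (i) constraints
(enforced as Dirichlet boundary conditions) and C holding the group (ii)
rows. Eliminating u from

    [ K  C^T ] [ u  ]   [ f ]
    [ C   0  ] [ mu ] = [ 0 ]

gives

    solve  K w = f                       ... first solve with K
    solve  (C K^{-1} C^T) mu = C w       ... small dual problem
    u = w - K^{-1} (C^T mu)              ... second solve with K

The dual matrix C K^{-1} C^T is small (one row and column per average)
and can be formed once during set-up, at the cost of one solve with K
per row of C, so each application of the preconditioner then costs just
the two solves with K mentioned above.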
>
>
>>         One thing I particularly enjoy about PETSc is the quick
>>         interchangeability of preconditioners and Krylov methods
>>         within the KSP object. But I can see this is made possible by
>>         the strictly algebraic nature of the approach, where only a
>>         matrix object is passed.
>>
>>
>>     The KSP and PC objects have two "slots", the Krylov operator A
>>     and the "preconditioning matrix" B. I take a very liberal view of
>>     what B is. I consider it to be a container into which any
>>     problem/state-dependent information needed by the preconditioner
>>     should be placed. Topological and geometric information needed by
>>     the preconditioner does not change between nonlinear
>>     iterations/time steps/etc, so it can be given to the PC directly
>>     (e.g. PCBDDCSetCoarseSpaceCandidates() or something like that,
>>     though this could also be attached to the Mat).
>     This would make some sense to me with respect to group (i) of
>     constraints, for example as an index set of unknowns to fix on the
>     subdomain.
>
>>         On the other hand, all of the FETI-DP and BDDC
>>         implementations I have heard of are related to FEM
>>         computations and make the mesh somewhat accessible to the
>>         solver. Although I do not like it, even my third-generation
>>         implementation of the BDDC method still needs some limited
>>         information on geometry. Not really for the construction of
>>         the coarse basis functions (this is algebraic in BDDC), but
>>         rather indirectly for the selection of coarse degrees of
>>         freedom. I am not aware of any existing approach to the
>>         selection of coarse DOFs that would not require some
>>         information on geometry for robust selection on unstructured
>>         3D meshes. I could imagine that the required information
>>         could be limited to the positions of unknowns and some
>>         information about the problem being solved (the nullspace
>>         size); the topology of the mesh is not really necessary.
>>
>>
>>     We recently introduced MatSetNearNullSpace() which is also needed
>>     by smoothed aggregation algebraic multigrid. (We decided that
>>     this belonged on the Mat because there are problems for which the
>>     near null space could change depending on the nonlinear regime,
>>     thus needing updating within a nonlinear iteration. For
>>     multiphysics problems, it is fragile to depend on access to the
>>     PC used for a particular "block" (if it exists), so I prefer to
>>     put information that may eventually need to be composed with or
>>     interact with other "blocks" into the Mat.)
>     May I ask what the difference is between the nullspace and the
>     near-nullspace? I am not sure if this could be useful here...
>
>
> NullSpace is specifically for singular operators (e.g. the global 
> problem is floating). NearNullSpace is just describing the low-energy 
> modes (which need to be suitably represented in the coarse space to 
> have a scalable method).
Thank you for the explanation. These would probably correspond to the 
subdomain coarse basis functions, which are computed in BDDC by these 
saddle point problems rather than from exact knowledge. The coarse space 
here contains all the nullspaces of the subdomains and is typically much 
larger than the exact nullspace of the subdomain matrices (e.g. for 
elasticity in 3D with cubic subdomains, considering arithmetic averages 
on faces and edges, each subdomain has (6 faces + 12 edges + 8 corners) 
* 3 = 78 coarse basis functions, compared to the dimension 6 of the 
exact nullspace).
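
Just to fix the idea for myself, a minimal sketch of how I understand
the new interface would be used for 3D elasticity (I am assuming
petsc-dev provides MatNullSpaceCreateRigidBody(); please correct me if
the calls differ):

    #include <petscksp.h>

    /* Attach the six rigid body modes of 3D elasticity to A as its
       near-null space, so that coarse spaces can represent them. */
    PetscErrorCode AttachRigidBodyModes(Mat A, Vec coords)
    {
      MatNullSpace   nns;
      PetscErrorCode ierr;

      /* coords: blocked Vec of nodal coordinates, block size 3 */
      ierr = MatNullSpaceCreateRigidBody(coords,&nns);CHKERRQ(ierr);
      ierr = MatSetNearNullSpace(A,nns);CHKERRQ(ierr);
      ierr = MatNullSpaceDestroy(&nns);CHKERRQ(ierr);
      return 0;
    }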
>
>
>>         Because of this difficulty, I do not see it as simple to
>>         write something like a PCBDDC preconditioner that would
>>         simply interchange with PCASM and others. The situation would
>>         be simpler for BDDC if the preconditioner could also use some
>>         kind of mesh description.
>>
>>
>>     I agree that it may always be necessary to provide extra
>>     information in order to use PCBDDC. The goal would not be to have
>>     a solver that only needs a (partially) assembled sparse matrix,
>>     but rather to have a purely algebraic interface by which that
>>     information can be provided.
>>
>>     Another way for the PC to access grid information is through
>>     PCSetDM(). From the perspective of the solver, the DM is just an
>>     interface for providing grid- and discretization-dependent
>>     algebraic ingredients to the solver. This enables users of DM to
>>     have preconditioners automatically set up.
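>>
>>     Roughly, the pattern looks like this (sketch only):
>>
>>         KSPCreate(comm,&ksp);
>>         KSPSetDM(ksp,dm);     /* the PC inherits it via PCSetDM() */
>>         KSPSetFromOptions(ksp);
>>         KSPSolve(ksp,b,x);
>>
>>     and the DM supplies whatever grid information the solver asks for.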
>     This could be one option. If one lets the user provide a set
>     of indices to constrain (essentially corners), then the mesh
>     would probably not be necessary.
>
>>
>>         The other issue I can see to be a bit in conflict with the
>>         KSP approach of PETSc might be the fact that BDDC
>>         implementations introduce some coupling between the
>>         preconditioner and the Krylov method, which is in fact run
>>         only for the Schur complement problem at the interface among
>>         subdomains. Multiplication by the system matrix in the Krylov
>>         method is performed by Dirichlet solves on each subdomain,
>>         which corresponds to passing a special matrix-vector
>>         multiplication routine to the Krylov method - at least, this
>>         is the approach I follow in my latest implementation of BDDC,
>>         the BDDCML code, where essentially the preconditioner
>>         provides the A*x function to the Krylov method.
>>         I have seen this circumvented in PCNN by expanding the
>>         vectors to the original size after each application of the
>>         preconditioner, but in my opinion this approach then loses
>>         some of the efficiency of running the Krylov method on the
>>         Schur complement problem instead of the original problem,
>>         which usually has a great effect on convergence by itself.
>>
>>
>>     There are tradeoffs both ways because iterating in the full space
>>     can accommodate inexact subdomain solves. There are a bunch of
>>     algorithms that use the same ingredients and are essentially
>>     equivalent when direct solvers are used, but different when
>>     inexact solvers are used:
>>
>>     BDDC: iterate in interface space, needs exact subdomain and
>>     coarse solves
>>     BDDC/primal: iterate in interface space plus coarse primal dofs,
>>     tolerant of inexact coarse level solve
>>     BDDC/full: iterate in full space, tolerant of inexact subdomain
>>     and coarse solves
>>     FETI-DP: iterate in space of Lagrange multipliers, much like BDDC
>>     above
>>     iFETI-DP: iterate in space of subdomains with duplicate interface
>>     dofs, coarse primal dofs, and Lagrange multipliers. tolerant of
>>     inexact subdomain and coarse level solves
>>     irFETI-DP: iterate in space of Lagrange multipliers and coarse
>>     dofs, tolerant of inexact coarse solves
>>
>>     One advantage of iterating in the full space is that the method
>>     can naturally be used to precondition a somewhat different matrix
>>     (e.g. a higher order discretization of the same physics on the
>>     same mesh) which can be applied matrix-free. Any method that
>>     iterates in a reduced space simply contains another KSP for that
>>     purpose.
>     I agree that in the context of PETSc, one should probably go for
>     the full space after each application of the preconditioner.
>
>
> The main disadvantage being that the Krylov space requires more storage.
>
>>
>>         Regarding problem types, I have little experience with using
>>         BDDC beyond Poisson problems and elasticity. Recently, I have
>>         done some tests with Stokes problems and incompressible
>>         Navier-Stokes problems, using "brute force" rather than any
>>         delicacy you may have in mind. The initial experience with
>>         the Stokes problem using Taylor-Hood elements is quite good;
>>         things get worse for Navier-Stokes, where the convergence,
>>         with the current simple coarse problem, deteriorates quickly
>>         with increasing Reynolds number. However, all these things
>>         should be better tested and, as you probably know, are a
>>         rather recent topic of research, and no clear conclusions
>>         have really been reached.
>>
>>
>>     I'm curious about what sort of problems you tested with Stokes.
>>     In particular, I'm interested in problems containing thin
>>     structures with large jumps in coefficients (e.g. 10^8). (In this
>>     application, I'm only interested in the Stokes problem, Re=1e-20
>>     for these problems.)
>>
>     Well, I have tested the solver only on various modifications of
>     the 3D lid driven cavity problem (cubic geometry) for Stokes. My
>     set-up was Taylor-Hood elements (serendipity or full). The
>     problems were divided into regular or irregular subdomains. I have
>     tested the results up to some 512, resp. 1024 CPUs and ~28M dofs.
>     For all of these, the number of iterations remained almost
>     constant when H/h was kept fixed, and the overall solver also
>     scaled quite well. This was a bit surprising to me and required
>     some development - now I use arithmetic averages for each velocity
>     component and for pressure as constraints. Without the pressure
>     averages, it did not work so nicely. Also, as far as I know, these
>     cases are not covered by the theory for Stokes, which usually
>     assumes elements with discontinuous pressure.
>     I haven't tested any contrast in coefficients for Stokes, but I
>     have done a few such tests for elasticity, and the experience
>     says that BDDC (with exact subdomain solvers) behaves well with
>     respect to coefficient jumps inside subdomains or aligned with
>     subdomain boundaries. There are problems with jumps going through
>     faces of subdomains.
>
>
> How bad do you consider these problems to be? I've heard rather mixed 
> reports from different groups.
OK. As I have mentioned, I think that jumps inside subdomains are 
naturally taken care of by the subdomain solves, so handled very well by 
exact solvers. Jumps aligned with interfaces (for example, coefficients 
constant on each subdomain) are handled well by the stiffness-scaled 
averaging operator. The remaining group presents trouble. We have 
created a small test problem (100k dofs) in a recent paper 
(http://dx.doi.org/10.1016/j.matcom.2011.03.014) for an elasticity 
computation, where much stiffer 'rods' embedded in a softer material 
punch through faces of subdomains. There, the contrast in coefficients 
was only 10^5, and the resulting estimated condition number was ~2x10^4 
even after preconditioning, when using arithmetic averages on faces. I 
know it is not a complete test of the phenomenon, but one can conclude 
that the standard type of coarse problem does not suffice here. I 
remember that by playing with the difference in coefficients, we were 
essentially able to make the solver stagnate. I should look into it 
again in a month or so, so I can keep you updated with a bit more 
systematic tests. Papers by Pechstein and Scheichl (2009 a,b) already 
contain systematic tests.

>     Some of the Stokes results were included in my recent talk
>     (http://www.math.cas.cz/~sistek/talks/Sistek-2011-ENUMATH-talk.pdf)
>     if you are interested. But there I was focused more on timings of
>     the solver when using more coarse levels. The largest problems are
>     not mentioned simply because in those tables I was always missing
>     some case I had not computed.
>
>     I am wondering how you would like to proceed with the BDDC
>     implementation in PETSc, especially how users should specify the
>     coarse problem.
>
>
> I'm inclined to start by having the user specify the coarse space with 
> the matrix C as shown above. Once the basic method is working, we can 
> add more automatic support for defining C.
This sounds like a good starting point. I think it would be reasonable 
to start from the PCNN implementation, where most of the components are 
already implemented.
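
To make sure we mean the same thing, this is the kind of user code I
would imagine; the PC type name and PCBDDCSetConstraintMatrix() are of
course hypothetical at this point:

    KSP ksp;
    PC  pc;
    Mat A, C;   /* assembled system matrix and constraint matrix C */
    Vec x, b;

    KSPCreate(PETSC_COMM_WORLD,&ksp);
    KSPSetOperators(ksp,A,A,SAME_NONZERO_PATTERN);
    KSPGetPC(ksp,&pc);
    PCSetType(pc,"bddc");               /* hypothetical PC type */
    PCBDDCSetConstraintMatrix(pc,C);    /* hypothetical: one row of C
                                           per coarse constraint, for
                                           groups (i) and (ii) alike */
    KSPSetFromOptions(ksp);
    KSPSolve(ksp,b,x);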
>
> I periodically entertain plans of making a partially assembled format 
> so that the user could assemble directly into the hierarchical format 
> that has done a change of basis to make all coarse primal dofs explicit.
>
> One concern I have is how to apply the method recursively when rigid 
> body modes are represented explicitly in coarse spaces. That is, the 
> coarse levels start having a different number of dofs at different 
> "nodes" (e.g. 6 dofs at a face, but only 3 at a node).

I haven't been as careful about this in our implementation of multilevel 
BDDC. Although the nullspaces of the subdomains are contained in it, the 
coarse space is typically much larger in the applications I have tested. 
Averages are generated over those "nodes" that have a sufficient number 
of unknowns. We have mostly used arithmetic averages (where we have 3 
dofs on each face/edge/corner), and a bit of adaptive averages, where it 
is unclear in advance how many averages are going to be enforced at a 
face, the upper limit of their number being the size of the face. I am 
more used to the simpler view of constraints on continuity rather than 
coarse space properties - the more constraints, the closer the functions 
are on neighbouring subdomains, and the better approximation one gets.

Best regards,

Jakub

-- 
Jakub Sistek, Ph.D.
postdoctoral researcher
Institute of Mathematics of the AS CR
http://www.math.cas.cz/~sistek/
tel: (+420) 222 090 710
