[Cc'ing petsc-dev]<br><br><div class="gmail_quote">On Thu, Oct 6, 2011 at 05:14, Jakub Sistek <span dir="ltr"><<a href="mailto:sistek@math.cas.cz" target="_blank">sistek@math.cas.cz</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<u></u>

  <div text="#000000" bgcolor="#ffffff">

    Dear Jed,<br>

    <br>

    thank you for you explanations. In the meantime, I have checked out

    the petsc-dev to look a bit at the components you mention. The

    support of distributed meshes in PETSc is one of the many things I

    do not know and I have never used.<br></div></blockquote><div><br></div><div>The unstructured mesh component of PETSc, "Sieve", is not widely used at present, so we should not rely on it. Most users have their own unstructured mesh component, so we want to boil that down into common algebraic interfaces instead of depending on any particular mesh.</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#ffffff">

    <br>

    After reading your email, I think that your idea of writing

    something like PCBDDC is quite possible. But before I start

    individual comments, let me recall my idea of the core for the

    coarse space in BDDC, which we should discuss in detail.<br>

    As you know, the coarse space in BDDC is given by a small set of

    constraints on continuity of the approximate solution among

    subdomains. These encapsulate the subdomain matrices into the saddle

    point problem, which is (hopefully) regular and solving these

    problems substitutes the Neumann solves in standard DD methods. This

    block of constraints on subdomain i is usually called C_i in

    literature. Now the constraints could be divided into two groups: <br>

    (i) point-wise continuity - these translate to a single entry of 1

    in the corresponding row of C and are usually related to so called

    corners<br>

    (ii) generalized constrains - these are typically equivalence of

    some averages between subdomains. These averages might be

    arithmetic, or some weighted averages. These translate into full

    rows of C comprised of weights of the averages.<br>

    <br>

    It is sometimes beneficial to distinguish between the two in

    implementations. Group (ii) can be to some extent automated - faces,

    edges, and vertices of subdomains can be recognised automatically in

    the preconditioner by counting the presence of variables in

    subdomains.<br>

    Group (i) is somehow more open. Identifying corners with verices is

    usually not sufficient. Algorithms for better selection depend on

    the problem type and are somewhat problematic to automate.<br>

    If there are enough constraints of type (ii), group (i) is not

    necessary. On the other hand, some implementations rely on the fact,

    that group (i) is already sufficient to make the subdomain matrix

    regular and on the fact that these constraints can be implemented as

    Dirichlet boundary conditions to the subdomain problem. This then

    allows using standard (even inexact) direct solvers for

    factorization of subdomain matrix. Constraints (ii) my then be

    enforced via lagrange multipliers or some other way.<br>

    I have used this in the past when the aim was to use frontal

    out-of-core solver (element by element loading) for subdomain

    problems. <br>

    I have to admit that I haven't implemented the method with inexact

    solvers. In a more recent implementation, I have this implemented by

    forming the entire saddle point problem with C with both (i) and

    (ii) and using an indefinite direct solver (MUMPS) even for SPD

    problems. I think this is not possible for inexact local solves, but

    the previous approach should be. <br></div></blockquote><div><br></div><div>Enforcing the general constraints is an important issue. Klawonn & Widlund 2006 "Dual-primal FETI methods for linear elasticity" has a section describing ways to perform the change of basis. This is probably just different language, but they change of basis keeps the subdomain problems definite. Even when Lagrange multipliers are left in, they describe how the unknowns can be ordered so that subdomain direct solves can always be done without pivoting. I would usually prefer to remove the constraints because iterative solvers for saddle point problems are more technical.</div>

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#ffffff"><div>

    <br>

    <blockquote type="cite">

      <div class="gmail_quote">

        <blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204, 204, 204);padding-left:1ex">

          <div text="#000000" bgcolor="#ffffff"> One thing I

            particularly enjoy on PETSc is the quick interchangeability

            of preconditioners and Krylov methods within the KSP object.

            But I can see this possible through strictly algebraic

            nature of the approach, where only matrix object is passed.

            <br>

          </div>

        </blockquote>

        <div><br>

        </div>

        <div>The KSP and PC objects have two "slots", the Krylov

          operator A and the "preconditioning matrix" B. I take a very

          liberal view of what B is. I consider it to be a container

          into which any problem/state-dependent information needed by

          the preconditioner should be placed. Topological and geometric

          information needed by the preconditioner does not change

          between nonlinear iterations/time steps/etc, so it can be

          given to the PC directly (e.g.

          PCBDDCSetCoarseSpaceCandidates() or something like that,

          though this could also be attached to the Mat).</div>

      </div>

    </blockquote></div>

    This would make some sense to me with respect to group (i) of

    constraints, for example as index set of unknows to fix on the

    subdomain.<div><br>

    <blockquote type="cite">

      <div class="gmail_quote">

        <blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204, 204, 204);padding-left:1ex">

          <div text="#000000" bgcolor="#ffffff"> On the other hand, all

            of the FETI-DP and BDDC implementations I have heard of are

            related to FEM computations and make the mesh somewhat

            accessible to the solver. Although I do not like this, also

            my third generation of implementation of the BDDC method

            still needs some limited information on geometry. Not really

            for construction of the coarse basis functions (this is

            algebraic in BDDC), but rather indirectly for the selection

            of coarse degrees of freedom. I am not aware of any existing

            approach to selection of coarse DOFs at the moment, that

            would not require some information on geometry for robust

            selection on unstructured 3D meshes. I could imagine that

            the required information could be limited to positions of

            unknowns and some information of the problem which is solved

            (the nullspace size), the topology of the mesh is not really

            necessary.<br>

          </div>

        </blockquote>

        <div><br>

        </div>

        <div>We recently introduced MatSetNearNullSpace() which is also

          needed by smoothed aggregation algebraic multigrid. (We

          decided that this belonged on the Mat because there are

          problems for which the near null space could change depending

          on the nonlinear regime, thus needing updating within a

          nonlinear iteration. For multiphysics problems, it is fragile

          to depend on access to the PC used for a particular "block"

          (if it exists), so I prefer to put information that may

          eventually need to be composed with or interact with other

          "blocks" into the Mat.)</div>

      </div>

    </blockquote></div>

    May I know what is the difference between nullspace and

    nearnullspace? I am not sure if this could be useful here...</div></blockquote><div><br></div><div>NullSpace is specifically for singular operators (e.g. the global problem is floating). NearNullSpace is just describing the low-energy modes (which need to be suitably represented in the coarse space to have a scalable method).</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#ffffff"><div><br>

    <blockquote type="cite">

      <div class="gmail_quote">

        <div> </div>

        <blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204, 204, 204);padding-left:1ex">

          <div text="#000000" bgcolor="#ffffff"> For this difficulty, I

            do not see it simple to write something like PCBDDC

            preconditioner that would simply interchange with PCASM and

            others. The situation would be simpler for BDDC if the

            preconditioner could use also some kind of mesh description.<br>

          </div>

        </blockquote>

        <div><br>

        </div>

        <div>I agree that it may always be necessary to provide extra

          information in order to use PCBDDC. The goal would not be to

          have a solver that only needs a (partially) assembled sparse

          matrix, but rather to have a purely algebraic interface by

          which that information can be provided.</div>

        <div><br>

        </div>

        <div>Another way for the PC to access grid information is

          through PCSetDM(). From the perspective of the solver, the DM

          is just an interface for providing grid- and

          discretization-dependent algebraic ingredients to the solver.

          This enables users of DM to have preconditioners automatically

          set up.</div>

      </div>

    </blockquote></div>

    This could be one option. If one would let the user to provide set

    of indices to constraint (essentially corners), then the mesh would

    probably not be necessary.<div><br>

    <blockquote type="cite">

      <div class="gmail_quote">

        <div> </div>

        <blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204, 204, 204);padding-left:1ex">

          <div text="#000000" bgcolor="#ffffff"> <br>

            The other issue I can see to be a bit conflicting with the

            KSP approach of PETSc might be the fact, that BDDC

            implementations introduce some coupling between

            preconditioner and Krylov method, which is in fact run only

            for the Schur complement problem at the interface among

            subdomains. Multiplication by the system matrix in Krylov

            method is performed by Dirichlet solves on each subdomain,

            which corresponds to passing a special matrix-vector

            multiplying routine to the Krylov method - at least, this is

            the approach I follow in my last implementation of BDDC, in

            the BDDCML code, where essentially the preconditioner

            provides the A*x function to the Krylov method. <br>

            I have seen this circumvented in PCNN by resolving the

            vectors to the original size after each application of the

            preconditioner, but in my opinion, this approach then loses

            some of the efficiency of running Krylov method on the Schur

            complement problem instead of the original problem, which

            usually has a great effect on convergence by itself.<br>

          </div>

        </blockquote>

        <div><br>

        </div>

        <div>There are tradeoffs both ways because iterating in the full

          space can accommodate inexact subdomain solves. There are a

          bunch of algorithms that use the same ingredients and are

          essentially equivalent when direct solvers are used, but

          different when inexact solvers are used:</div>

        <div><br>

        </div>

        <div>BDDC: iterate in interface space, needs exact subdomain and

          coarse solves</div>

        <div>BDDC/primal: iterate in interface space plus coarse primal

          dofs, tolerant of inexact coarse level solve</div>

        <div>BDDC/full: iterate in full space, tolerant of inexact

          subdomain and coarse solves</div>

        <div>FETI-DP: iterate in space of Lagrange multipliers, much

          like BDDC above</div>

        <div>iFETI-DP: iterate in space of subdomains with duplicate

          interface dofs, coarse primal dofs, and Lagrange multipliers.

          tolerant of inexact subdomain and coarse level solves</div>

        <div>irFETI-DP: iterate in space of Lagrange multipliers and

          coarse dofs, tolerant of inexact coarse solves</div>

        <div><br>

        </div>

        <div>One advantage of iterating in the full space is that the

          method can naturally be used to precondition a somewhat

          different matrix (e.g. a higher order discretization of the

          same physics on the same mesh) which can be applied

          matrix-free. Any method that iterates in a reduced space

          simply contains another KSP for that purpose.</div>

      </div>

    </blockquote></div>

    I agree, that in the context of PETSc, one should probably go for

    the full space after each application of the preconditioner. <br></div></blockquote><div><br></div><div>The main disadvantage being that the Krylov space requires more storage.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div text="#000000" bgcolor="#ffffff"><div>

    <blockquote type="cite">

      <div class="gmail_quote">

        <blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204, 204, 204);padding-left:1ex">

          <div text="#000000" bgcolor="#ffffff"> <br>

            Regarding problem types, I have little experience with using

            BDDC beyond Poisson problems and elasticity. Recently, I

            have done some tests with Stokes problems and incompressible

            Navier-Stokes problems, using "brute force" rather than any

            delicacy you may have in mind. The initial experience with

            Stokes problem using Taylor-Hood elements is quite good,

            things get worse for Navier-Stokes where the convergence,

            with the current simple coarse problem, deteriorates quickly

            with increasing Reynolds number. However, all these things

            should be better tested and, as you probably know, are

            rather recent topic of research and no clear conclusions

            have been really achieved.<br>

          </div>

        </blockquote>

        <div><br>

        </div>

        <div>I'm curious about what sort of problems you tested with

          Stokes. In particular, I'm interested in problems containing

          thin structures with large jumps in coefficients (e.g. 10^8).

          (In this application, I'm only interested in the Stokes

          problem, Re=1e-20 for these problems.)</div>

        <div><br>

        </div>

      </div>

    </blockquote></div>

    Well, I have tested the solver only on various modifications of the

    3D lid driven cavity problem (cubic geometry) for Stokes. My set-up

    were Taylor-Hood elements (serendipity or full). The problems were

    divided into regular or irregular subdomains. I have tested the

    results up to some 512, resp. 1024 CPUs and ~28M dof. For all these,

    the method almost did not increased number of iterations when H/h

    was kept fixed and the overall solver also scaled quite well. This

    was a bit surprising for me and required some development - now I

    use arithmetic averages for each velocity component and pressure as

    constraints. Without pressure averages, it did not work so nicely.

    Also, as far as I know, these cases are not covered by the theory

    for Stokes, which usually assumes elements with discontinuous

    pressure.<br>

    I haven't tested any contrast in coefficients for Stokes, but I have

    done a few of such tests for elasticity and the experience says that

    BDDC (with exact subdomain solvers) behaves well with respect to

    coefficient jumps inside subdomains, or aligned to subdomain

    boundaries. There are problems with jumps going through faces of

    subdomains.<br></div></blockquote><div><br></div><div>How bad do you consider these problems to be? I've heard rather mixed reports from different groups.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div text="#000000" bgcolor="#ffffff">

    Some of the Stokes results were included in my recent talk

    (<a href="http://www.math.cas.cz/~sistek/talks/Sistek-2011-ENUMATH-talk.pdf" target="_blank">http://www.math.cas.cz/~sistek/talks/Sistek-2011-ENUMATH-talk.pdf</a>)

    if you were interested. But I was more focused there about timings

    of the solver when using more coarse levels. There, the largest

    problems are not mentioned simply because in those tables, I was

    always missing some case I did not compute. <br>

    <br>

    I am wondering how you would like to proceed with the BDDC

    implementation to PETSc, especially how users should specify the

    coarse problem.<br></div></blockquote><div><br></div><div>I'm inclined to start by having the user specify the coarse space with the matrix C as shown above. Once the basic method is working, we can add more automatic support for defining C.</div>

<div><br></div><div>I periodically entertain plans of making a partially assembled format so that the user could assemble directly into the hierarchical format that has done a change of basis to make all coarse primal dofs explicit.</div>

<div><br></div><div>One concern I have is how to apply the method recursively when rigid body modes are represented explicitly in coarse spaces. That is, the coarse levels start having a different number of dofs at different "nodes" (e.g. 6 dofs at a face, but only 3 at a node).</div>

</div>