<div dir="ltr"><div dir="ltr">On Thu, Sep 3, 2020 at 11:43 AM Olivier Jamond <<a href="mailto:olivier.jamond@cea.fr">olivier.jamond@cea.fr</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hello,<br>

<br>

I am working on a finite-elements/finite-volumes code, whose distributed <br>

solver is based on petsc. For FE, it relies on Lagrange multipliers for <br>

the imposition of various boundary conditions or interactions (simple <br>

dirichlet, contact, ...). This results in saddle point problems:<br>

<br>

[K C^t][U]=[F]<br>

[C  0 ][L] [D]<br>

<br>

Most of the time, the relations related to the matrix C are applied to <br>

dofs on the boundary of the domain. Then the size of L is much smaller <br>

than the size of U, which becomes more and more true as  the mesh is <br>

refined.<br>

<br>

The code construct this matrix as a nested matrix (one of the reason is <br>

that for some interactions such as contact, whereas being quite small, <br>

the size of the matrix C change constantly, and having it 'blended' into <br>

a monolithic 'big' matrix would require to recompute its profile/ <br>

reallocate / ... each time), and use fieldsplit preconditioner of type <br>

PC_COMPOSITE_SCHUR. I would like to solve the system using iterative <br>

methods to access good extensibility on a large number of subdomains.<br>

<br>

Simple BC such as Dirichlet can be eliminated into K (and must be in <br>

order to make K invertible).<br>

<br>

My problem is the computational cost of these constraints treated with <br>

Lagrange multipliers, whereas their number becomes more and more <br>

neglectable as the mesh is refined. To give an idea, let's consider a <br>

simple elastic cube with dirichlet BCs which are all eliminated (to <br>

ensure invertibility of K) but one on a single dof.<br>

<br>

-ksp_type preonly<br>

-pc_type fieldsplit<br>

-pc_fieldsplit_type schur<br>

-pc_fieldsplit_schur_factorization_type full<br>

-pc_fieldsplit_schur_precondition selfp<br>

<br>

-fieldsplit_u_ksp_type cg<br>

-fieldsplit_u_pc_type bjacobi<br>

<br>

-fieldsplit_l_ksp_type cg<br>

-fieldsplit_l_pc_type bjacobi<br>

<br>

it seems that my computation time is multiplied by a factor 3: 3 ksp <br>

solves of the big block 'K' are needed to apply the schur preconditioner <br>

(assuming that the ksp(S,Sp) converges in 1 iteration). It seems <br>

expensive for a single dof dirichlet!<br></blockquote><div><br></div><div>I am not sure you can get around this cost. In this case, it reduces to the well-known</div><div>Sherman-Morrison formula (<a href="https://en.wikipedia.org/wiki/Sherman%E2%80%93Morrison_formula">https://en.wikipedia.org/wiki/Sherman%E2%80%93Morrison_formula</a>),</div><div>which Woodbury generalized. It seems to have the same number of solves.</div><div><br></div><div>  Thanks,</div><div><br></div><div>     Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

And for some relations treated by Lagrange multipliers which involve <br>

many dofs, the number of ksp solve of the big block 'K' is ( 2 + number <br>

of iteration of ksp(S,Sp)). To reduce this, one can think about solving <br>

the ksp(S,Sp) with a direct solver, but then one must use <br>

"-pc_fieldsplit_schur_precondition self" which is advised against in the <br>

documentation...<br>

<br>

To illustrate this, on a small elasticity case: 32x32x32 cube on 8 <br>

processors, dirichlet on the top and bottom faces:<br>

* if all the dirichlet are eliminated (no C matrix, monolithic solve of <br>

the K bloc)<br>

   - computation time for the solve: ~400ms<br>

* if only the dirichlet of the bottom face are eliminated<br>

   - computation time for the solve: ~35000ms<br>

   - number of iteration of ksp(S,Sp): 37<br>

   - total number of iterations of ksp(K): 4939<br>

* only the dirichlet of the bottom face are eliminated with these options:<br>

-ksp_type fgmres<br>

-pc_type fieldsplit<br>

-pc_fieldsplit_type schur<br>

-pc_fieldsplit_schur_factorization_type full<br>

-pc_fieldsplit_schur_precondition selfp<br>

<br>

-fieldsplit_u_ksp_type cg<br>

-fieldsplit_u_pc_type bjacobi<br>

<br>

-fieldsplit_l_ksp_type cg<br>

-fieldsplit_l_pc_type bjacobi<br>

-fieldsplit_l_ksp_rtol 1e-10<br>

-fieldsplit_l_inner_ksp_type preonly<br>

-fieldsplit_l_inner_pc_type jacobi<br>

-fieldsplit_l_upper_ksp_type preonly<br>

-fieldsplit_l_upper_pc_type jacobi<br>

<br>

   - computation time for the solve: ~50000ms<br>

   - total number of iterations of ksp(K): 7424<br>

   - 'main' ksp number of iterations: 7424<br>

<br>

Then in the end, my question is: is there a smarter way to handle such <br>

'small' constraint matrices C, with the (maybe wrong) idea that a small <br>

number of extra dofs (the lagrange multipliers) should result in a small <br>

extra computation time ?<br>

<br>

Thanks!<br>

<br>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener</div><div><br></div><div><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br></div></div></div></div></div></div></div></div>