<div><br><div class="gmail_quote"><div>On Wed, 14 Jun 2017 at 19:42, David Nolte <<a href="mailto:dnolte@dim.uchile.cl">dnolte@dim.uchile.cl</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div bgcolor="#FFFFFF" text="#000000">

    Dave, thanks a lot for your great answer and for sharing your

    experience. I have a much clearer picture now. :)<br>

    <br>

    The experiments from 3/ give the desired results for examples of cavity

    flow. The (1/mu scaled) mass matrix seems OK.<br>

    <br>

    I followed your and Matt's recommendations, used a FULL Schur

    factorization, LU in the 0th split, and gradually relaxed the

    tolerance of GMRES/Jacobi in split 1 (observed the gradual increase

    in outer iterations). Then I replaced the split_0 LU with AMG

    (further increase of outer iterations and iterations on the Schur

    complement). <br>

    Doing so I converged to using hypre boomeramg (smooth_type Euclid,

    strong_threshold 0.75) and 3 iterations of GMRES/Jacobi on the Schur

    block, which gave the best time-to-solution in my particular setup

    and convergence to rtol=1e-8 within 60 outer iterations.<br>

    In my cases, using GMRES in the 0th split (with rtol 1e-1 or 1e-2)

    instead of "preonly" did not help convergence (on the contrary).<br>

    <br>

    I also repeated the experiments with

    "-pc_fieldsplit_schur_precondition selfp", with hypre(ilu) in split

    0 and hypre in split 1, just to check, and somewhat disappointingly

    ( ;-) ) the wall time is less than half of what it is when using gmres/Jac

    and Sp = mass matrix.<br>

    I am aware that this says nothing about scaling and robustness with

    respect to h-refinement...</div></blockquote><div><br></div><div>- selfp defines the Schur PC as A10 inv(diag(A00)) A01. This operator is not spectrally equivalent to S.</div><div><br></div><div>- For split 0 did you use preonly-hypre(ilu)?</div><div><br></div><div>- For split 1 did you also use hypre(ilu) (you just wrote hypre)?</div><div><br></div><div>- What was the iteration count for the saddle point problem with hypre and selfp? The iteration count will increase as you refine the mesh, so a crossover will occur at some (unknown) resolution, beyond which the mass matrix variant will be faster.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><br>

    <br>

    Would you agree that these configurations "make sense"?</div></blockquote><div><br></div><div>If you want to weak scale, the configuration with the mass matrix makes the most sense.</div><div><br></div><div>If you are only interested in solving many problems on one mesh, then do whatever you can to make the solve time as fast as possible - including using preconditioners defined with non-spectrally equivalent operators :D</div><div><br></div><div>Thanks,</div><div>  Dave</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><br>

    Furthermore, maybe anyone has a hint where to start tuning

    multigrid? So far hypre worked better than ML, but I have not

    experimented much with the parameters.</div></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><br>

    <br>

    <br>

    Thanks again for your help!<br>

    <br>

    Best wishes,<br>

    David</div><div bgcolor="#FFFFFF" text="#000000"><br>

    <br>

    <br>

    <br>

    <div class="m_-7814857284138451345moz-cite-prefix">On 06/12/2017 04:52 PM, Dave May wrote:<br>

    </div>

    <blockquote type="cite">

      <div>I've been following the discussion and have a

        couple of comments:

        <div><br>

        </div>

        <div>1/ For the preconditioners that you are using (Schur

          factorisation LDU, or upper block triangular DU), the

          convergence properties (e.g. 1 iterate for LDU and 2 iterates

          for DU) come from analysis involving exact inverses of A_00

          and S</div>
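        <div><br>
        </div>
        <div>A minimal NumPy sketch of the iterate counts quoted in 1/, assuming a small dense saddle point matrix K = [[A, B^T], [B, 0]] with a random SPD block A and a random block B (the names and sizes are illustrative only, not taken from the problem discussed in this thread). With exact inverses, the LDU factors reproduce K, so the preconditioned operator is the identity (1 iterate). For DU the preconditioned operator T satisfies (T - I)^2 = 0, which is why a Krylov method needs at most 2 iterates.</div>
        <pre>
```python
import numpy as np


def saddle_point_blocks(n=8, m=3, seed=0):
    """Random SPD block A (n x n) and random block B (m x n); illustrative only."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((n, n))
    A = G @ G.T + n * np.eye(n)
    B = rng.standard_normal((m, n))
    return A, B


def preconditioned_operators(A, B):
    """Return K P^{-1} for the exact LDU (full) and DU (upper) preconditioners."""
    n, m = A.shape[0], B.shape[0]
    K = np.block([[A, B.T], [B, np.zeros((m, m))]])
    S = -B @ np.linalg.solve(A, B.T)  # exact Schur complement
    # DU: upper block triangular factor [[A, B^T], [0, S]]
    P_upper = np.block([[A, B.T], [np.zeros((m, n)), S]])
    # LDU: the unit lower factor times P_upper reproduces K exactly
    L = np.block([[np.eye(n), np.zeros((n, m))],
                  [B @ np.linalg.inv(A), np.eye(m)]])
    P_full = L @ P_upper
    T_full = K @ np.linalg.inv(P_full)
    T_upper = K @ np.linalg.inv(P_upper)
    return T_full, T_upper


A, B = saddle_point_blocks()
T_full, T_upper = preconditioned_operators(A, B)
I = np.eye(T_full.shape[0])
print("LDU: T equals I:", np.allclose(T_full, I))
print("DU:  (T - I)^2 equals 0:", np.allclose((T_upper - I) @ (T_upper - I), 0.0))
print("DU:  T equals I:", np.allclose(T_upper, I))
```
        </pre>
        <div>Replacing the exact solves with approximate ones (AMG, a few GMRES/Jacobi iterations) breaks these identities, which is the point made in the next paragraph.</div>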

        <div><br>

        </div>

        <div>Once you switch from using exact inverses of A_00 and S,

          you have to rely on spectral equivalence of operators. That is

          fine, but the spectral equivalence does not tell you how many

          iterates LDU or DU will require to converge. What it does

          inform you about is that if you have a spectrally equivalent

          operator for A_00 and S (Schur complement), then under mesh

          refinement, your iteration count (whatever it was prior to

          refinement) will not increase.<br>

        </div>

        <div><br>

        </div>

        <div>2/ Looking at your first set of options, I see you have

          opted to use -fieldsplit_ksp_type preonly (for both split 0

          and 1). That is nice as it creates a linear operator thus you

          don't need something like FGMRES or GCR applied to the saddle

          point problem. </div>

        <div><br>

        </div>

        <div>Your choice for Schur is fine in the sense that the

          diagonal of M is spectrally equivalent to M, and M is

          spectrally equivalent to S. Whether it is "fine" in terms of

          the iteration count for Schur systems, we cannot say a priori

          (since the spectral equivalence doesn't give us direct info

          about the iterations we should expect). </div>

        <div><br>

        </div>

        <div>Your preconditioner for A_00 relies on AMG producing a

          spectrally equivalent operator with bounds which are tight

          enough to ensure convergence of the saddle point problem. I'll

          try to explain this.</div>

        <div><br>

        </div>

        <div>In my experience, for many problems (unstructured FE with

          variable coefficients, structured FE meshes with variable

          coefficients) AMG and preonly is not a robust choice. To

          control the approximation (the spectral equiv bounds), I

          typically run a stationary or Krylov method on split 0 (e.g.

          -fieldsplit_0_ksp_type xxx -fieldsplit_0_ksp_rtol yyy). Since

          the AMG preconditioner generated is spectrally equivalent

          (usually!), these solves will converge to a chosen rtol in a

          constant number of iterates under h-refinement. In practice,

          if I don't enforce that I hit something like rtol=1.0e-1 (or

          1.0e-2) on the 0th split, saddle point iterates will typically

          increase for "hard" problems under mesh refinement (1e4-1e7

          coefficient variation), and may not even converge at all when

          just using -fieldsplit_0_ksp_type preonly. Failure ultimately

          depends on how "strong" the preconditioner for A_00 block is

          (consider re-discretized geometric multigrid versus AMG).

          Running an iterative solve on the 0th split lets you control

          and recover from weak/poor, but spectrally equivalent

          preconditioners for A_00. Note that people hate this approach

          as it invariably nests Krylov methods, and subsequently adds

          more global reductions. However, it is scalable, optimal,

          tuneable and converges faster than the case which didn't

          converge at all :D</div>

        <div><br>

        </div>

        <div>3/ I agree with Matt's comments, but I'd do a couple of

          other things first.</div>

        <div><br>

        </div>

        <div>* I'd first check the discretization is implemented

          correctly. Your P2/P1 element is inf-sup stable - thus the

          condition number of S (unpreconditioned) should be independent

          of the mesh resolution (h). An easy way to verify this is to

          run either LDU (schur_fact_type full) or DU (schur_fact_type

          upper) and monitor the iterations required for those S solves.

          Use -fieldsplit_1_pc_type none -fieldsplit_1_ksp_rtol 1.0e-8

          -fieldsplit_1_ksp_monitor_true_residual

          -fieldsplit_1_ksp_pc_side right -fieldsplit_1_ksp_type gmres

          -fieldsplit_0_pc_type lu</div>

        <div><br>

        </div>

        <div>Then refine the mesh (ideally via sub-division) and repeat

          the experiment.</div>

        <div>If the S iterates don't asymptote, but instead grow with

          each refinement - you likely have a problem with the

          discretisation.</div>

        <div><br>

        </div>

        <div>* Do the same experiment, but this time use your mass

          matrix as the preconditioner for S and use

          -fieldsplit_1_pc_type lu. If the iterates, compared with the

          previous experiments (without a Schur PC), have gone up, your

          mass matrix is not defined correctly. If in the previous

          experiment (without a Schur PC) iterates on the S solves were

          bounded, but now when preconditioned with the mass matrix the

          iterates go up, then your mass matrix is definitely not

          correct.</div>

        <div><br>

        </div>

        <div>4/ Lastly, to finally get to your question: do +400

          iterates for the Schur complement solve seem "reasonable", and

          what is "normal behaviour"? </div>

        <div><br>

        </div>

        <div>It seems "high" to me. However the specifics of your

          discretisation, mesh topology, element quality, boundary

          conditions render it almost impossible to say what should be

          expected. When I use a Q2-P2* discretisation on a structured

          mesh with a non-constant viscosity I'd expect something like

          50-60 for 1.0e-10 with a mass matrix scaled by the inverse

          (local) viscosity. For constant viscosity maybe 30 iterates. I

          think this kind of statement is not particularly useful or

          helpful though.</div>

        <div><br>

        </div>

        <div>

          <div>Given you use an unstructured tet mesh, it is possible

            that some elements have very bad quality (high aspect ratio

            (AR), highly skewed). I am certain that P2/P1 has an inf-sup

            constant which is sensitive to the element aspect ratio (I

            don't recall the exact scaling wrt AR). From experience I

            know that using the mass matrix as a preconditioner for

            Schur is not robust as AR increases (e.g. iterations for the

            S solve grow). Hence, with a couple of "bad" elements in your

            mesh, I could imagine that you could end up having to

            perform +400 iterations </div>

        </div>

        <div><br>

        </div>

        <div>5/ Lastly, definitely don't impose one Dirichlet BC on

          pressure to make the pressure unique. This really screws up

          all the nice properties of your matrices. Just enforce the

          constant null space for p. And as you noticed, GMRES magically

          just does it automatically if the RHS of your original system

          was consistent.</div>
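        <div><br>
        </div>
        <div>A small sketch of the null space remark in 5/, under stated assumptions: the model problem is a 1D Neumann Laplacian (singular, with the constant vector in its null space) rather than the Stokes system, and plain CG is swapped in for GMRES to keep the code short. The function names are hypothetical. With a consistent right-hand side (zero mean here) and a zero initial guess, the Krylov iterates never pick up a constant component, so no Dirichlet pin is needed.</div>
        <pre>
```python
import numpy as np


def neumann_laplacian(n):
    """1D Laplacian with Neumann ends: singular, constants span the null space."""
    L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    L[0, 0] = 1.0
    L[-1, -1] = 1.0
    return L


def cg(A, b, max_it=200, tol=1e-12):
    """Plain conjugate gradients from a zero initial guess."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(max_it):
        # stop once the recursive residual norm is below tol
        if np.isclose(np.sqrt(rs), 0.0, atol=tol):
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


n = 20
L = neumann_laplacian(n)
rng = np.random.default_rng(0)
b = rng.standard_normal(n)
b -= b.mean()  # make the right-hand side consistent (orthogonal to constants)
x = cg(L, b)
print("residual norm:", np.linalg.norm(L @ x - b))
print("mean of solution:", x.mean())
```
        </pre>
        <div>In PETSc itself the analogous step is attaching the constant null space to the operator instead of pinning a pressure value.</div>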

        <div> </div>

        <div>Thanks,</div>

        <div>  Dave</div>

        <div><br>

        </div>

        <div class="gmail_extra"><br>

          <div class="gmail_quote">On 12 June 2017 at 20:20, David Nolte

            <span><<a href="mailto:dnolte@dim.uchile.cl" target="_blank">dnolte@dim.uchile.cl</a>></span>

            wrote:<br>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

              <div bgcolor="#FFFFFF"> Ok. With <tt>"-pc_fieldsplit_schur_fact_type

                  full" </tt>the outer iteration converges in 1 step.

                The problem remains the Schur iterations.<br>

                <br>

                I was not sure if the problem was maybe the singular

                pressure or the pressure Dirichlet BC. I tested the

                solver with a standard Stokes flow in a pipe with a

                constriction (zero Neumann BC for the pressure at the

                outlet) and in a 3D cavity (enclosed flow, no pressure

                BC or fixed at one point). I am not sure if I need to

                attach the constant pressure nullspace to the matrix for

                GMRES. Not doing so does not alter the convergence of

                GMRES in the Schur solver (nor the pressure solution),

                using a pressure Dirichlet BC however slows down

                convergence (I suppose because of the scaling of the

                matrix).<br>

                <br>

                I also checked the pressure mass matrix that I give

                PETSc, it looks correct.<br>

                <br>

                In all these cases, the solver behaves just as before.

                With LU in fieldsplit_0 and GMRES/LU with rtol 1e-10 in

                fieldsplit_1, it converges after 1 outer iteration, but

                the inner Schur solver converges slowly. <br>

                <br>

                How should the convergence of GMRES/LU of the Schur

                complement *normally* behave?<br>

                <br>

                Thanks again!<span class="m_-7814857284138451345gmail-m_2691972541491180255gmail-m_1522616294910952114HOEnZb"><font color="#888888"><br>

                    David</font></span>

                <div>

                  <div class="m_-7814857284138451345gmail-m_2691972541491180255gmail-m_1522616294910952114h5"><br>

                    <br>

                    <br>

                    <br>

                    <div class="m_-7814857284138451345gmail-m_2691972541491180255gmail-m_1522616294910952114m_-1125133874872333755moz-cite-prefix">On

                      06/12/2017 12:41 PM, Matthew Knepley wrote:<br>

                    </div>

                    <blockquote type="cite">

                      <div>

                        <div class="gmail_extra">

                          <div class="gmail_quote">On Mon, Jun 12, 2017

                            at 10:36 AM, David Nolte <span><<a href="mailto:dnolte@dim.uchile.cl" target="_blank">dnolte@dim.uchile.cl</a>></span>

                            wrote:<br>

                            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

                              <div bgcolor="#FFFFFF"> <br>

                                <div class="m_-7814857284138451345gmail-m_2691972541491180255gmail-m_1522616294910952114m_-1125133874872333755m_4366232618162032171moz-cite-prefix">On

                                  06/12/2017 07:50 AM, Matthew Knepley

                                  wrote:<br>

                                </div>

                                <blockquote type="cite">

                                  <div>

                                    <div class="gmail_extra">

                                      <div class="gmail_quote">On Sun,

                                        Jun 11, 2017 at 11:06 PM, David

                                        Nolte <span><<a href="mailto:dnolte@dim.uchile.cl" target="_blank">dnolte@dim.uchile.cl</a>></span>

                                        wrote:<br>

                                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

                                          <div bgcolor="#FFFFFF"> Thanks

                                            Matt, makes sense to me!<br>

                                            <br>

                                            I skipped direct solvers at

                                            first because for these

                                            'real' configurations LU

                                            (mumps/superlu_dist) usually

                                            goes out of memory (got 32GB

                                            RAM). It would be reasonable

                                            to take one more step back

                                            and play with synthetic

                                            examples.<br>

                                            I managed to run one case

                                            though with 936k dofs using:

                                            ("user" =pressure mass

                                            matrix)<br>

                                            <br>

                                            <tt><...><br>

-pc_fieldsplit_schur_fact_type upper</tt><tt><br>

                                            </tt><tt>-pc_fieldsplit_schur_precondition

                                              user</tt><tt><br>

                                            </tt><tt>-fieldsplit_0_ksp_type

                                              preonly   </tt><tt><br>

                                            </tt><tt>-fieldsplit_0_pc_type

                                              lu</tt><tt><br>

                                            </tt><tt>-fieldsplit_0_pc_factor_mat_solver_package

                                              mumps</tt><tt><br>

                                            </tt><tt><br>

                                            </tt><tt>

                                              -fieldsplit_1_ksp_type

                                              gmres<br>

-fieldsplit_1_ksp_monitor_true_residual<br>

                                              -fieldsplit_1_ksp_rtol

                                              1e-10<br>

                                            </tt><tt>-fieldsplit_1_pc_type

                                              lu</tt><tt><br>

                                            </tt><tt>

                                              -fieldsplit_1_pc_factor_mat_solver_package

                                              mumps</tt><tt><br>

                                            </tt><br>

                                            It takes 2 outer iterations,

                                            as expected. However the

                                            fieldsplit_1 solve takes

                                            very long.<br>

                                          </div>

                                        </blockquote>

                                        <div><br>

                                        </div>

                                        <div>1) It should take 1 outer

                                          iterate, not two. The problem

                                          is that your Schur tolerance

                                          is way too high. Use</div>

                                        <div><br>

                                        </div>

                                        <div>  -fieldsplit_1_ksp_rtol

                                          1e-10</div>

                                        <div><br>

                                        </div>

                                        <div>or something like that.

                                          Then it will take 1 iterate.</div>

                                      </div>

                                    </div>

                                  </div>

                                </blockquote>

                                <br>

                                Shouldn't it take 2 with a triangular

                                Schur factorization and exact

                                preconditioners, and 1 with a full

                                factorization? (cf. Benzi et al 2005,

                                p.66, <a class="m_-7814857284138451345gmail-m_2691972541491180255gmail-m_1522616294910952114m_-1125133874872333755m_4366232618162032171moz-txt-link-freetext" href="http://www.mathcs.emory.edu/%7Ebenzi/Web_papers/bgl05.pdf" target="_blank">http://www.mathcs.emory.edu/~benzi/Web_papers/bgl05.pdf</a>)<br>

                                <br>

                                That's exactly what I set: <tt>

                                  -fieldsplit_1_ksp_rtol 1e-10 </tt>and

                                the Schur solver does drop below "rtol

                                < 1e-10"<br>

                              </div>

                            </blockquote>

                            <div><br>

                            </div>

                            <div>Oh, yes. Take away the upper until

                              things are worked out.</div>

                            <div><br>

                            </div>

                            <div>  Thanks,</div>

                            <div><br>

                            </div>

                            <div>    Matt</div>

                            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

                              <div bgcolor="#FFFFFF">

                                <blockquote type="cite">

                                  <div>

                                    <div class="gmail_extra">

                                      <div class="gmail_quote">

                                        <div><br>

                                        </div>

                                        <div>2) There is a problem with

                                          the Schur solve. Now from the

                                          iterates</div>

                                        <div><br>

                                        </div>

                                        <div><span style="font-family:monospace">423

                                            KSP preconditioned resid

                                            norm 2.638419658982e-02 true

                                            resid norm

                                            7.229653211635e-11

                                            ||r(i)||/||b||

                                            7.229653211635e-11</span><br>

                                        </div>

                                        <div><br>

                                        </div>

                                        <div>it is clear that the

                                          preconditioner is really

                                          screwing stuff up. For

                                          testing, you can use</div>

                                        <div><br>

                                        </div>

                                        <div> 

                                          -pc_fieldsplit_schur_precondition

                                          full</div>

                                        <div><br>

                                        </div>

                                        <div>and your same setup here.

                                          It should take one iterate. I

                                          think there is something wrong

                                          with your</div>

                                        <div>mass matrix.</div>

                                      </div>

                                    </div>

                                  </div>

                                </blockquote>

                                <br>

                                I agree. I forgot to mention that I am

                                considering an "enclosed flow" problem,

                                with u=0 on all the boundary and a

                                Dirichlet condition for the pressure in

                                one point for fixing the constant

                                pressure. Maybe the preconditioner is

                                not consistent with this setup, need to

                                check this..<br>

                                <br>

                                Thanks a lot<br>

                                <br>

                                <br>

                                <blockquote type="cite">

                                  <div>

                                    <div class="gmail_extra">

                                      <div class="gmail_quote">

                                        <div><br>

                                        </div>

                                        <div>  Thanks,</div>

                                        <div><br>

                                        </div>

                                        <div>    Matt</div>

                                        <div><br>

                                        </div>

                                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

                                          <div bgcolor="#FFFFFF"> <br>

                                            <tt>  0 KSP unpreconditioned

                                              resid norm

                                              4.038466809302e-03 true

                                              resid norm

                                              4.038466809302e-03

                                              ||r(i)||/||b||

                                              1.000000000000e+00</tt><tt><br>

                                            </tt><tt>    Residual norms

                                              for fieldsplit_1_ solve.</tt><tt><br>

                                            </tt><tt>    0 KSP

                                              preconditioned resid norm

                                              0.000000000000e+00 true

                                              resid norm

                                              0.000000000000e+00

                                              ||r(i)||/||b||          

                                              -nan</tt><tt><br>

                                            </tt><tt>  Linear

                                              fieldsplit_1_ solve

                                              converged due to

                                              CONVERGED_ATOL iterations

                                              0</tt><tt><br>

                                            </tt><tt>  1 KSP

                                              unpreconditioned resid

                                              norm 4.860095964831e-06

                                              true resid norm

                                              4.860095964831e-06

                                              ||r(i)||/||b||

                                              1.203450763452e-03</tt><tt><br>

                                            </tt><tt>    Residual norms

                                              for fieldsplit_1_ solve.</tt><tt><br>

                                            </tt><tt>    0 KSP

                                              preconditioned resid norm

                                              2.965546249872e+08 true

                                              resid norm

                                              1.000000000000e+00

                                              ||r(i)||/||b||

                                              1.000000000000e+00</tt><tt><br>

                                            </tt><tt>    1 KSP

                                              preconditioned resid norm

                                              1.347596594634e+08 true

                                              resid norm

                                              3.599678801575e-01

                                              ||r(i)||/||b||

                                              3.599678801575e-01</tt><tt><br>

                                            </tt><tt>    2 KSP

                                              preconditioned resid norm

                                              5.913230136403e+07 true

                                              resid norm

                                              2.364916760834e-01

                                              ||r(i)||/||b||

                                              2.364916760834e-01</tt><tt><br>

                                            </tt><tt>    3 KSP

                                              preconditioned resid norm

                                              4.629700028930e+07 true

                                              resid norm

                                              1.984444715595e-01

                                              ||r(i)||/||b||

                                              1.984444715595e-01</tt><tt><br>

                                            </tt><tt>    4 KSP

                                              preconditioned resid norm

                                              3.804431276819e+07 true

                                              resid norm

                                              1.747224559120e-01

                                              ||r(i)||/||b||

                                              1.747224559120e-01</tt><tt><br>

                                            </tt><tt>    5 KSP

                                              preconditioned resid norm

                                              3.178769422140e+07 true

                                              resid norm

                                              1.402254864444e-01

                                              ||r(i)||/||b||

                                              1.402254864444e-01</tt><tt><br>

                                            </tt><tt>    6 KSP

                                              preconditioned resid norm

                                              2.648669043919e+07 true

                                              resid norm

                                              1.191164310866e-01

                                              ||r(i)||/||b||

                                              1.191164310866e-01</tt><tt><br>

                                            </tt><tt>    7 KSP

                                              preconditioned resid norm

                                              2.203522108614e+07 true

                                              resid norm

                                              9.690500018007e-02

                                              ||r(i)||/||b||

                                              9.690500018007e-02</tt><tt><br>

                                                  <...><br>

                                                  422 KSP preconditioned

                                              resid norm

                                              2.984888715147e-02 true

                                              resid norm

                                              8.598401046494e-11

                                              ||r(i)||/||b||

                                              8.598401046494e-11<br>

                                                  423 KSP preconditioned

                                              resid norm

                                              2.638419658982e-02 true

                                              resid norm

                                              7.229653211635e-11

                                              ||r(i)||/||b||

                                              7.229653211635e-11<br>

                                                Linear fieldsplit_1_

                                              solve converged due to

                                              CONVERGED_RTOL iterations

                                              423<br>

                                                2 KSP unpreconditioned

                                              resid norm

                                              3.539889585599e-16 true

                                              resid norm

                                              3.542279617063e-16

                                              ||r(i)||/||b||

                                              8.771347603759e-14<br>

                                              Linear solve converged due

                                              to CONVERGED_RTOL

                                              iterations 2<br>

                                            </tt><tt><br>

                                                              </tt><br>

                                            Does the slow convergence of

                                            the Schur block mean that my

                                            preconditioning matrix Sp is

                                            a poor choice?<br>

                                            <br>

                                            Thanks,<br>

                                            David<br>

                                            <br>

                                            <br>

                                            <div class="m_-7814857284138451345gmail-m_2691972541491180255gmail-m_1522616294910952114m_-1125133874872333755m_4366232618162032171gmail-m_5328507656823621836moz-cite-prefix">On

                                              06/11/2017 08:53 AM,

                                              Matthew Knepley wrote:<br>

                                            </div>

                                            <blockquote type="cite">

                                              <div>

                                                <div class="gmail_extra">

                                                  <div class="gmail_quote">On

                                                    Sat, Jun 10, 2017 at

                                                    8:25 PM, David Nolte

                                                    <span>&lt;<a href="mailto:dnolte@dim.uchile.cl" target="_blank">dnolte@dim.uchile.cl</a>&gt;</span>

                                                    wrote:<br>

                                                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">Dear

                                                      all,<br>

                                                      <br>

                                                      I am solving a

                                                      Stokes problem in

                                                      3D aorta

                                                      geometries, using

                                                      a P2/P1<br>

                                                      finite element

                                                      discretization on

                                                      tetrahedral meshes

                                                      resulting in<br>

                                                      ~1-1.5M DOFs.

                                                      Viscosity is

                                                      uniform (can be

                                                      adjusted

                                                      arbitrarily), and<br>

                                                      the right hand

                                                      side is a function

                                                      of noisy

                                                      measurement data.<br>

                                                      <br>

                                                      In other settings

                                                      of "standard"

                                                      Stokes flow

                                                      problems I have

                                                      obtained<br>

                                                      good convergence

                                                      with an "upper"

                                                      Schur complement

                                                      preconditioner,

                                                      using<br>

                                                      AMG (ML or Hypre)

                                                      on the velocity

                                                      block and

                                                      approximating the

                                                      Schur<br>

                                                      complement matrix

                                                      by the diagonal of

                                                      the pressure mass

                                                      matrix:<br>

                                                      <br>

                                                      -ksp_converged_reason<br>

                                                      -ksp_monitor_true_residual<br>

                                                      -ksp_initial_guess_nonzero<br>

                                                      -ksp_diagonal_scale<br>

                                                      -ksp_diagonal_scale_fix<br>

                                                          -ksp_type

                                                      fgmres<br>

                                                          -ksp_rtol

                                                      1.0e-8<br>

                                                      <br>

                                                          -pc_type

                                                      fieldsplit<br>

                                                      -pc_fieldsplit_type

                                                      schur<br>

                                                      -pc_fieldsplit_detect_saddle_point<br>

                                                      -pc_fieldsplit_schur_fact_type

                                                      upper<br>

                                                      -pc_fieldsplit_schur_precondition

                                                      user    # &lt;--

                                                      pressure mass

                                                      matrix<br>

                                                      <br>

                                                      -fieldsplit_0_ksp_type

                                                      preonly<br>

                                                      -fieldsplit_0_pc_type

                                                      ml<br>

                                                      <br>

                                                      -fieldsplit_1_ksp_type

                                                      preonly<br>

                                                      -fieldsplit_1_pc_type

                                                      jacobi<br>

                                                    </blockquote>

                                                    <div><br>

                                                    </div>

                                                    <div>1) I always

                                                      recommend starting

                                                      from an exact

                                                      solver and backing

                                                      off in small steps

                                                      for optimization.

                                                      Thus</div>

                                                    <div>    I would

                                                      start with LU on

                                                      the upper block

                                                      and GMRES/LU with

                                                      tolerance 1e-10 on

                                                      the Schur block.</div>

                                                    <div>    This should

                                                      converge in 1

                                                      iterate.</div>

                                                    <div><br>

                                                    </div>

                                                    <div>2) I don't

                                                      think you want

                                                      preonly on the

                                                      Schur system. You

                                                      might want

                                                      GMRES/Jacobi to

                                                      invert the mass

                                                      matrix.</div>

                                                    <div><br>

                                                    </div>

                                                    <div>3) You probably

                                                      want to tighten

                                                      the tolerance on

                                                      the Schur solve,

                                                      at least to start,

                                                      and then slowly

                                                      let it out. The</div>

                                                    <div>    tight

                                                      tolerance will

                                                      show you how

                                                      effective the

                                                      preconditioner is

                                                      using that Schur

                                                      operator. Then you

                                                      can start</div>

                                                    <div>    to evaluate

                                                      how effective the

                                                      Schur linear

                                                      solver is.</div>

                                                    <div><br>

                                                    </div>
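                                                    <div>A minimal NumPy sketch of point 1), assuming a small random saddle-point system as a hypothetical stand-in for the Stokes matrices (the names A00, A01, A10, S and the block sizes are illustrative, not taken from the actual setup). With exact solves on A00 and on the Schur complement S = -A10 inv(A00) A01, one application of the full block factorization already solves the system, which is why an exact configuration is a useful baseline before relaxing tolerances:</div>
<pre>
```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 3  # hypothetical sizes of the velocity and pressure blocks

# Hypothetical stand-ins for the Stokes blocks:
# A00 is symmetric positive definite, A01 has full column rank.
G = rng.standard_normal((n, n))
A00 = G @ G.T + n * np.eye(n)
A01 = rng.standard_normal((n, m))
A10 = A01.T
K = np.block([[A00, A01], [A10, np.zeros((m, m))]])

# Exact Schur complement S = A11 - A10 inv(A00) A01, with A11 = 0 here.
S = -A10 @ np.linalg.solve(A00, A01)


def apply_full_schur(r):
    """Apply the full block factorization with exact solves on both blocks."""
    ru, rp = r[:n], r[n:]
    u = np.linalg.solve(A00, ru)            # lower factor: u = inv(A00) ru
    p = np.linalg.solve(S, rp - A10 @ u)    # exact solve with the Schur complement
    u = u - np.linalg.solve(A00, A01 @ p)   # upper factor: correct the velocity
    return np.concatenate([u, p])


b = rng.standard_normal(n + m)
x = apply_full_schur(b)

# Relative true residual, the same quantity -ksp_monitor_true_residual reports.
rel_res = np.linalg.norm(b - K @ x) / np.linalg.norm(b)
print(rel_res)
```
</pre>
                                                    <div>Replacing either exact solve by an approximation (AMG on A00, a few GMRES/Jacobi iterations with a mass matrix in place of S) makes the result inexact, and the outer iteration count then measures how much was lost.</div>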

                                                    <div>Does this make

                                                      sense?</div>

                                                    <div><br>

                                                    </div>

                                                    <div>  Thanks,</div>

                                                    <div><br>

                                                    </div>

                                                    <div>     Matt</div>

                                                    <div> </div>

                                                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

                                                      In my present case

                                                      this setup gives

                                                      rather slow

                                                      convergence

                                                      (varies for<br>

                                                      different

                                                      geometries between

                                                      200-500 and several

                                                      thousand iterations!). I

                                                      obtain<br>

                                                      better convergence

                                                      with

                                                      "-pc_fieldsplit_schur_precondition

                                                      selfp" and<br>

                                                      using multigrid on

                                                      S, with

                                                      "-fieldsplit_1_pc_type

                                                      ml" (I don't think<br>

                                                      this is optimal,

                                                      though).<br>

                                                      <br>

                                                      I don't understand

                                                      why the pressure

                                                      mass matrix

                                                      approach performs

                                                      so<br>

                                                      poorly and wonder

                                                      what I could try

                                                      to improve the

                                                      convergence. Until

                                                      now<br>

                                                      I have been using

                                                      ML and Hypre

                                                      BoomerAMG mostly

                                                      with default

                                                      parameters.<br>

                                                      Surely they can be

                                                      improved by tuning

                                                      some parameters.

                                                      Which could be a<br>

                                                      good starting

                                                      point? Are there

                                                      other options I

                                                      should consider?<br>

                                                      <br>

                                                      With the above

                                                      setup (jacobi) for

                                                      a case that works

                                                      better than

                                                      others,<br>

                                                      the KSP terminates

                                                      with<br>

                                                      467 KSP

                                                      unpreconditioned

                                                      resid norm

                                                      2.072014323515e-09

                                                      true resid norm<br>

                                                      2.072014322600e-09

                                                      ||r(i)||/||b||

                                                      9.939098100674e-09<br>

                                                      <br>

                                                      You can find the

                                                      output of

                                                      -ksp_view below.

                                                      Let me know if you

                                                      need more<br>

                                                      details.<br>

                                                      <br>

                                                      Thanks in advance

                                                      for your advice!<br>

                                                      Best wishes<br>

                                                      David<br>

                                                      <br>

                                                      <br>

                                                      KSP Object: 1 MPI

                                                      processes<br>

                                                        type: fgmres<br>

                                                          GMRES:

                                                      restart=30, using

                                                      Classical

                                                      (unmodified)

                                                      Gram-Schmidt<br>

                                                      Orthogonalization

                                                      with no iterative

                                                      refinement<br>

                                                          GMRES: happy

                                                      breakdown

                                                      tolerance 1e-30<br>

                                                        maximum

                                                      iterations=10000<br>

                                                        tolerances: 

                                                      relative=1e-08,

                                                      absolute=1e-50,

                                                      divergence=10000.<br>

                                                        right

                                                      preconditioning<br>

                                                        diagonally

                                                      scaled system<br>

                                                        using nonzero

                                                      initial guess<br>

                                                        using

                                                      UNPRECONDITIONED

                                                      norm type for

                                                      convergence test<br>

                                                      PC Object: 1 MPI

                                                      processes<br>

                                                        type: fieldsplit<br>

                                                          FieldSplit

                                                      with Schur

                                                      preconditioner,

                                                      factorization

                                                      UPPER<br>

                                                          Preconditioner

                                                      for the Schur

                                                      complement formed

                                                      from user provided

                                                      matrix<br>

                                                          Split info:<br>

                                                          Split number 0

                                                      Defined by IS<br>

                                                          Split number 1

                                                      Defined by IS<br>

                                                          KSP solver for

                                                      A00 block<br>

                                                            KSP Object: 

                                                      (fieldsplit_0_)   

                                                         1 MPI processes<br>

                                                              type:

                                                      preonly<br>

                                                              maximum

                                                      iterations=10000,

                                                      initial guess is

                                                      zero<br>

                                                      tolerances: 

                                                      relative=1e-05,

                                                      absolute=1e-50,

                                                      divergence=10000.<br>

                                                              left

                                                      preconditioning<br>

                                                              using NONE

                                                      norm type for

                                                      convergence test<br>

                                                            PC Object: 

                                                      (fieldsplit_0_)   

                                                         1 MPI processes<br>

                                                              type: ml<br>

                                                                MG: type

                                                      is MULTIPLICATIVE,

                                                      levels=5 cycles=v<br>

                                                                  Cycles

                                                      per PCApply=1<br>

                                                                  Using

                                                      Galerkin computed

                                                      coarse grid

                                                      matrices<br>

                                                              Coarse

                                                      grid solver --

                                                      level

                                                      -------------------------------<br>

                                                                KSP

                                                      Object:         

                                                      (fieldsplit_0_mg_coarse_) 

                                                               1 MPI<br>

                                                      processes<br>

                                                                  type:

                                                      preonly<br>

                                                      maximum

                                                      iterations=10000,

                                                      initial guess is

                                                      zero<br>

                                                      tolerances: 

                                                      relative=1e-05,

                                                      absolute=1e-50,

                                                      divergence=10000.<br>

                                                                  left

                                                      preconditioning<br>

                                                                  using

                                                      NONE norm type for

                                                      convergence test<br>

                                                                PC

                                                      Object:         

                                                      (fieldsplit_0_mg_coarse_) 

                                                               1 MPI<br>

                                                      processes<br>

                                                                  type:

                                                      lu<br>

                                                                    LU:

                                                      out-of-place

                                                      factorization<br>

                                                      tolerance for zero

                                                      pivot 2.22045e-14<br>

                                                      using diagonal

                                                      shift on blocks to

                                                      prevent zero pivot<br>

                                                      [INBLOCKS]<br>

                                                      matrix ordering:

                                                      nd<br>

                                                      factor fill ratio

                                                      given 5., needed

                                                      1.<br>

                                                      Factored matrix

                                                      follows:<br>

                                                      Mat Object:       

                                                                 1 MPI

                                                      processes<br>

                                                        type: seqaij<br>

                                                        rows=3, cols=3<br>

                                                        package used to

                                                      perform

                                                      factorization:

                                                      petsc<br>

                                                        total:

                                                      nonzeros=3,

                                                      allocated

                                                      nonzeros=3<br>

                                                        total number of

                                                      mallocs used

                                                      during

                                                      MatSetValues<br>

                                                      calls =0<br>

                                                          not using

                                                      I-node routines<br>

                                                                  linear

                                                      system matrix =

                                                      precond matrix:<br>

                                                                  Mat

                                                      Object:           

                                                       1 MPI processes<br>

                                                      type: seqaij<br>

                                                      rows=3, cols=3<br>

                                                      total: nonzeros=3,

                                                      allocated

                                                      nonzeros=3<br>

                                                      total number of

                                                      mallocs used

                                                      during

                                                      MatSetValues calls

                                                      =0<br>

                                                      not using I-node

                                                      routines<br>

                                                              Down

                                                      solver

                                                      (pre-smoother) on

                                                      level 1<br>

-------------------------------<br>

                                                                KSP

                                                      Object:         

                                                      (fieldsplit_0_mg_levels_1_) 

                                                               1<br>

                                                      MPI processes<br>

                                                                  type:

                                                      richardson<br>

                                                      Richardson:

                                                      damping factor=1.<br>

                                                      maximum

                                                      iterations=2<br>

                                                      tolerances: 

                                                      relative=1e-05,

                                                      absolute=1e-50,

                                                      divergence=10000.<br>

                                                                  left

                                                      preconditioning<br>

                                                                  using

                                                      nonzero initial

                                                      guess<br>

                                                                  using

                                                      NONE norm type for

                                                      convergence test<br>

                                                                PC

                                                      Object:         

                                                      (fieldsplit_0_mg_levels_1_) 

                                                               1

                                                      MPI processes<br>

                                                                  type:

                                                      sor<br>

                                                                    SOR:

                                                      type =

                                                      local_symmetric,

                                                      iterations = 1,

                                                      local

                                                      iterations = 1,

                                                      omega = 1.<br>

                                                                  linear

                                                      system matrix =

                                                      precond matrix:<br>

                                                                  Mat

                                                      Object:           

                                                       1 MPI processes<br>

                                                      type: seqaij<br>

                                                      rows=15, cols=15<br>

                                                      total:

                                                      nonzeros=69,

                                                      allocated

                                                      nonzeros=69<br>

                                                      total number of

                                                      mallocs used

                                                      during

                                                      MatSetValues calls

                                                      =0<br>

                                                      not using I-node

                                                      routines<br>

                                                              Up solver

                                                      (post-smoother)

                                                      same as down

                                                      solver

                                                      (pre-smoother)<br>

                                                              Down

                                                      solver

                                                      (pre-smoother) on

                                                      level 2

-------------------------------<br>

                                                                KSP

                                                      Object:         

                                                      (fieldsplit_0_mg_levels_2_) 

                                                               1

                                                      MPI processes<br>

                                                                  type:

                                                      richardson<br>

                                                      Richardson:

                                                      damping factor=1.<br>

                                                      maximum

                                                      iterations=2<br>

                                                      tolerances: 

                                                      relative=1e-05,

                                                      absolute=1e-50,

                                                      divergence=10000.<br>

                                                                  left

                                                      preconditioning<br>

                                                                  using

                                                      nonzero initial

                                                      guess<br>

                                                                  using

                                                      NONE norm type for

                                                      convergence test<br>

                                                                PC

                                                      Object:         

                                                      (fieldsplit_0_mg_levels_2_) 

                                                               1

                                                      MPI processes<br>

                                                                  type:

                                                      sor<br>

                                                                    SOR:

                                                      type =

                                                      local_symmetric,

                                                      iterations = 1,

                                                      local

                                                      iterations = 1,

                                                      omega = 1.<br>

                                                                  linear

                                                      system matrix =

                                                      precond matrix:<br>

                                                                  Mat

                                                      Object:           

                                                       1 MPI processes<br>

                                                      type: seqaij<br>

                                                      rows=304, cols=304<br>

                                                      total:

                                                      nonzeros=7354,

                                                      allocated

                                                      nonzeros=7354<br>

                                                      total number of

                                                      mallocs used

                                                      during

                                                      MatSetValues calls

                                                      =0<br>

                                                      not using I-node

                                                      routines<br>

                                                              Up solver

                                                      (post-smoother)

                                                      same as down

                                                      solver

                                                      (pre-smoother)<br>

                                                              Down

                                                      solver

                                                      (pre-smoother) on

                                                      level 3

-------------------------------<br>

                                                                KSP

                                                      Object:         

                                                      (fieldsplit_0_mg_levels_3_) 

                                                               1

                                                      MPI processes<br>

                                                                  type:

                                                      richardson<br>

                                                      Richardson:

                                                      damping factor=1.<br>

                                                      maximum

                                                      iterations=2<br>

                                                      tolerances: 

                                                      relative=1e-05,

                                                      absolute=1e-50,

                                                      divergence=10000.<br>

                                                                  left

                                                      preconditioning<br>

                                                                  using

                                                      nonzero initial

                                                      guess<br>

                                                                  using

                                                      NONE norm type for

                                                      convergence test<br>

                                                                PC

                                                      Object:         

                                                      (fieldsplit_0_mg_levels_3_) 

                                                               1

                                                      MPI processes<br>

                                                                  type:

                                                      sor<br>

                                                                    SOR:

                                                      type =

                                                      local_symmetric,

                                                      iterations = 1,

                                                      local

                                                      iterations = 1,

                                                      omega = 1.<br>

                                                                  linear

                                                      system matrix =

                                                      precond matrix:<br>

                                                                  Mat

                                                      Object:           

                                                       1 MPI processes<br>

                                                      type: seqaij<br>

                                                      rows=30236,

                                                      cols=30236<br>

                                                      total:

                                                      nonzeros=2730644,

                                                      allocated

                                                      nonzeros=2730644<br>

                                                      total number of

                                                      mallocs used

                                                      during

                                                      MatSetValues calls

                                                      =0<br>

                                                      not using I-node

                                                      routines<br>

                                                              Up solver

                                                      (post-smoother)

                                                      same as down

                                                      solver

                                                      (pre-smoother)<br>

                                                              Down

                                                      solver

                                                      (pre-smoother) on

                                                      level 4

-------------------------------<br>

                                                                KSP

                                                      Object:         

                                                      (fieldsplit_0_mg_levels_4_) 

                                                               1

                                                      MPI processes<br>

                                                                  type:

                                                      richardson<br>

                                                      Richardson:

                                                      damping factor=1.<br>

                                                      maximum

                                                      iterations=2<br>

                                                      tolerances: 

                                                      relative=1e-05,

                                                      absolute=1e-50,

                                                      divergence=10000.<br>

                                                                  left

                                                      preconditioning<br>

                                                                  using

                                                      nonzero initial

                                                      guess<br>

                                                                  using

                                                      NONE norm type for

                                                      convergence test<br>

                                                                PC

                                                      Object:         

                                                      (fieldsplit_0_mg_levels_4_) 

                                                               1

                                                      MPI processes<br>

                                                                  type:

                                                      sor<br>

                                                                    SOR:

                                                      type =

                                                      local_symmetric,

                                                      iterations = 1,

                                                      local

                                                      iterations = 1,

                                                      omega = 1.<br>

                                                                  linear

                                                      system matrix =

                                                      precond matrix:<br>

                                                                  Mat

                                                      Object:           

                                                      (fieldsplit_0_)   

                                                               1 MPI

                                                      processes<br>

                                                      type: seqaij<br>

                                                      rows=894132,

                                                      cols=894132<br>

                                                      total:

                                                      nonzeros=70684164,

                                                      allocated

                                                      nonzeros=70684164<br>

                                                      total number of

                                                      mallocs used

                                                      during

                                                      MatSetValues calls

                                                      =0<br>

                                                      not using I-node

                                                      routines<br>

                                                              Up solver

                                                      (post-smoother)

                                                      same as down

                                                      solver

                                                      (pre-smoother)<br>

                                                              linear

                                                      system matrix =

                                                      precond matrix:<br>

                                                              Mat

                                                      Object:       

                                                      (fieldsplit_0_)   

                                                           1 MPI

                                                      processes<br>

                                                                type:

                                                      seqaij<br>

                                                      rows=894132,

                                                      cols=894132<br>

                                                                total:

                                                      nonzeros=70684164,

                                                      allocated

                                                      nonzeros=70684164<br>

                                                                total

                                                      number of mallocs

                                                      used during

                                                      MatSetValues calls

                                                      =0<br>

                                                                  not

                                                      using I-node

                                                      routines<br>

                                                          KSP solver for

                                                      S = A11 - A10

                                                      inv(A00) A01<br>

                                                            KSP Object: 

                                                      (fieldsplit_1_)   

                                                         1 MPI processes<br>

                                                              type:

                                                      preonly<br>

                                                              maximum

                                                      iterations=10000,

                                                      initial guess is

                                                      zero<br>

                                                      tolerances: 

                                                      relative=1e-05,

                                                      absolute=1e-50,

                                                      divergence=10000.<br>

                                                              left

                                                      preconditioning<br>

                                                              using NONE

                                                      norm type for

                                                      convergence test<br>

                                                            PC Object: 

                                                      (fieldsplit_1_)   

                                                         1 MPI processes<br>

                                                              type:

                                                      jacobi<br>

                                                              linear

                                                      system matrix

                                                      followed by

                                                      preconditioner

                                                      matrix:<br>

                                                              Mat

                                                      Object:       

                                                      (fieldsplit_1_)   

                                                           1 MPI

                                                      processes<br>

                                                                type:

                                                      schurcomplement<br>

                                                      rows=42025,

                                                      cols=42025<br>

                                                                  Schur

                                                      complement A11 -

                                                      A10 inv(A00) A01<br>

                                                                  A11<br>

                                                                    Mat

                                                      Object:           

                                                        (fieldsplit_1_) 

                                                                   1

                                                      MPI processes<br>

                                                      type: seqaij<br>

                                                      rows=42025,

                                                      cols=42025<br>

                                                      total:

                                                      nonzeros=554063,

                                                      allocated

                                                      nonzeros=554063<br>

                                                      total number of

                                                      mallocs used

                                                      during

                                                      MatSetValues calls

                                                      =0<br>

                                                      not using I-node

                                                      routines<br>

                                                                  A10<br>

                                                                    Mat

                                                      Object:           

                                                         1 MPI processes<br>

                                                      type: seqaij<br>

                                                      rows=42025,

                                                      cols=894132<br>

                                                      total:

                                                      nonzeros=6850107,

                                                      allocated

                                                      nonzeros=6850107<br>

                                                      total number of

                                                      mallocs used

                                                      during

                                                      MatSetValues calls

                                                      =0<br>

                                                      not using I-node

                                                      routines<br>

                                                                  KSP of

                                                      A00<br>

                                                                    KSP

                                                      Object:           

                                                        (fieldsplit_0_) 

                                                                   1

                                                      MPI processes<br>

                                                      type: preonly<br>

                                                      maximum

                                                      iterations=10000,

                                                      initial guess is

                                                      zero<br>

                                                      tolerances: 

                                                      relative=1e-05,

                                                      absolute=1e-50,

                                                      divergence=10000.<br>

                                                      left

                                                      preconditioning<br>

                                                      using NONE norm

                                                      type for

                                                      convergence test<br>

                                                                    PC

                                                      Object:           

                                                        (fieldsplit_0_) 

                                                                   1

                                                      MPI processes<br>

                                                      type: ml<br>

                                                      MG: type is

                                                      MULTIPLICATIVE,

                                                      levels=5 cycles=v<br>

                                                        Cycles per

                                                      PCApply=1<br>

                                                        Using Galerkin

                                                      computed coarse

                                                      grid matrices<br>

                                                      Coarse grid solver

                                                      -- level

                                                      -------------------------------<br>

                                                      KSP Object:

(fieldsplit_0_mg_coarse_)                   1 MPI processes<br>

                                                        type: preonly<br>

                                                        maximum

                                                      iterations=10000,

                                                      initial guess is

                                                      zero<br>

                                                        tolerances: 

                                                      relative=1e-05,

                                                      absolute=1e-50,

                                                      divergence=10000.<br>

                                                        left

                                                      preconditioning<br>

                                                        using NONE norm

                                                      type for

                                                      convergence test<br>

                                                      PC Object:

(fieldsplit_0_mg_coarse_)                   1 MPI processes<br>

                                                        type: lu<br>

                                                          LU:

                                                      out-of-place

                                                      factorization<br>

                                                          tolerance for

                                                      zero pivot

                                                      2.22045e-14<br>

                                                          using diagonal

                                                      shift on blocks to

                                                      prevent zero

                                                      pivot [INBLOCKS]<br>

                                                          matrix

                                                      ordering: nd<br>

                                                          factor fill

                                                      ratio given 5.,

                                                      needed 1.<br>

                                                            Factored

                                                      matrix follows:<br>

                                                              Mat

                                                      Object:           

                                                                     1

                                                      MPI

                                                      processes<br>

                                                                type:

                                                      seqaij<br>

                                                                rows=3,

                                                      cols=3<br>

                                                                package

                                                      used to perform

                                                      factorization:

                                                      petsc<br>

                                                                total:

                                                      nonzeros=3,

                                                      allocated

                                                      nonzeros=3<br>

                                                                total

                                                      number of mallocs

                                                      used during

                                                      MatSetValues calls

                                                      =0<br>

                                                                  not

                                                      using I-node

                                                      routines<br>

                                                               </blockquote></div></div></div></blockquote></div></blockquote></div></div></div></blockquote></div></blockquote></div></div></div></blockquote></div></div></div></blockquote></div></div></div></blockquote></div></blockquote></div></div>