On Wed, 14 Jun 2017 at 19:42, David Nolte <dnolte@dim.uchile.cl> wrote:
> Dave, thanks a lot for your great answer and for sharing your
> experience. I have a much clearer picture now. :)
>
> The experiments 3/ give the desired results for examples of cavity
> flow. The (1/mu scaled) mass matrix seems OK.
>
> I followed your and Matt's recommendations: I used a FULL Schur
> factorization with LU in the 0th split, and gradually relaxed the
> tolerance of GMRES/Jacobi in split 1 (observing the gradual increase
> in outer iterations). Then I replaced the split_0 LU with AMG (which
> further increased the outer iterations and the iterations on the
> Schur complement).
> Doing so I converged to using hypre boomeramg (smooth_type Euclid,
> strong_threshold 0.75) and 3 iterations of GMRES/Jacobi on the Schur
> block, which gave the best time-to-solution in my particular setup
> and convergence to rtol=1e-8 within 60 outer iterations.
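>
> For reference, a sketch of the option set I converged to (the split
> names assume the default fieldsplit_0/fieldsplit_1 labels, and the
> boomeramg option names should be the standard PETSc ones):
>
>     -ksp_type fgmres
>     -ksp_rtol 1e-8
>     -pc_type fieldsplit
>     -pc_fieldsplit_type schur
>     -pc_fieldsplit_schur_fact_type full
>     -pc_fieldsplit_schur_precondition user
>     -fieldsplit_0_ksp_type preonly
>     -fieldsplit_0_pc_type hypre
>     -fieldsplit_0_pc_hypre_type boomeramg
>     -fieldsplit_0_pc_hypre_boomeramg_smooth_type Euclid
>     -fieldsplit_0_pc_hypre_boomeramg_strong_threshold 0.75
>     -fieldsplit_1_ksp_type gmres
>     -fieldsplit_1_ksp_max_it 3
>     -fieldsplit_1_pc_type jacobi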
>
> In my cases, using GMRES in the 0th split (with rtol 1e-1 or 1e-2)
> instead of "preonly" did not help convergence (on the contrary).
>
> I also repeated the experiments with
> "-pc_fieldsplit_schur_precondition selfp", with hypre(ilu) in split 0
> and hypre in split 1, just to check, and somewhat disappointingly
> ( ;-) ) the wall time is less than half of what it is when using
> GMRES/Jacobi and Sp = the mass matrix.
> I am aware that this says nothing about scaling and robustness with
> respect to h-refinement...

- selfp defines the Schur PC as A10 inv(diag(A00)) A01. This operator
is not spectrally equivalent to S.

- For split 0 did you use preonly-hypre(ilu)?

- For split 1 did you also use hypre(ilu) (you just wrote hypre)?

- What was the iteration count for the saddle point problem with hypre
and selfp? Iterates will increase if you refine the mesh, and a
cross-over will occur at some (unknown) resolution beyond which the
mass matrix variant will be faster.
> Would you agree that these configurations "make sense"?

If you want to weak scale, the configuration with the mass matrix makes
the most sense.

If you are only interested in solving many problems on one mesh, then
do whatever you can to make the solve time as fast as possible -
including using preconditioners defined with non-spectrally equivalent
operators :D

Thanks,
  Dave

> Furthermore, maybe anyone has a hint on where to start tuning the
> multigrid? So far hypre has worked better than ML, but I have not
> experimented much with the parameters.
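>
> To be concrete, I mean knobs of the sort (these should be the
> standard PETSc names for the boomeramg parameters):
>
>     -fieldsplit_0_pc_hypre_boomeramg_strong_threshold <r>
>     -fieldsplit_0_pc_hypre_boomeramg_coarsen_type <type>
>     -fieldsplit_0_pc_hypre_boomeramg_relax_type_all <type>
>     -fieldsplit_0_pc_hypre_boomeramg_agg_nl <n>
>     -fieldsplit_0_pc_hypre_boomeramg_max_levels <n>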
>
> Thanks again for your help!
>
> Best wishes,
> David
>
    <div class="m_-7814857284138451345moz-cite-prefix">On 06/12/2017 04:52 PM, Dave May wrote:<br>
    </div>
    <blockquote type="cite">
      <div>I've been following the discussion and have a
        couple of comments:
        <div><br>
        </div>
        <div>1/ For the preconditioners that you are using (Schur
          factorisation LDU, or upper block triangular DU), the
          convergence properties (e.g. 1 iterate for LDU and 2 iterates
          for DU) come from analysis involving exact inverses of A_00
          and S</div>
        <div><br>
        </div>
        <div>Once you switch from using exact inverses of A_00 and S,
          you have to rely on spectral equivalence of operators. That is
          fine, but the spectral equivalence does not tell you how many
          iterates LDU or DU will require to converge. What it does
          inform you about is that if you have a spectrally equivalent
          operator for A_00 and S (Schur complement), then under mesh
          refinement, your iteration count (whatever it was prior to
          refinement) will not increase.<br>
        </div>
>>
>> 2/ Looking at your first set of options, I see you have opted to use
>> -fieldsplit_ksp_type preonly (for both split 0 and 1). That is nice
>> as it creates a linear operator, thus you don't need something like
>> FGMRES or GCR applied to the saddle point problem.
>>
>> Your choice for Schur is fine in the sense that the diagonal of M is
>> spectrally equivalent to M, and M is spectrally equivalent to S.
>> Whether it is "fine" in terms of the iteration count for the Schur
>> systems, we cannot say a priori (since the spectral equivalence
>> doesn't give us direct info about the iterations we should expect).
>>
>> Your preconditioner for A_00 relies on AMG producing a spectrally
>> equivalent operator with bounds which are tight enough to ensure
>> convergence of the saddle point problem. I'll try to explain this.
>>
>> In my experience, for many problems (unstructured FE with variable
>> coefficients, structured FE meshes with variable coefficients) AMG
>> with preonly is not a robust choice. To control the approximation
>> (the spectral equivalence bounds), I typically run a stationary or
>> Krylov method on split 0 (e.g. -fieldsplit_0_ksp_type xxx
>> -fieldsplit_0_ksp_rtol yyy). Since the AMG preconditioner generated
>> is spectrally equivalent (usually!), these solves will converge to a
>> chosen rtol in a constant number of iterates under h-refinement. In
>> practice, if I don't enforce that I hit something like rtol=1.0e-1
>> (or 1.0e-2) on the 0th split, saddle point iterates will typically
>> increase for "hard" problems (1e4-1e7 coefficient variation) under
>> mesh refinement, and may not even converge at all when just using
>> -fieldsplit_0_ksp_type preonly. Failure ultimately depends on how
>> "strong" the preconditioner for the A_00 block is (consider
>> re-discretized geometric multigrid versus AMG). Running an iterative
>> solve on the 0th split lets you control and recover from weak/poor,
>> but spectrally equivalent, preconditioners for A_00. Note that people
>> hate this approach as it invariably nests Krylov methods, and
>> subsequently adds more global reductions. However, it is scalable,
>> optimal, tuneable, and converges faster than the case which didn't
>> converge at all :D
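>>
>> A minimal sketch of that configuration (assuming hypre for the AMG;
>> note that with an inner Krylov solve the preconditioner is no longer
>> a fixed linear operator, so the outer method should be FGMRES or
>> GCR):
>>
>>     -ksp_type fgmres
>>     -fieldsplit_0_ksp_type gmres
>>     -fieldsplit_0_ksp_rtol 1.0e-2
>>     -fieldsplit_0_pc_type hypre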
>>
>> 3/ I agree with Matt's comments, but I'd do a couple of other things
>> first.
>>
>> * I'd first check that the discretisation is implemented correctly.
>> Your P2/P1 element is inf-sup stable - thus the condition number of S
>> (unpreconditioned) should be independent of the mesh resolution (h).
>> An easy way to verify this is to run either LDU (schur_fact_type
>> full) or DU (schur_fact_type upper) and monitor the iterations
>> required for those S solves. Use
>>
>>     -fieldsplit_1_pc_type none
>>     -fieldsplit_1_ksp_rtol 1.0e-8
>>     -fieldsplit_1_ksp_monitor_true_residual
>>     -fieldsplit_1_ksp_pc_side right
>>     -fieldsplit_1_ksp_type gmres
>>     -fieldsplit_0_pc_type lu
>>
>> Then refine the mesh (ideally via sub-division) and repeat the
>> experiment. If the S iterates don't asymptote, but instead grow with
>> each refinement, you likely have a problem with the discretisation.
>>
>> * Do the same experiment, but this time use your mass matrix as the
>> preconditioner for S, with -fieldsplit_1_pc_type lu. If the iterates
>> have gone up compared with the previous experiment (without a Schur
>> PC), your mass matrix is not defined correctly. If in the previous
>> experiment (without a Schur PC) iterates on the S solves were
>> bounded, but now when preconditioned with the mass matrix the
>> iterates go up, then your mass matrix is definitely not correct.
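>>
>> That is, the same options as above, but assuming the mass matrix is
>> passed in as the user-provided Schur preconditioning matrix:
>>
>>     -pc_fieldsplit_schur_precondition user
>>     -fieldsplit_1_pc_type lu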
>>
>> 4/ Lastly, to finally get to your question: do 400+ iterates for
>> solving the Schur system seem "reasonable", and what is "normal
>> behaviour"?
>>
>> It seems "high" to me. However, the specifics of your discretisation,
>> mesh topology, element quality and boundary conditions render it
>> almost impossible to say what should be expected. When I use a Q2-P2*
>> discretisation on a structured mesh with a non-constant viscosity,
>> I'd expect something like 50-60 iterates for 1.0e-10 with a mass
>> matrix scaled by the inverse (local) viscosity. For constant
>> viscosity, maybe 30 iterates. I think this kind of statement is not
>> particularly useful or helpful though.
>>
>> Given you use an unstructured tet mesh, it is possible that some
>> elements have very bad quality (high aspect ratio (AR), highly
>> skewed). I am certain that P2/P1 has an inf-sup constant which is
>> sensitive to the element aspect ratio (I don't recall the exact
>> scaling wrt AR). From experience I know that using the mass matrix as
>> a preconditioner for Schur is not robust as AR increases (e.g.
>> iterations for the S solve grow). Hence, with a couple of "bad"
>> elements in your mesh, I could imagine that you could end up having
>> to perform 400+ iterations.
>>
>> 5/ Lastly, definitely don't impose one Dirichlet BC on the pressure
>> to make the pressure unique. This really screws up all the nice
>> properties of your matrices. Just enforce the constant null space for
>> p. And as you noticed, GMRES magically does it automatically if the
>> RHS of your original system was consistent.
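>>
>> A sketch of how one might attach that null space (assuming 'A' is
>> the assembled saddle point operator and 'isp' is the index set of
>> the pressure dofs - the names are illustrative, and error checking
>> is omitted):
>>
>>     Vec          v;
>>     MatNullSpace nsp;
>>     MatCreateVecs(A, &v, NULL);
>>     VecSet(v, 0.0);
>>     VecISSet(v, isp, 1.0);  /* constant on p dofs, zero on u dofs */
>>     VecNormalize(v, NULL);  /* null space vectors must be unit norm */
>>     MatNullSpaceCreate(PetscObjectComm((PetscObject)A), PETSC_FALSE,
>>                        1, &v, &nsp);
>>     MatSetNullSpace(A, nsp);
>>     MatNullSpaceDestroy(&nsp);
>>     VecDestroy(&v);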
>>
>> Thanks,
>>   Dave
>>
>> On 12 June 2017 at 20:20, David Nolte <dnolte@dim.uchile.cl> wrote:
            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
              <div bgcolor="#FFFFFF"> Ok. With <tt>"-pc_fieldsplit_schur_fact_type
                  full" </tt>the outer iteration converges in 1 step.
                The problem remain the Schur iterations.<br>
                <br>
                I was not sure if the problem was maybe the singular
                pressure or the pressure Dirichlet BC. I tested the
                solver with a standard Stokes flow in a pipe with a
                constriction (zero Neumann BC for the pressure at the
                outlet) and in a 3D cavity (enclosed flow, no pressure
                BC or fixed at one point). I am not sure if I need to
                attach the constant pressure nullspace to the matrix for
                GMRES. Not doing so does not alter the convergence of
                GMRES in the Schur solver (nor the pressure solution),
                using a pressure Dirichlet BC however slows down
                convergence (I suppose because of the scaling of the
                matrix).<br>
                <br>
                I also checked the pressure mass matrix that I give
                PETSc, it looks correct.<br>
                <br>
                In all these cases, the solver behaves just as before.
                With LU in fieldsplit_0 and GMRES/LU with rtol 1e-10 in
                fieldsplit_1, it converges after 1 outer iteration, but
                the inner Schur solver converges slowly. <br>
                <br>
                How should the convergence of GMRES/LU of the Schur
                complement *normally* behave?<br>
                <br>
                Thanks again!<span class="m_-7814857284138451345gmail-m_2691972541491180255gmail-m_1522616294910952114HOEnZb"><font color="#888888"><br>
                    David</font></span>
                <div>
                  <div class="m_-7814857284138451345gmail-m_2691972541491180255gmail-m_1522616294910952114h5"><br>
                    <br>
                    <br>
                    <br>
                    <div class="m_-7814857284138451345gmail-m_2691972541491180255gmail-m_1522616294910952114m_-1125133874872333755moz-cite-prefix">On
                      06/12/2017 12:41 PM, Matthew Knepley wrote:<br>
                    </div>
                    <blockquote type="cite">
                      <div>
                        <div class="gmail_extra">
                          <div class="gmail_quote">On Mon, Jun 12, 2017
                            at 10:36 AM, David Nolte <span><<a href="mailto:dnolte@dim.uchile.cl" target="_blank">dnolte@dim.uchile.cl</a>></span>
                            wrote:<br>
                            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                              <div bgcolor="#FFFFFF"> <br>
                                <div class="m_-7814857284138451345gmail-m_2691972541491180255gmail-m_1522616294910952114m_-1125133874872333755m_4366232618162032171moz-cite-prefix">On
                                  06/12/2017 07:50 AM, Matthew Knepley
                                  wrote:<br>
                                </div>
                                <blockquote type="cite">
                                  <div>
                                    <div class="gmail_extra">
                                      <div class="gmail_quote">On Sun,
                                        Jun 11, 2017 at 11:06 PM, David
                                        Nolte <span><<a href="mailto:dnolte@dim.uchile.cl" target="_blank">dnolte@dim.uchile.cl</a>></span>
                                        wrote:<br>
                                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
                                          <div bgcolor="#FFFFFF"> Thanks
                                            Matt, makes sense to me!<br>
                                            <br>
                                            I skipped direct solvers at
                                            first because for these
                                            'real' configurations LU
                                            (mumps/superlu_dist) usally
                                            goes out of memory (got 32GB
                                            RAM). It would be reasonable
                                            to take one more step back
                                            and play with synthetic
                                            examples.<br>
                                            I managed to run one case
                                            though with 936k dofs using:
                                            ("user" =pressure mass
                                            matrix)<br>
                                            <br>
                                            <tt><...><br>
-pc_fieldsplit_schur_fact_type upper</tt><tt><br>
                                            </tt><tt>-pc_fieldsplit_schur_precondition
                                              user</tt><tt><br>
                                            </tt><tt>-fieldsplit_0_ksp_type
                                              preonly   </tt><tt><br>
                                            </tt><tt>-fieldsplit_0_pc_type
                                              lu</tt><tt><br>
                                            </tt><tt>-fieldsplit_0_pc_factor_mat_solver_package
                                              mumps</tt><tt><br>
                                            </tt><tt><br>
                                            </tt><tt>
                                              -fieldsplit_1_ksp_type
                                              gmres<br>
-fieldsplit_1_ksp_monitor_true_residuals<br>
                                              -fieldsplit_1_ksp_rtol
                                              1e-10<br>
                                            </tt><tt>-fieldsplit_1_pc_type
                                              lu</tt><tt><br>
                                            </tt><tt>
                                              -fieldsplit_1_pc_factor_mat_solver_package
                                              mumps</tt><tt><br>
                                            </tt><br>
                                            It takes 2 outer iterations,
                                            as expected. However the
                                            fieldsplit_1 solve takes
                                            very long.<br>
                                          </div>
>>>>>>
>>>>>> 1) It should take 1 outer iterate, not two. The problem is that
>>>>>> your Schur tolerance is way too high. Use
>>>>>>
>>>>>>     -fieldsplit_1_ksp_rtol 1e-10
>>>>>>
>>>>>> or something like that. Then it will take 1 iterate.
>>>>>
>>>>> Shouldn't it take 2 with a triangular Schur factorization and
>>>>> exact preconditioners, and 1 with a full factorization? (cf.
>>>>> Benzi et al. 2005, p. 66,
>>>>> http://www.mathcs.emory.edu/~benzi/Web_papers/bgl05.pdf)
>>>>>
>>>>> That's exactly what I set: -fieldsplit_1_ksp_rtol 1e-10, and the
>>>>> Schur solver's residual does drop below rtol = 1e-10.
>>>>
>>>> Oh, yes. Take away the upper until things are worked out.
>>>>
>>>>   Thanks,
>>>>
>>>>     Matt
>>>>
>>>>>>
>>>>>> 2) There is a problem with the Schur solve. Now, from the
>>>>>> iterates
>>>>>>
>>>>>>     423 KSP preconditioned resid norm 2.638419658982e-02 true resid norm 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11
>>>>>>
>>>>>> it is clear that the preconditioner is really screwing stuff up.
>>>>>> For testing, you can use
>>>>>>
>>>>>>     -pc_fieldsplit_schur_precondition full
>>>>>>
>>>>>> and your same setup here. It should take one iterate. I think
>>>>>> there is something wrong with your mass matrix.
>>>>>
>>>>> I agree. I forgot to mention that I am considering an "enclosed
>>>>> flow" problem, with u=0 on the entire boundary and a Dirichlet
>>>>> condition for the pressure at one point to fix the constant
>>>>> pressure. Maybe the preconditioner is not consistent with this
>>>>> setup; I need to check this...
>>>>>
>>>>> Thanks a lot
>>>>>
                                <blockquote type="cite">
                                  <div>
                                    <div class="gmail_extra">
                                      <div class="gmail_quote">
                                        <div><br>
                                        </div>
                                        <div>  Thanks,</div>
                                        <div><br>
                                        </div>
                                        <div>    Matt</div>
                                        <div><br>
                                        </div>
                                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
                                          <div bgcolor="#FFFFFF"> <br>
>>>>>>>   0 KSP unpreconditioned resid norm 4.038466809302e-03 true resid norm 4.038466809302e-03 ||r(i)||/||b|| 1.000000000000e+00
>>>>>>>     Residual norms for fieldsplit_1_ solve.
>>>>>>>     0 KSP preconditioned resid norm 0.000000000000e+00 true resid norm 0.000000000000e+00 ||r(i)||/||b||            -nan
>>>>>>>   Linear fieldsplit_1_ solve converged due to CONVERGED_ATOL iterations 0
>>>>>>>   1 KSP unpreconditioned resid norm 4.860095964831e-06 true resid norm 4.860095964831e-06 ||r(i)||/||b|| 1.203450763452e-03
>>>>>>>     Residual norms for fieldsplit_1_ solve.
>>>>>>>     0 KSP preconditioned resid norm 2.965546249872e+08 true resid norm 1.000000000000e+00 ||r(i)||/||b|| 1.000000000000e+00
>>>>>>>     1 KSP preconditioned resid norm 1.347596594634e+08 true resid norm 3.599678801575e-01 ||r(i)||/||b|| 3.599678801575e-01
>>>>>>>     2 KSP preconditioned resid norm 5.913230136403e+07 true resid norm 2.364916760834e-01 ||r(i)||/||b|| 2.364916760834e-01
>>>>>>>     3 KSP preconditioned resid norm 4.629700028930e+07 true resid norm 1.984444715595e-01 ||r(i)||/||b|| 1.984444715595e-01
>>>>>>>     4 KSP preconditioned resid norm 3.804431276819e+07 true resid norm 1.747224559120e-01 ||r(i)||/||b|| 1.747224559120e-01
>>>>>>>     5 KSP preconditioned resid norm 3.178769422140e+07 true resid norm 1.402254864444e-01 ||r(i)||/||b|| 1.402254864444e-01
>>>>>>>     6 KSP preconditioned resid norm 2.648669043919e+07 true resid norm 1.191164310866e-01 ||r(i)||/||b|| 1.191164310866e-01
>>>>>>>     7 KSP preconditioned resid norm 2.203522108614e+07 true resid norm 9.690500018007e-02 ||r(i)||/||b|| 9.690500018007e-02
>>>>>>>     <...>
>>>>>>>     422 KSP preconditioned resid norm 2.984888715147e-02 true resid norm 8.598401046494e-11 ||r(i)||/||b|| 8.598401046494e-11
>>>>>>>     423 KSP preconditioned resid norm 2.638419658982e-02 true resid norm 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11
>>>>>>>   Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 423
>>>>>>>   2 KSP unpreconditioned resid norm 3.539889585599e-16 true resid norm 3.542279617063e-16 ||r(i)||/||b|| 8.771347603759e-14
>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 2
>>>>>>>
>>>>>>> Does the slow convergence of the Schur block mean that my
>>>>>>> preconditioning matrix Sp is a poor choice?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>> On 06/11/2017 08:53 AM, Matthew Knepley wrote:
                                            <blockquote type="cite">
                                              <div>
                                                <div class="gmail_extra">
                                                  <div class="gmail_quote">On
                                                    Sat, Jun 10, 2017 at
                                                    8:25 PM, David Nolte
                                                    <span><<a href="mailto:dnolte@dim.uchile.cl" target="_blank">dnolte@dim.uchile.cl</a>></span>
                                                    wrote:<br>
                                                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">Dear
                                                      all,<br>
                                                      <br>
                                                      I am solving a
                                                      Stokes problem in
                                                      3D aorta
                                                      geometries, using
                                                      a P2/P1<br>
                                                      finite elements
                                                      discretization on
                                                      tetrahedral meshes
                                                      resulting in<br>
                                                      ~1-1.5M DOFs.
                                                      Viscosity is
                                                      uniform (can be
                                                      adjusted
                                                      arbitrarily), and<br>
                                                      the right hand
                                                      side is a function
                                                      of noisy
                                                      measurement data.<br>
                                                      <br>
                                                      In other settings
                                                      of "standard"
                                                      Stokes flow
                                                      problems I have
                                                      obtained<br>
                                                      good convergence
                                                      with an "upper"
                                                      Schur complement
                                                      preconditioner,
                                                      using<br>
                                                      AMG (ML or Hypre)
                                                      on the velocity
                                                      block and
                                                      approximating the
                                                      Schur<br>
                                                      complement matrix
                                                      by the diagonal of
                                                      the pressure mass
                                                      matrix:<br>
                                                      <br>
                                                         
                                                      -ksp_converged_reason<br>
                                                         
                                                      -ksp_monitor_true_residual<br>
                                                         
                                                      -ksp_initial_guess_nonzero<br>
                                                         
                                                      -ksp_diagonal_scale<br>
                                                         
                                                      -ksp_diagonal_scale_fix<br>
                                                          -ksp_type
                                                      fgmres<br>
                                                          -ksp_rtol
                                                      1.0e-8<br>
                                                      <br>
                                                          -pc_type
                                                      fieldsplit<br>
                                                         
                                                      -pc_fieldsplit_type
                                                      schur<br>
                                                         
                                                      -pc_fieldsplit_detect_saddle_point<br>
                                                         
                                                      -pc_fieldsplit_schur_fact_type
                                                      upper<br>
                                                         
                                                      -pc_fieldsplit_schur_precondition
                                                      user    # <--
                                                      pressure mass
                                                      matrix<br>
                                                      <br>
                                                         
                                                      -fieldsplit_0_ksp_type
                                                      preonly<br>
                                                         
                                                      -fieldsplit_0_pc_type
                                                      ml<br>
                                                      <br>
                                                         
                                                      -fieldsplit_1_ksp_type
                                                      preonly<br>
                                                         
                                                      -fieldsplit_1_pc_type
                                                      jacobi<br>
>>>>>>>>
>>>>>>>> 1) I always recommend starting from an exact solver and
>>>>>>>> backing off in small steps for optimization. Thus I would
>>>>>>>> start with LU on the upper block and GMRES/LU with tolerance
>>>>>>>> 1e-10 on the Schur block. This should converge in 1 iterate.
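>>>>>>>>
>>>>>>>> In options form, that starting point would look something like
>>>>>>>> this (split names assuming the default labels):
>>>>>>>>
>>>>>>>>     -fieldsplit_0_ksp_type preonly
>>>>>>>>     -fieldsplit_0_pc_type lu
>>>>>>>>     -fieldsplit_1_ksp_type gmres
>>>>>>>>     -fieldsplit_1_ksp_rtol 1e-10
>>>>>>>>     -fieldsplit_1_pc_type lu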
>>>>>>>>
>>>>>>>> 2) I don't think you want preonly on the Schur system. You
>>>>>>>> might want GMRES/Jacobi to invert the mass matrix.
>>>>>>>>
>>>>>>>> 3) You probably want to tighten the tolerance on the Schur
>>>>>>>> solve, at least to start, and then slowly let it out. The
>>>>>>>> tight tolerance will show you how effective the preconditioner
>>>>>>>> is using that Schur operator. Then you can start to evaluate
>>>>>>>> how effective the Schur linear solver is.
>>>>>>>>
>>>>>>>> Does this make sense?
>>>>>>>>
>>>>>>>>   Thanks,
>>>>>>>>
>>>>>>>>      Matt
>>>>>>>>
                                                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
                                                      In my present case
                                                      this setup gives
                                                      rather slow
                                                      convergence
                                                      (varies for<br>
                                                      different
                                                      geometries between
                                                      200-500 or several
                                                      thousands!). I
                                                      obtain<br>
                                                      better convergence
                                                      with
                                                      "-pc_fieldsplit_schur_precondition
                                                      selfp"and<br>
                                                      using multigrid on
                                                      S, with
                                                      "-fieldsplit_1_pc_type
                                                      ml" (I don't think<br>
                                                      this is optimal,
                                                      though).<br>
                                                      <br>
                                                      I don't understand
                                                      why the pressure
                                                      mass matrix
                                                      approach performs
                                                      so<br>
                                                      poorly and wonder
                                                      what I could try
                                                      to improve the
                                                      convergence. Until
                                                      now<br>
                                                      I have been using
                                                      ML and Hypre
                                                      BoomerAMG mostly
                                                      with default
                                                      parameters.<br>
                                                      Surely they can be
                                                      improved by tuning
                                                      some parameters.
                                                      Which could be a<br>
                                                      good starting
                                                      point? Are there
                                                      other options I
                                                      should consider?<br>
                                                      <br>
                                                      With the above
                                                      setup (jacobi) for
                                                      a case that works
                                                      better than
                                                      others,<br>
                                                      the KSP terminates
                                                      with<br>
                                                      467 KSP
                                                      unpreconditioned
                                                      resid norm
                                                      2.072014323515e-09
                                                      true resid norm<br>
                                                      2.072014322600e-09
                                                      ||r(i)||/||b||
                                                      9.939098100674e-09<br>
                                                      <br>
                                                      You can find the
                                                      output of
                                                      -ksp_view below.
                                                      Let me know if you
                                                      need more<br>
                                                      details.<br>
                                                      <br>
                                                      Thanks in advance
                                                      for your advice!<br>
                                                      Best wishes<br>
                                                      David<br>
>>>>>>>>> KSP Object: 1 MPI processes
>>>>>>>>>   type: fgmres
>>>>>>>>>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>>>>>>>     GMRES: happy breakdown tolerance 1e-30
>>>>>>>>>   maximum iterations=10000
>>>>>>>>>   tolerances:  relative=1e-08, absolute=1e-50, divergence=10000.
>>>>>>>>>   right preconditioning
>>>>>>>>>   diagonally scaled system
>>>>>>>>>   using nonzero initial guess
>>>>>>>>>   using UNPRECONDITIONED norm type for convergence test
>>>>>>>>> PC Object: 1 MPI processes
>>>>>>>>>   type: fieldsplit
>>>>>>>>>     FieldSplit with Schur preconditioner, factorization UPPER
>>>>>>>>>     Preconditioner for the Schur complement formed from user provided matrix
>>>>>>>>>     Split info:
>>>>>>>>>     Split number 0 Defined by IS
>>>>>>>>>     Split number 1 Defined by IS
>>>>>>>>>     KSP solver for A00 block
>>>>>>>>>       KSP Object: (fieldsplit_0_) 1 MPI processes
>>>>>>>>>         type: preonly
>>>>>>>>>         maximum iterations=10000, initial guess is zero
>>>>>>>>>         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>>>>>>>>         left preconditioning
>>>>>>>>>         using NONE norm type for convergence test
>>>>>>>>>       PC Object: (fieldsplit_0_) 1 MPI processes
>>>>>>>>>         type: ml
>>>>>>>>>           MG: type is MULTIPLICATIVE, levels=5 cycles=v
>>>>>>>>>             Cycles per PCApply=1
>>>>>>>>>             Using Galerkin computed coarse grid matrices
>>>>>>>>>         Coarse grid solver -- level -------------------------------
>>>>>>>>>           KSP Object: (fieldsplit_0_mg_coarse_) 1 MPI processes
>>>>>>>>>             type: preonly
>>>>>>>>>             maximum iterations=10000, initial guess is zero
>>>>>>>>>             tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>>>>>>>>             left preconditioning
>>>>>>>>>             using NONE norm type for convergence test
>>>>>>>>>           PC Object: (fieldsplit_0_mg_coarse_) 1 MPI processes
>>>>>>>>>             type: lu
>>>>>>>>>               LU: out-of-place factorization
>>>>>>>>>               tolerance for zero pivot 2.22045e-14
>>>>>>>>>               using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
>>>>>>>>>               matrix ordering: nd
>>>>>>>>>               factor fill ratio given 5., needed 1.
>>>>>>>>>                 Factored matrix follows:
>>>>>>>>>                   Mat Object: 1 MPI processes
>>>>>>>>>                     type: seqaij
>>>>>>>>>                     rows=3, cols=3
                                                                       
                                                        package used to
                                                      perform
                                                      factorization:
                                                      petsc<br>
                                                                       
                                                        total:
                                                      nonzeros=3,
                                                      allocated
                                                      nonzeros=3<br>
                                                                       
                                                        total number of
                                                      mallocs used
                                                      during
                                                      MatSetValues<br>
                                                      calls =0<br>
                                                                       
                                                          not using
                                                      I-node routines<br>
                                                                  linear
                                                      system matrix =
                                                      precond matrix:<br>
                                                                  Mat
                                                      Object:           
                                                       1 MPI processes<br>
                                                                   
                                                      type: seqaij<br>
                                                                   
                                                      rows=3, cols=3<br>
                                                                   
                                                      total: nonzeros=3,
                                                      allocated
                                                      nonzeros=3<br>
                                                                   
                                                      total number of
                                                      mallocs used
                                                      during
                                                      MatSetValues calls
                                                      =0<br>
                                                                     
                                                      not using I-node
                                                      routines<br>
                                                              Down
                                                      solver
                                                      (pre-smoother) on
                                                      level 1<br>
-------------------------------<br>
                                                                KSP
                                                      Object:         
                                                      (fieldsplit_0_mg_levels_1_) 
                                                               1<br>
                                                      MPI processes<br>
                                                                  type:
                                                      richardson<br>
                                                                   
                                                      Richardson:
                                                      damping factor=1.<br>
                                                                 
                                                      maximum
                                                      iterations=2<br>
                                                                 
                                                      tolerances: 
                                                      relative=1e-05,
                                                      absolute=1e-50,
                                                      divergence=10000.<br>
                                                                  left
                                                      preconditioning<br>
                                                                  using
                                                      nonzero initial
                                                      guess<br>
                                                                  using
                                                      NONE norm type for
                                                      convergence test<br>
                                                                PC
                                                      Object:         
                                                      (fieldsplit_0_mg_levels_1_) 
                                                               1<br>
                                                      MPI processes<br>
                                                                  type:
                                                      sor<br>
                                                                    SOR:
                                                      type =
                                                      local_symmetric,
                                                      iterations = 1,
                                                      local<br>
                                                      iterations = 1,
                                                      omega = 1.<br>
                                                                  linear
                                                      system matrix =
                                                      precond matrix:<br>
                                                                  Mat
                                                      Object:           
                                                       1 MPI processes<br>
                                                                   
                                                      type: seqaij<br>
                                                                   
                                                      rows=15, cols=15<br>
                                                                   
                                                      total:
                                                      nonzeros=69,
                                                      allocated
                                                      nonzeros=69<br>
                                                                   
                                                      total number of
                                                      mallocs used
                                                      during
                                                      MatSetValues calls
                                                      =0<br>
                                                                     
                                                      not using I-node
                                                      routines<br>
                                                              Up solver
                                                      (post-smoother)
                                                      same as down
                                                      solver
                                                      (pre-smoother)<br>
                                                              Down
                                                      solver
                                                      (pre-smoother) on
                                                      level 2<br>
-------------------------------<br>
                                                                KSP
                                                      Object:         
                                                      (fieldsplit_0_mg_levels_2_) 
                                                               1<br>
                                                      MPI processes<br>
                                                                  type:
                                                      richardson<br>
                                                                   
                                                      Richardson:
                                                      damping factor=1.<br>
                                                                 
                                                      maximum
                                                      iterations=2<br>
                                                                 
                                                      tolerances: 
                                                      relative=1e-05,
                                                      absolute=1e-50,
                                                      divergence=10000.<br>
                                                                  left
                                                      preconditioning<br>
                                                                  using
                                                      nonzero initial
                                                      guess<br>
                                                                  using
                                                      NONE norm type for
                                                      convergence test<br>
                                                                PC
                                                      Object:         
                                                      (fieldsplit_0_mg_levels_2_) 
                                                               1<br>
                                                      MPI processes<br>
                                                                  type:
                                                      sor<br>
                                                                    SOR:
                                                      type =
                                                      local_symmetric,
                                                      iterations = 1,
                                                      local<br>
                                                      iterations = 1,
                                                      omega = 1.<br>
                                                                  linear
                                                      system matrix =
                                                      precond matrix:<br>
                                                                  Mat
                                                      Object:           
                                                       1 MPI processes<br>
                                                                   
                                                      type: seqaij<br>
                                                                   
                                                      rows=304, cols=304<br>
                                                                   
                                                      total:
                                                      nonzeros=7354,
                                                      allocated
                                                      nonzeros=7354<br>
                                                                   
                                                      total number of
                                                      mallocs used
                                                      during
                                                      MatSetValues calls
                                                      =0<br>
                                                                     
                                                      not using I-node
                                                      routines<br>
                                                              Up solver
                                                      (post-smoother)
                                                      same as down
                                                      solver
                                                      (pre-smoother)<br>
                                                              Down
                                                      solver
                                                      (pre-smoother) on
                                                      level 3<br>
-------------------------------<br>
                                                                KSP
                                                      Object:         
                                                      (fieldsplit_0_mg_levels_3_) 
                                                               1<br>
                                                      MPI processes<br>
                                                                  type:
                                                      richardson<br>
                                                                   
                                                      Richardson:
                                                      damping factor=1.<br>
                                                                 
                                                      maximum
                                                      iterations=2<br>
                                                                 
                                                      tolerances: 
                                                      relative=1e-05,
                                                      absolute=1e-50,
                                                      divergence=10000.<br>
                                                                  left
                                                      preconditioning<br>
                                                                  using
                                                      nonzero initial
                                                      guess<br>
                                                                  using
                                                      NONE norm type for
                                                      convergence test<br>
                                                                PC
                                                      Object:         
                                                      (fieldsplit_0_mg_levels_3_) 
                                                               1<br>
                                                      MPI processes<br>
                                                                  type:
                                                      sor<br>
                                                                    SOR:
                                                      type =
                                                      local_symmetric,
                                                      iterations = 1,
                                                      local<br>
                                                      iterations = 1,
                                                      omega = 1.<br>
                                                                  linear
                                                      system matrix =
                                                      precond matrix:<br>
                                                                  Mat
                                                      Object:           
                                                       1 MPI processes<br>
                                                                   
                                                      type: seqaij<br>
                                                                   
                                                      rows=30236,
                                                      cols=30236<br>
                                                                   
                                                      total:
                                                      nonzeros=2730644,
                                                      allocated
                                                      nonzeros=2730644<br>
                                                                   
                                                      total number of
                                                      mallocs used
                                                      during
                                                      MatSetValues calls
                                                      =0<br>
                                                                     
                                                      not using I-node
                                                      routines<br>
                                                              Up solver
                                                      (post-smoother)
                                                      same as down
                                                      solver
                                                      (pre-smoother)<br>
                                                              Down
                                                      solver
                                                      (pre-smoother) on
                                                      level 4<br>
-------------------------------<br>
                                                                KSP
                                                      Object:         
                                                      (fieldsplit_0_mg_levels_4_) 
                                                               1<br>
                                                      MPI processes<br>
                                                                  type:
                                                      richardson<br>
                                                                   
                                                      Richardson:
                                                      damping factor=1.<br>
                                                                 
                                                      maximum
                                                      iterations=2<br>
                                                                 
                                                      tolerances: 
                                                      relative=1e-05,
                                                      absolute=1e-50,
                                                      divergence=10000.<br>
                                                                  left
                                                      preconditioning<br>
                                                                  using
                                                      nonzero initial
                                                      guess<br>
                                                                  using
                                                      NONE norm type for
                                                      convergence test<br>
                                                                PC
                                                      Object:         
                                                      (fieldsplit_0_mg_levels_4_) 
                                                               1<br>
                                                      MPI processes<br>
                                                                  type:
                                                      sor<br>
                                                                    SOR:
                                                      type =
                                                      local_symmetric,
                                                      iterations = 1,
                                                      local<br>
                                                      iterations = 1,
                                                      omega = 1.<br>
                                                                  linear
                                                      system matrix =
                                                      precond matrix:<br>
                                                                  Mat
                                                      Object:           
                                                      (fieldsplit_0_)   
                                                               1 MPI<br>
                                                      processes<br>
                                                                   
                                                      type: seqaij<br>
                                                                   
                                                      rows=894132,
                                                      cols=894132<br>
                                                                   
                                                      total:
                                                      nonzeros=70684164,
                                                      allocated
                                                      nonzeros=70684164<br>
                                                                   
                                                      total number of
                                                      mallocs used
                                                      during
                                                      MatSetValues calls
                                                      =0<br>
                                                                     
                                                      not using I-node
                                                      routines<br>
                                                              Up solver
                                                      (post-smoother)
                                                      same as down
                                                      solver
                                                      (pre-smoother)<br>
                                                              linear
                                                      system matrix =
                                                      precond matrix:<br>
                                                              Mat
                                                      Object:       
                                                      (fieldsplit_0_)   
                                                           1 MPI
                                                      processes<br>
                                                                type:
                                                      seqaij<br>
                                                               
                                                      rows=894132,
                                                      cols=894132<br>
                                                                total:
                                                      nonzeros=70684164,
                                                      allocated
                                                      nonzeros=70684164<br>
                                                                total
                                                      number of mallocs
                                                      used during
                                                      MatSetValues calls
                                                      =0<br>
                                                                  not
                                                      using I-node
                                                      routines<br>
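[The split-0 solver above boils down to preonly + ML, with two Richardson/SOR iterations as the smoother on every level and an LU solve on the (here 3x3) coarsest grid. As a rough sketch only, the same configuration could be requested with the standard option prefixes, set programmatically below; the function name is made up for illustration, and ML's level count and matrix sizes always depend on the mesh:

  #include <petscksp.h>

  /* Illustrative sketch: options matching the fieldsplit_0 solver printed
     above (preonly + ML V-cycle, Richardson(2)/SOR smoothers, LU coarse
     solve). Equivalent to passing the same strings on the command line.  */
  PetscErrorCode SetSplit0Options(void)
  {
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = PetscOptionsSetValue(NULL,"-fieldsplit_0_ksp_type","preonly");CHKERRQ(ierr);
    ierr = PetscOptionsSetValue(NULL,"-fieldsplit_0_pc_type","ml");CHKERRQ(ierr);
    /* two Richardson iterations preconditioned with SOR on every level */
    ierr = PetscOptionsSetValue(NULL,"-fieldsplit_0_mg_levels_ksp_type","richardson");CHKERRQ(ierr);
    ierr = PetscOptionsSetValue(NULL,"-fieldsplit_0_mg_levels_ksp_max_it","2");CHKERRQ(ierr);
    ierr = PetscOptionsSetValue(NULL,"-fieldsplit_0_mg_levels_pc_type","sor");CHKERRQ(ierr);
    /* direct solve on the coarsest grid */
    ierr = PetscOptionsSetValue(NULL,"-fieldsplit_0_mg_coarse_ksp_type","preonly");CHKERRQ(ierr);
    ierr = PetscOptionsSetValue(NULL,"-fieldsplit_0_mg_coarse_pc_type","lu");CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }
]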
  KSP solver for S = A11 - A10 inv(A00) A01
    KSP Object: (fieldsplit_1_) 1 MPI processes
      type: preonly
      maximum iterations=10000, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (fieldsplit_1_) 1 MPI processes
      type: jacobi
      linear system matrix followed by preconditioner matrix:
      Mat Object: (fieldsplit_1_) 1 MPI processes
        type: schurcomplement
        rows=42025, cols=42025
          Schur complement A11 - A10 inv(A00) A01
          A11
            Mat Object: (fieldsplit_1_) 1 MPI processes
              type: seqaij
              rows=42025, cols=42025
              total: nonzeros=554063, allocated nonzeros=554063
              total number of mallocs used during MatSetValues calls =0
                not using I-node routines
          A10
            Mat Object: 1 MPI processes
              type: seqaij
              rows=42025, cols=894132
              total: nonzeros=6850107, allocated nonzeros=6850107
              total number of mallocs used during MatSetValues calls =0
                not using I-node routines
          KSP of A00
            KSP Object: (fieldsplit_0_) 1 MPI processes
              type: preonly
              maximum iterations=10000, initial guess is zero
              tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
              left preconditioning
              using NONE norm type for convergence test
            PC Object: (fieldsplit_0_) 1 MPI processes
              type: ml
                MG: type is MULTIPLICATIVE, levels=5 cycles=v
                  Cycles per PCApply=1
                  Using Galerkin computed coarse grid matrices
              Coarse grid solver -- level -------------------------------
                KSP Object: (fieldsplit_0_mg_coarse_) 1 MPI processes
                  type: preonly
                  maximum iterations=10000, initial guess is zero
                  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
                  left preconditioning
                  using NONE norm type for convergence test
                PC Object: (fieldsplit_0_mg_coarse_) 1 MPI processes
                  type: lu
                    LU: out-of-place factorization
                    tolerance for zero pivot 2.22045e-14
                    using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
                    matrix ordering: nd
                    factor fill ratio given 5., needed 1.
                      Factored matrix follows:
                        Mat Object: 1 MPI processes
                          type: seqaij
                          rows=3, cols=3
                          package used to perform factorization: petsc
                          total: nonzeros=3, allocated nonzeros=3
                          total number of mallocs used during MatSetValues calls =0
                            not using I-node routines
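[The fieldsplit_1 PC above reports "linear system matrix followed by preconditioner matrix", i.e. the Schur block is preconditioned by a separate user-provided matrix rather than by S itself; the quoted view is cut off before that matrix is printed. A minimal sketch of how such a matrix gets attached, assuming a scaled pressure mass matrix Mp assembled elsewhere (the helper name is hypothetical):

  #include <petscksp.h>

  /* Sketch, not the actual code behind the output above: attach a
     user-assembled matrix Mp as the preconditioning matrix for the
     Schur complement S = A11 - A10 inv(A00) A01.                   */
  PetscErrorCode AttachSchurMassPre(KSP ksp, Mat Mp)
  {
    PC             pc;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
    ierr = PCSetType(pc,PCFIELDSPLIT);CHKERRQ(ierr);
    ierr = PCFieldSplitSetType(pc,PC_COMPOSITE_SCHUR);CHKERRQ(ierr);
    ierr = PCFieldSplitSetSchurFactType(pc,PC_FIELDSPLIT_SCHUR_FACT_FULL);CHKERRQ(ierr);
    /* precondition S with Mp instead of selfp's A10 inv(diag(A00)) A01 */
    ierr = PCFieldSplitSetSchurPre(pc,PC_FIELDSPLIT_SCHUR_PRE_USER,Mp);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

With this in place, whatever solver is chosen for split 1 (the preonly/jacobi shown in this dump, or gmres/jacobi as tried earlier in the thread) is built from Mp.]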