<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>

Now I'd like to try GAMG instead of ML. However, I don't know how to set<br>

it up to get similar performance.<br>

The obvious/naive<br>

<br>

    -pc_type gamg<br>

    -pc_gamg_type agg<br>

<br>

# with and without<br>

    -pc_gamg_threshold 0.03<br>

    -pc_mg_levels 3<br>

<br></blockquote><div><br></div><div>This looks fine. I would not set the number of levels but if it helps then go for it.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

converges very slowly on 1 proc and much worse on 8 (~200k dofs per<br>

proc), for instance:<br>

np = 1:<br>

980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm<br>

1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04<br>

981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm<br>

1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04<br>

982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm<br>

1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04<br>

<br>

np = 8:<br>

980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm<br>

3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03<br>

981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm<br>

3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03<br>

982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm<br>

3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03<br>

<br>

A very high threshold seems to improve the GAMG PC, for instance with<br>

0.75 I get convergence to rtol=1e-6 after 744 iterations.<br>

What else should I try?<br></blockquote><div><br></div><div>Not sure. ML use the same algorithm as GAMG (so the threshold means the same thing pretty much). ML is a good solver and the leader, Ray Tuminaro, has had a lot of NS experience. But I'm not sure what the differences are that are resulting in this performance. </div><div><br></div><div>* It looks like you are using sor for the coarse grid solver in gamg:</div><div><br></div><div><span style="font-size:12.8px">  Coarse grid solver -- level ------------------------------</span><wbr style="font-size:12.8px"><span style="font-size:12.8px">-</span><br style="font-size:12.8px"><span style="font-size:12.8px">    KSP Object:    (mg_levels_0_)     1 MPI processes</span><br style="font-size:12.8px"><span style="font-size:12.8px">      type: preonly</span><br style="font-size:12.8px"><span style="font-size:12.8px">      maximum iterations=2, initial guess is zero</span><br style="font-size:12.8px"><span style="font-size:12.8px">      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.</span><br style="font-size:12.8px"><span style="font-size:12.8px">      left preconditioning</span><br style="font-size:12.8px"><span style="font-size:12.8px">      using NONE norm type for convergence test</span><br style="font-size:12.8px"><span style="font-size:12.8px">    PC Object:    (mg_levels_0_)     1 MPI processes</span><br style="font-size:12.8px"><span style="font-size:12.8px">      type: sor</span><br style="font-size:12.8px"><span style="font-size:12.8px">        SOR: type = local_symmetric, iterations = 1, local iterations =</span><br></div><div><br></div><div>You should/must use lu, like in ML. This will kill you.</div><div><br></div><div>* smoothed aggregation vs unsmoothed: GAMG's view data does not say if it is smoothing. Damn, I need to fix that. For NS, you probably want unsmoothed (-pc_gamg_agg_nsmooths 0). I'm not sure what the ML parameter is for this nor do I know the default. It should make a noticable difference (good or bad).</div><div><br></div><div>* Threshold for dropping small values from graph 0.75 -- this is crazy :)</div><div><br></div><div>This is all that I can think of now.</div><div><br></div><div>Mark</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

I would very much appreciate any advice on configuring GAMG and<br>

differences w.r.t ML to be taken into account (not a multigrid expert<br>

though).<br>

<br>

Thanks, best wishes<br>

David<br>

<br>

<br>

------<br>

ksp_view for -pc_type gamg      -pc_gamg_threshold 0.75 -pc_mg_levels 3<br>

<br>

KSP Object: 1 MPI processes<br>

  type: fgmres<br>

    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt<br>

Orthogonalization with no iterative refinement<br>

    GMRES: happy breakdown tolerance 1e-30<br>

  maximum iterations=10000<br>

  tolerances:  relative=1e-06, absolute=1e-50, divergence=10000.<br>

  right preconditioning<br>

  using nonzero initial guess<br>

  using UNPRECONDITIONED norm type for convergence test<br>

PC Object: 1 MPI processes<br>

  type: gamg<br>

    MG: type is MULTIPLICATIVE, levels=1 cycles=v<br>

      Cycles per PCApply=1<br>

      Using Galerkin computed coarse grid matrices<br>

      GAMG specific options<br>

        Threshold for dropping small values from graph 0.75<br>

        AGG specific options<br>

          Symmetric graph false<br>

  Coarse grid solver -- level ------------------------------<wbr>-<br>

    KSP Object:    (mg_levels_0_)     1 MPI processes<br>

      type: preonly<br>

      maximum iterations=2, initial guess is zero<br>

      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.<br>

      left preconditioning<br>

      using NONE norm type for convergence test<br>

    PC Object:    (mg_levels_0_)     1 MPI processes<br>

      type: sor<br>

        SOR: type = local_symmetric, iterations = 1, local iterations =<br>

1, omega = 1.<br>

      linear system matrix = precond matrix:<br>

      Mat Object:       1 MPI processes<br>

        type: seqaij<br>

        rows=1745224, cols=1745224<br>

        total: nonzeros=99452608, allocated nonzeros=99452608<br>

        total number of mallocs used during MatSetValues calls =0<br>

          using I-node routines: found 1037847 nodes, limit used is 5<br>

  linear system matrix = precond matrix:<br>

  Mat Object:   1 MPI processes<br>

    type: seqaij<br>

    rows=1745224, cols=1745224<br>

    total: nonzeros=99452608, allocated nonzeros=99452608<br>

    total number of mallocs used during MatSetValues calls =0<br>

      using I-node routines: found 1037847 nodes, limit used is 5<br>

<br>

<br>

------<br>

ksp_view for -pc_type ml:<br>

<br>

KSP Object: 8 MPI processes<br>

  type: fgmres<br>

    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt<br>

Orthogonalization with no iterative refinement<br>

    GMRES: happy breakdown tolerance 1e-30<br>

  maximum iterations=10000<br>

  tolerances:  relative=1e-06, absolute=1e-50, divergence=10000.<br>

  right preconditioning<br>

  using nonzero initial guess<br>

  using UNPRECONDITIONED norm type for convergence test<br>

PC Object: 8 MPI processes<br>

  type: ml<br>

    MG: type is MULTIPLICATIVE, levels=3 cycles=v<br>

      Cycles per PCApply=1<br>

      Using Galerkin computed coarse grid matrices<br>

  Coarse grid solver -- level ------------------------------<wbr>-<br>

    KSP Object:    (mg_coarse_)     8 MPI processes<br>

      type: preonly<br>

      maximum iterations=10000, initial guess is zero<br>

      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.<br>

      left preconditioning<br>

      using NONE norm type for convergence test<br>

    PC Object:    (mg_coarse_)     8 MPI processes<br>

      type: redundant<br>

        Redundant preconditioner: First (color=0) of 8 PCs follows<br>

        KSP Object:        (mg_coarse_redundant_)         1 MPI processes<br>

          type: preonly<br>

          maximum iterations=10000, initial guess is zero<br>

          tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.<br>

          left preconditioning<br>

          using NONE norm type for convergence test<br>

        PC Object:        (mg_coarse_redundant_)         1 MPI processes<br>

          type: lu<br>

            LU: out-of-place factorization<br>

            tolerance for zero pivot 2.22045e-14<br>

            using diagonal shift on blocks to prevent zero pivot [INBLOCKS]<br>

            matrix ordering: nd<br>

            factor fill ratio given 5., needed 10.4795<br>

              Factored matrix follows:<br>

                Mat Object:                 1 MPI processes<br>

                  type: seqaij<br>

                  rows=6822, cols=6822<br>

                  package used to perform factorization: petsc<br>

                  total: nonzeros=9575688, allocated nonzeros=9575688<br>

                  total number of mallocs used during MatSetValues calls =0<br>

                    not using I-node routines<br>

          linear system matrix = precond matrix:<br>

          Mat Object:           1 MPI processes<br>

            type: seqaij<br>

            rows=6822, cols=6822<br>

            total: nonzeros=913758, allocated nonzeros=913758<br>

            total number of mallocs used during MatSetValues calls =0<br>

              not using I-node routines<br>

      linear system matrix = precond matrix:<br>

      Mat Object:       8 MPI processes<br>

        type: mpiaij<br>

        rows=6822, cols=6822<br>

        total: nonzeros=913758, allocated nonzeros=913758<br>

        total number of mallocs used during MatSetValues calls =0<br>

          not using I-node (on process 0) routines<br>

  Down solver (pre-smoother) on level 1 ------------------------------<wbr>-<br>

    KSP Object:    (mg_levels_1_)     8 MPI processes<br>

      type: richardson<br>

        Richardson: damping factor=1.<br>

      maximum iterations=2<br>

      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.<br>

      left preconditioning<br>

      using nonzero initial guess<br>

      using NONE norm type for convergence test<br>

    PC Object:    (mg_levels_1_)     8 MPI processes<br>

      type: sor<br>

        SOR: type = local_symmetric, iterations = 1, local iterations =<br>

1, omega = 1.<br>

      linear system matrix = precond matrix:<br>

      Mat Object:       8 MPI processes<br>

        type: mpiaij<br>

        rows=67087, cols=67087<br>

        total: nonzeros=9722749, allocated nonzeros=9722749<br>

        total number of mallocs used during MatSetValues calls =0<br>

          not using I-node (on process 0) routines<br>

  Up solver (post-smoother) same as down solver (pre-smoother)<br>

  Down solver (pre-smoother) on level 2 ------------------------------<wbr>-<br>

    KSP Object:    (mg_levels_2_)     8 MPI processes<br>

      type: richardson<br>

        Richardson: damping factor=1.<br>

      maximum iterations=2<br>

      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.<br>

      left preconditioning<br>

      using nonzero initial guess<br>

      using NONE norm type for convergence test<br>

    PC Object:    (mg_levels_2_)     8 MPI processes<br>

      type: sor<br>

        SOR: type = local_symmetric, iterations = 1, local iterations =<br>

1, omega = 1.<br>

      linear system matrix = precond matrix:<br>

      Mat Object:       8 MPI processes<br>

        type: mpiaij<br>

        rows=1745224, cols=1745224<br>

        total: nonzeros=99452608, allocated nonzeros=99452608<br>

        total number of mallocs used during MatSetValues calls =0<br>

          using I-node (on process 0) routines: found 126690 nodes,<br>

limit used is 5<br>

  Up solver (post-smoother) same as down solver (pre-smoother)<br>

  linear system matrix = precond matrix:<br>

  Mat Object:   8 MPI processes<br>

    type: mpiaij<br>

    rows=1745224, cols=1745224<br>

    total: nonzeros=99452608, allocated nonzeros=99452608<br>

    total number of mallocs used during MatSetValues calls =0<br>

      using I-node (on process 0) routines: found 126690 nodes, limit<br>

used is 5<br>

<br>

</blockquote></div><br></div></div>