[petsc-users] GAMG advice
Mark Adams
mfadams at lbl.gov
Wed Nov 8 19:05:41 CST 2017
>
>
> Now I'd like to try GAMG instead of ML. However, I don't know how to set
> it up to get similar performance.
> The obvious/naive
>
> -pc_type gamg
> -pc_gamg_type agg
>
> # with and without
> -pc_gamg_threshold 0.03
> -pc_mg_levels 3
>
>
This looks fine. I would not set the number of levels, but if it helps then
go for it.
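Just to be sure we are talking about the same run, I assume the full command
line looks something like the following (./app is just a placeholder for your
executable, and np = 8 as in your test):

mpiexec -n 8 ./app -ksp_type fgmres \
    -pc_type gamg -pc_gamg_type agg -pc_gamg_threshold 0.03 \
    -ksp_monitor_true_residual -ksp_view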
> converges very slowly on 1 proc and much worse on 8 (~200k dofs per
> proc), for instance:
> np = 1:
> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm
> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm
> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04
> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm
> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04
>
> np = 8:
> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm
> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03
> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
>
> A very high threshold seems to improve the GAMG PC, for instance with
> 0.75 I get convergence to rtol=1e-6 after 744 iterations.
> What else should I try?
>
Not sure. ML uses the same algorithm as GAMG (so the threshold means pretty
much the same thing). ML is a good solver, and its lead developer, Ray
Tuminaro, has a lot of Navier-Stokes (NS) experience. But I'm not sure which
differences are responsible for the performance gap you are seeing.
* It looks like you are using SOR for the coarse grid solver in GAMG:
Coarse grid solver -- level -------------------------------
KSP Object: (mg_levels_0_) 1 MPI processes
type: preonly
maximum iterations=2, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_0_) 1 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
You should (really, must) use LU for the coarse grid, as ML does; SOR there
will kill you. See the option sketch after this list.
* Smoothed vs. unsmoothed aggregation: GAMG's view data does not say whether
it is smoothing. Damn, I need to fix that. For NS you probably want
unsmoothed (-pc_gamg_agg_nsmooths 0). I'm not sure what the corresponding ML
parameter is, nor what its default is. It should make a noticeable
difference (good or bad).
* "Threshold for dropping small values from graph 0.75" -- this is crazy :)
Something in the range you tried first (0.03 or lower) is much more typical.
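Putting that together, something like the following would be my starting
point (untested; double-check the coarse solver prefix with -ksp_view -- it
is normally mg_coarse_, although your GAMG view below reports levels=1 and
shows it as mg_levels_0_):

-pc_type gamg
-pc_gamg_type agg
-pc_gamg_agg_nsmooths 0
-pc_gamg_threshold 0.03
# direct coarse grid solve, like ML uses
-mg_coarse_ksp_type preonly
-mg_coarse_pc_type lu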
This is all that I can think of now.
Mark
>
> I would very much appreciate any advice on configuring GAMG and
> differences w.r.t ML to be taken into account (not a multigrid expert
> though).
>
> Thanks, best wishes
> David
>
>
> ------
> ksp_view for -pc_type gamg -pc_gamg_threshold 0.75 -pc_mg_levels 3
>
> KSP Object: 1 MPI processes
> type: fgmres
> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> GMRES: happy breakdown tolerance 1e-30
> maximum iterations=10000
> tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
> right preconditioning
> using nonzero initial guess
> using UNPRECONDITIONED norm type for convergence test
> PC Object: 1 MPI processes
> type: gamg
> MG: type is MULTIPLICATIVE, levels=1 cycles=v
> Cycles per PCApply=1
> Using Galerkin computed coarse grid matrices
> GAMG specific options
> Threshold for dropping small values from graph 0.75
> AGG specific options
> Symmetric graph false
> Coarse grid solver -- level -------------------------------
> KSP Object: (mg_levels_0_) 1 MPI processes
> type: preonly
> maximum iterations=2, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (mg_levels_0_) 1 MPI processes
> type: sor
> SOR: type = local_symmetric, iterations = 1, local iterations =
> 1, omega = 1.
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=1745224, cols=1745224
> total: nonzeros=99452608, allocated nonzeros=99452608
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 1037847 nodes, limit used is 5
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=1745224, cols=1745224
> total: nonzeros=99452608, allocated nonzeros=99452608
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 1037847 nodes, limit used is 5
>
>
> ------
> ksp_view for -pc_type ml:
>
> KSP Object: 8 MPI processes
> type: fgmres
> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> GMRES: happy breakdown tolerance 1e-30
> maximum iterations=10000
> tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
> right preconditioning
> using nonzero initial guess
> using UNPRECONDITIONED norm type for convergence test
> PC Object: 8 MPI processes
> type: ml
> MG: type is MULTIPLICATIVE, levels=3 cycles=v
> Cycles per PCApply=1
> Using Galerkin computed coarse grid matrices
> Coarse grid solver -- level -------------------------------
> KSP Object: (mg_coarse_) 8 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (mg_coarse_) 8 MPI processes
> type: redundant
> Redundant preconditioner: First (color=0) of 8 PCs follows
> KSP Object: (mg_coarse_redundant_) 1 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (mg_coarse_redundant_) 1 MPI processes
> type: lu
> LU: out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
> matrix ordering: nd
> factor fill ratio given 5., needed 10.4795
> Factored matrix follows:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=6822, cols=6822
> package used to perform factorization: petsc
> total: nonzeros=9575688, allocated nonzeros=9575688
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=6822, cols=6822
> total: nonzeros=913758, allocated nonzeros=913758
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> linear system matrix = precond matrix:
> Mat Object: 8 MPI processes
> type: mpiaij
> rows=6822, cols=6822
> total: nonzeros=913758, allocated nonzeros=913758
> total number of mallocs used during MatSetValues calls =0
> not using I-node (on process 0) routines
> Down solver (pre-smoother) on level 1 -------------------------------
> KSP Object: (mg_levels_1_) 8 MPI processes
> type: richardson
> Richardson: damping factor=1.
> maximum iterations=2
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using nonzero initial guess
> using NONE norm type for convergence test
> PC Object: (mg_levels_1_) 8 MPI processes
> type: sor
> SOR: type = local_symmetric, iterations = 1, local iterations =
> 1, omega = 1.
> linear system matrix = precond matrix:
> Mat Object: 8 MPI processes
> type: mpiaij
> rows=67087, cols=67087
> total: nonzeros=9722749, allocated nonzeros=9722749
> total number of mallocs used during MatSetValues calls =0
> not using I-node (on process 0) routines
> Up solver (post-smoother) same as down solver (pre-smoother)
> Down solver (pre-smoother) on level 2 -------------------------------
> KSP Object: (mg_levels_2_) 8 MPI processes
> type: richardson
> Richardson: damping factor=1.
> maximum iterations=2
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using nonzero initial guess
> using NONE norm type for convergence test
> PC Object: (mg_levels_2_) 8 MPI processes
> type: sor
> SOR: type = local_symmetric, iterations = 1, local iterations =
> 1, omega = 1.
> linear system matrix = precond matrix:
> Mat Object: 8 MPI processes
> type: mpiaij
> rows=1745224, cols=1745224
> total: nonzeros=99452608, allocated nonzeros=99452608
> total number of mallocs used during MatSetValues calls =0
> using I-node (on process 0) routines: found 126690 nodes,
> limit used is 5
> Up solver (post-smoother) same as down solver (pre-smoother)
> linear system matrix = precond matrix:
> Mat Object: 8 MPI processes
> type: mpiaij
> rows=1745224, cols=1745224
> total: nonzeros=99452608, allocated nonzeros=99452608
> total number of mallocs used during MatSetValues calls =0
> using I-node (on process 0) routines: found 126690 nodes, limit
> used is 5
>
>