[petsc-users] GAMG advice
David Nolte
dnolte at dim.uchile.cl
Thu Nov 9 13:19:38 CST 2017
Hi Mark,
thanks for clarifying.
When I wrote the initial question I had somehow overlooked the fact that
the default GAMG smoother is Chebyshev, while ML uses SOR. All my other
comments concerning the threshold etc. were based on this oversight.
The following settings work quite well; of course, LU is used on the
coarse level:
-pc_type gamg
-pc_gamg_type agg
-pc_gamg_threshold 0.03
-pc_gamg_square_graph 10 # no effect ?
-pc_gamg_sym_graph
-mg_levels_ksp_type richardson
-mg_levels_pc_type sor
Setting -pc_gamg_agg_nsmooths 0 does not seem to improve the convergence.
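For reference, the same solver could also be set up in code rather than
through an options file. The following is only a minimal sketch of mine
(not code from this application): it assumes a PETSc release from around
this thread (~3.8), an already assembled matrix A and vectors b, x, and an
illustrative function name SolveWithGAMG; the GAMG-specific options are
pushed through the options database because their C setter signatures have
changed between releases.

#include <petscksp.h>

PetscErrorCode SolveWithGAMG(Mat A, Vec b, Vec x)
{
  KSP            ksp;
  PC             pc;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPFGMRES);CHKERRQ(ierr);
  ierr = KSPSetPCSide(ksp, PC_RIGHT);CHKERRQ(ierr);               /* right preconditioning, as above */
  ierr = KSPSetInitialGuessNonzero(ksp, PETSC_TRUE);CHKERRQ(ierr);
  ierr = KSPSetTolerances(ksp, 1.0e-6, 1.0e-50, 1.0e4, 10000);CHKERRQ(ierr);

  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCGAMG);CHKERRQ(ierr);

  /* GAMG and smoother options, same values as the option list above */
  ierr = PetscOptionsSetValue(NULL, "-pc_gamg_type", "agg");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-pc_gamg_threshold", "0.03");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-pc_gamg_square_graph", "10");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-pc_gamg_sym_graph", "true");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-mg_levels_ksp_type", "richardson");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-mg_levels_pc_type", "sor");CHKERRQ(ierr);

  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);   /* command-line options still override */
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Going through the options database keeps the code the same whether these
values are hard-coded or supplied on the command line.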
The ksp_view output now looks like this (does this seem reasonable?):
KSP Object: 4 MPI processes
type: fgmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10000
tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
right preconditioning
using nonzero initial guess
using UNPRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
type: gamg
MG: type is MULTIPLICATIVE, levels=5 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
GAMG specific options
Threshold for dropping small values from graph 0.03
AGG specific options
Symmetric graph true
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 4 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 4 MPI processes
type: bjacobi
block Jacobi: number of blocks = 4
Local solve is same for all blocks, in the following KSP and PC objects:
KSP Object: (mg_coarse_sub_) 1 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_sub_) 1 MPI processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 2.22045e-14
using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
matrix ordering: nd
factor fill ratio given 5., needed 1.
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=38, cols=38
package used to perform factorization: petsc
total: nonzeros=1444, allocated nonzeros=1444
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 8 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=38, cols=38
total: nonzeros=1444, allocated nonzeros=1444
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 8 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpiaij
rows=38, cols=38
total: nonzeros=1444, allocated nonzeros=1444
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 8 nodes, limit used is 5
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 4 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 4 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpiaij
rows=168, cols=168
total: nonzeros=19874, allocated nonzeros=19874
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 17 nodes, limit used is 5
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 4 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 4 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpiaij
rows=3572, cols=3572
total: nonzeros=963872, allocated nonzeros=963872
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 4 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 4 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpiaij
rows=21179, cols=21179
total: nonzeros=1060605, allocated nonzeros=1060605
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 4 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 4 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpiaij
rows=1745224, cols=1745224
total: nonzeros=99452608, allocated nonzeros=99452608
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 254433 nodes, limit used is 5
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpiaij
rows=1745224, cols=1745224
total: nonzeros=99452608, allocated nonzeros=99452608
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 254433 nodes, limit used is 5
Thanks, David
On 11/08/2017 10:11 PM, Mark Adams wrote:
>
>
> On Wed, Nov 1, 2017 at 5:45 PM, David Nolte <dnolte at dim.uchile.cl> wrote:
>
> Thanks Barry.
> By simply replacing chebyshev with richardson I get similar performance
> with GAMG and ML
>
>
> That too (I assumed you were using the same smoother; I could not see
> cheby in your view data).
>
> I guess SOR works for the coarse grid solver because the coarse grid
> is small. It should help to use LU there.
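> For example (a sketch only, based on the coarse-level setup the ML view
> further down in this thread shows, not something timed here), a redundant
> direct coarse solve can be requested with
>
> -mg_coarse_pc_type redundant
> -mg_coarse_redundant_pc_type lu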
>
>
> (GAMG even slightly faster):
>
>
> These are "random" fluctuations.
>
>
>
> -pc_type gamg
> -pc_gamg_type agg
> -pc_gamg_threshold 0.03
> -pc_gamg_square_graph 10
> -pc_gamg_sym_graph
> -mg_levels_ksp_type richardson
> -mg_levels_pc_type sor
>
> Is it still true that I need to set "-pc_gamg_sym_graph" if the matrix
> is asymmetric?
>
>
> yes,
>
>
> For serial runs it doesn't seem to matter,
>
>
> yes,
>
>
> but in parallel the PC setup hangs (after calls to
> PCGAMGFilterGraph()) if -pc_gamg_sym_graph is not set.
>
>
> yep,
>
>
>
> David
>
>
> On 10/21/2017 12:10 AM, Barry Smith wrote:
> > David,
> >
> > GAMG picks the number of levels based on how the coarsening process
> > etc. proceeds. You cannot hardwire it to a particular value. You can
> > run with -info to potentially get more information on the decisions
> > GAMG is making.
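> > For instance (a sketch, not output from an actual run in this thread),
> > adding
> >
> > -info
> >
> > to the option list and filtering the output for lines containing
> > "GAMG" (e.g. from PCSetUp_GAMG) shows the grid size on each level and
> > the coarsening decisions as the hierarchy is built.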
> >
> > Barry
> >
> >> On Oct 20, 2017, at 2:06 PM, David Nolte <dnolte at dim.uchile.cl> wrote:
> >>
> >> PS: I didn't realize it at first, but it looks as if the
> >> -pc_mg_levels 3 option was not taken into account:
> >> type: gamg
> >> MG: type is MULTIPLICATIVE, levels=1 cycles=v
> >>
> >>
> >>
> >> On 10/20/2017 03:32 PM, David Nolte wrote:
> >>> Dear all,
> >>>
> >>> I have some problems using GAMG as a preconditioner for (F)GMRES.
> >>> Background: I am solving the incompressible, unsteady Navier-Stokes
> >>> equations with a coupled mixed FEM approach, using P1/P1 elements
> >>> for velocity and pressure on an unstructured tetrahedral mesh with
> >>> about 2 million DOFs (and up to 15 million). The method is
> >>> stabilized with SUPG/PSPG, so there are no zeros on the diagonal of
> >>> the pressure block. Time discretization is semi-implicit backward
> >>> Euler. The flow is convection-dominated, through a nozzle.
> >>>
> >>> So far, for this setup, I have been quite happy with a simple
> >>> FGMRES/ML solver for the full system (rather brute force, I admit,
> >>> but much faster than any block/Schur preconditioners I tried):
> >>>
> >>> -ksp_converged_reason
> >>> -ksp_monitor_true_residual
> >>> -ksp_type fgmres
> >>> -ksp_rtol 1.0e-6
> >>> -ksp_initial_guess_nonzero
> >>>
> >>> -pc_type ml
> >>> -pc_ml_Threshold 0.03
> >>> -pc_ml_maxNlevels 3
> >>>
> >>> This setup converges to rtol in ~120 iterations (see the ksp_view
> >>> output below):
> >>>
> >>> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06
> >>> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06
> >>> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06
> >>> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07
> >>>
> >>>
> >>> Now I'd like to try GAMG instead of ML. However, I don't know how
> >>> to set it up to get similar performance.
> >>> The obvious/naive
> >>>
> >>> -pc_type gamg
> >>> -pc_gamg_type agg
> >>>
> >>> # with and without
> >>> -pc_gamg_threshold 0.03
> >>> -pc_mg_levels 3
> >>>
> >>> converges very slowly on 1 proc and much worse on 8 (~200k DOFs
> >>> per proc), for instance:
> >>> np = 1:
> >>> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
> >>> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04
> >>> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04
> >>>
> >>> np = 8:
> >>> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03
> >>> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
> >>> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
> >>>
> >>> A very high threshold seems to improve the GAMG PC; for instance,
> >>> with 0.75 I get convergence to rtol=1e-6 after 744 iterations.
> >>> What else should I try?
> >>>
> >>> I would very much appreciate any advice on configuring GAMG and on
> >>> differences w.r.t. ML that should be taken into account (I am not a
> >>> multigrid expert, though).
> >>>
> >>> Thanks, best wishes
> >>> David
> >>>
> >>>
> >>> ------
> >>> ksp_view for -pc_type gamg -pc_gamg_threshold 0.75 -pc_mg_levels 3
> >>>
> >>> KSP Object: 1 MPI processes
> >>> type: fgmres
> >>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> >>> Orthogonalization with no iterative refinement
> >>> GMRES: happy breakdown tolerance 1e-30
> >>> maximum iterations=10000
> >>> tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
> >>> right preconditioning
> >>> using nonzero initial guess
> >>> using UNPRECONDITIONED norm type for convergence test
> >>> PC Object: 1 MPI processes
> >>> type: gamg
> >>> MG: type is MULTIPLICATIVE, levels=1 cycles=v
> >>> Cycles per PCApply=1
> >>> Using Galerkin computed coarse grid matrices
> >>> GAMG specific options
> >>> Threshold for dropping small values from graph 0.75
> >>> AGG specific options
> >>> Symmetric graph false
> >>> Coarse grid solver -- level -------------------------------
> >>> KSP Object: (mg_levels_0_) 1 MPI processes
> >>> type: preonly
> >>> maximum iterations=2, initial guess is zero
> >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> >>> left preconditioning
> >>> using NONE norm type for convergence test
> >>> PC Object: (mg_levels_0_) 1 MPI processes
> >>> type: sor
> >>> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
> >>> linear system matrix = precond matrix:
> >>> Mat Object: 1 MPI processes
> >>> type: seqaij
> >>> rows=1745224, cols=1745224
> >>> total: nonzeros=99452608, allocated nonzeros=99452608
> >>> total number of mallocs used during MatSetValues calls =0
> >>> using I-node routines: found 1037847 nodes, limit used is 5
> >>> linear system matrix = precond matrix:
> >>> Mat Object: 1 MPI processes
> >>> type: seqaij
> >>> rows=1745224, cols=1745224
> >>> total: nonzeros=99452608, allocated nonzeros=99452608
> >>> total number of mallocs used during MatSetValues calls =0
> >>> using I-node routines: found 1037847 nodes, limit used is 5
> >>>
> >>>
> >>> ------
> >>> ksp_view for -pc_type ml:
> >>>
> >>> KSP Object: 8 MPI processes
> >>> type: fgmres
> >>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> >>> Orthogonalization with no iterative refinement
> >>> GMRES: happy breakdown tolerance 1e-30
> >>> maximum iterations=10000
> >>> tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
> >>> right preconditioning
> >>> using nonzero initial guess
> >>> using UNPRECONDITIONED norm type for convergence test
> >>> PC Object: 8 MPI processes
> >>> type: ml
> >>> MG: type is MULTIPLICATIVE, levels=3 cycles=v
> >>> Cycles per PCApply=1
> >>> Using Galerkin computed coarse grid matrices
> >>> Coarse grid solver -- level -------------------------------
> >>> KSP Object: (mg_coarse_) 8 MPI processes
> >>> type: preonly
> >>> maximum iterations=10000, initial guess is zero
> >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> >>> left preconditioning
> >>> using NONE norm type for convergence test
> >>> PC Object: (mg_coarse_) 8 MPI processes
> >>> type: redundant
> >>> Redundant preconditioner: First (color=0) of 8 PCs follows
> >>> KSP Object: (mg_coarse_redundant_) 1 MPI processes
> >>> type: preonly
> >>> maximum iterations=10000, initial guess is zero
> >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> >>> left preconditioning
> >>> using NONE norm type for convergence test
> >>> PC Object: (mg_coarse_redundant_) 1 MPI processes
> >>> type: lu
> >>> LU: out-of-place factorization
> >>> tolerance for zero pivot 2.22045e-14
> >>> using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
> >>> matrix ordering: nd
> >>> factor fill ratio given 5., needed 10.4795
> >>> Factored matrix follows:
> >>> Mat Object: 1 MPI processes
> >>> type: seqaij
> >>> rows=6822, cols=6822
> >>> package used to perform factorization: petsc
> >>> total: nonzeros=9575688, allocated nonzeros=9575688
> >>> total number of mallocs used during MatSetValues calls =0
> >>> not using I-node routines
> >>> linear system matrix = precond matrix:
> >>> Mat Object: 1 MPI processes
> >>> type: seqaij
> >>> rows=6822, cols=6822
> >>> total: nonzeros=913758, allocated nonzeros=913758
> >>> total number of mallocs used during MatSetValues calls =0
> >>> not using I-node routines
> >>> linear system matrix = precond matrix:
> >>> Mat Object: 8 MPI processes
> >>> type: mpiaij
> >>> rows=6822, cols=6822
> >>> total: nonzeros=913758, allocated nonzeros=913758
> >>> total number of mallocs used during MatSetValues calls =0
> >>> not using I-node (on process 0) routines
> >>> Down solver (pre-smoother) on level 1 -------------------------------
> >>> KSP Object: (mg_levels_1_) 8 MPI processes
> >>> type: richardson
> >>> Richardson: damping factor=1.
> >>> maximum iterations=2
> >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> >>> left preconditioning
> >>> using nonzero initial guess
> >>> using NONE norm type for convergence test
> >>> PC Object: (mg_levels_1_) 8 MPI processes
> >>> type: sor
> >>> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
> >>> linear system matrix = precond matrix:
> >>> Mat Object: 8 MPI processes
> >>> type: mpiaij
> >>> rows=67087, cols=67087
> >>> total: nonzeros=9722749, allocated nonzeros=9722749
> >>> total number of mallocs used during MatSetValues calls =0
> >>> not using I-node (on process 0) routines
> >>> Up solver (post-smoother) same as down solver (pre-smoother)
> >>> Down solver (pre-smoother) on level 2 -------------------------------
> >>> KSP Object: (mg_levels_2_) 8 MPI processes
> >>> type: richardson
> >>> Richardson: damping factor=1.
> >>> maximum iterations=2
> >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> >>> left preconditioning
> >>> using nonzero initial guess
> >>> using NONE norm type for convergence test
> >>> PC Object: (mg_levels_2_) 8 MPI processes
> >>> type: sor
> >>> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
> >>> linear system matrix = precond matrix:
> >>> Mat Object: 8 MPI processes
> >>> type: mpiaij
> >>> rows=1745224, cols=1745224
> >>> total: nonzeros=99452608, allocated nonzeros=99452608
> >>> total number of mallocs used during MatSetValues calls =0
> >>> using I-node (on process 0) routines: found 126690 nodes, limit used is 5
> >>> Up solver (post-smoother) same as down solver (pre-smoother)
> >>> linear system matrix = precond matrix:
> >>> Mat Object: 8 MPI processes
> >>> type: mpiaij
> >>> rows=1745224, cols=1745224
> >>> total: nonzeros=99452608, allocated nonzeros=99452608
> >>> total number of mallocs used during MatSetValues calls =0
> >>> using I-node (on process 0) routines: found 126690 nodes, limit used is 5
> >>>
>
>
>