[petsc-users] Poor multigrid convergence in parallel
Lawrence Mitchell
lawrence.mitchell at imperial.ac.uk
Mon Jul 21 07:11:48 CDT 2014
On 21 Jul 2014, at 12:52, Dave May <dave.mayhem23 at gmail.com> wrote:
>
> -pc_type mg -mg_levels_ksp_type richardson -mg_levels_pc_type jacobi -mg_levels_ksp_max_it 2
>
> then I get identical convergence in serial and parallel
>
>
> Good. That's the correct result.
>
> if, however, I run with
>
> -pc_type mg -mg_levels_ksp_type chebyshev -mg_levels_pc_type sor -mg_levels_ksp_max_it 2
> (the default according to -ksp_view)
>
> then I get very differing convergence in serial and parallel as described.
>
>
> It's normal that the behaviour is different. The PETSc SOR implementation is not parallel. It only performs SOR on your local subdomain.
Sure, however, with only two subdomains, I was not expecting to see such poor behaviour.
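(To make sure I understand what the smoother is doing per process, here is a minimal sketch, against a recent PETSc, of a two-iteration Richardson/SOR sweep set up programmatically. The 1D Laplacian is only a stand-in operator, not my actual problem, and error checking is omitted. If I've read the man pages correctly, PCSOR uses SOR_LOCAL_SYMMETRIC_SWEEP by default, i.e. symmetric SOR on each process's diagonal block and block Jacobi between ranks, which matches the "type = local_symmetric" lines in the -ksp_view output below.)

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat      A;
  Vec      x, b;
  KSP      ksp;
  PC       pc;
  PetscInt i, n = 100, Istart, Iend;

  PetscInitialize(&argc, &argv, NULL, NULL);

  /* Stand-in operator: a 1D Laplacian, distributed row-wise */
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
  MatSetFromOptions(A);
  MatSetUp(A);
  MatGetOwnershipRange(A, &Istart, &Iend);
  for (i = Istart; i < Iend; i++) {
    if (i > 0)     MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);
    if (i < n - 1) MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);
    MatSetValue(A, i, i, 2.0, INSERT_VALUES);
  }
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  VecCreate(PETSC_COMM_WORLD, &x);
  VecSetSizes(x, PETSC_DECIDE, n);
  VecSetFromOptions(x);
  VecDuplicate(x, &b);
  VecSet(b, 1.0);

  /* At most two Richardson iterations preconditioned by SOR, i.e. the
     -mg_levels_ksp_type richardson -mg_levels_pc_type sor
     -mg_levels_ksp_max_it 2 smoother.  In parallel each sweep only
     touches the local diagonal block. */
  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, A);
  KSPSetType(ksp, KSPRICHARDSON);
  KSPSetTolerances(ksp, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT, 2);
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCSOR);
  PCSORSetSymmetric(pc, SOR_LOCAL_SYMMETRIC_SWEEP); /* the default sweep type */
  KSPSetFromOptions(ksp);
  KSPSolve(ksp, b, x);

  KSPDestroy(&ksp);
  VecDestroy(&x);
  VecDestroy(&b);
  MatDestroy(&A);
  PetscFinalize();
  return 0;
}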
Below I show output from runs on 1 process and then 2 (along with -ksp_view) for the following options:
-pc_type mg -ksp_rtol 1e-8 -ksp_max_it 6 -pc_mg_levels 2 -mg_levels_pc_type sor -ksp_monitor
On 1 process:
0 KSP Residual norm 5.865090856053e+02
1 KSP Residual norm 1.293159126247e+01
2 KSP Residual norm 5.181199296299e-01
3 KSP Residual norm 1.268870802643e-02
4 KSP Residual norm 5.116058930806e-04
5 KSP Residual norm 3.735036960550e-05
6 KSP Residual norm 1.755288530515e-06
KSP Object: 1 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=6, initial guess is zero
tolerances: relative=1e-08, absolute=1e-50, divergence=10000
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=2 cycles=v
Cycles per PCApply=1
Not using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 1 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 1 MPI processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 2.22045e-14
using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
matrix ordering: nd
factor fill ratio given 5, needed 3.17724
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=144, cols=144
package used to perform factorization: petsc
total: nonzeros=2904, allocated nonzeros=2904
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=144, cols=144
total: nonzeros=914, allocated nonzeros=0
total number of mallocs used during MatSetValues calls =0
not using I-node routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 1 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0.0999972, max = 1.09997
Chebyshev: estimated using: [0 0.1; 0 1.1]
KSP Object: (mg_levels_1_est_) 1 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 1 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=529, cols=529
total: nonzeros=3521, allocated nonzeros=0
total number of mallocs used during MatSetValues calls =0
not using I-node routines
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 1 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=529, cols=529
total: nonzeros=3521, allocated nonzeros=0
total number of mallocs used during MatSetValues calls =0
not using I-node routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=529, cols=529
total: nonzeros=3521, allocated nonzeros=0
total number of mallocs used during MatSetValues calls =0
not using I-node routines
On 2 processes:
0 KSP Residual norm 5.867749653193e+02
1 KSP Residual norm 1.353369658350e+01
2 KSP Residual norm 1.350163644248e+01
3 KSP Residual norm 1.007552895680e+01
4 KSP Residual norm 1.294191582208e+00
5 KSP Residual norm 9.409953768968e-01
6 KSP Residual norm 9.409360529590e-01
KSP Object: 2 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=6, initial guess is zero
tolerances: relative=1e-08, absolute=1e-50, divergence=10000
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: 2 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=2 cycles=v
Cycles per PCApply=1
Not using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 2 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 2 MPI processes
type: redundant
Redundant preconditioner: First (color=0) of 2 PCs follows
KSP Object: (mg_coarse_redundant_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_redundant_) 1 MPI processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 2.22045e-14
using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
matrix ordering: nd
factor fill ratio given 5, needed 2.72494
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=144, cols=144
package used to perform factorization: petsc
total: nonzeros=2120, allocated nonzeros=2120
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=144, cols=144
total: nonzeros=778, allocated nonzeros=778
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 2 MPI processes
type: mpiaij
rows=144, cols=144
total: nonzeros=778, allocated nonzeros=914
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 2 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0.099992, max = 1.09991
Chebyshev: estimated using: [0 0.1; 0 1.1]
KSP Object: (mg_levels_1_est_) 2 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 2 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 2 MPI processes
type: mpiaij
rows=529, cols=529
total: nonzeros=3253, allocated nonzeros=3521
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 2 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 2 MPI processes
type: mpiaij
rows=529, cols=529
total: nonzeros=3253, allocated nonzeros=3521
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 2 MPI processes
type: mpiaij
rows=529, cols=529
total: nonzeros=3253, allocated nonzeros=3521
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
So notice that in the parallel case the residual reduction was ~10^3, rather than ~10^8 for the serial case.
> I see that this is a nested Krylov solve. Using fgmres on the outer sometimes is not enough. I've had problems where I needed to use the more stable orthogonalization routine in gmres.
>
> Do you also observe different convergence behaviour (serial versus parallel) with these choices
> 1) -mg_coarse_ksp_type gmres -mg_coarse_pc_type jacobi -mg_coarse_ksp_max_it 1
Full options are (in addition to the above):
-ksp_type fgmres -pc_mg_levels 2 -ksp_monitor -ksp_max_it 6 -ksp_rtol 1e-8 -pc_type mg
1 process:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.294103921871e+01
2 KSP Residual norm 4.325949294172e+00
3 KSP Residual norm 1.373260455913e+00
4 KSP Residual norm 1.612639229769e-01
5 KSP Residual norm 1.896600662807e-02
6 KSP Residual norm 5.900847991084e-03
2 processes:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.242896923248e+01
2 KSP Residual norm 1.092088559774e+01
3 KSP Residual norm 7.383276000966e+00
4 KSP Residual norm 5.634790202135e+00
5 KSP Residual norm 4.329897745238e+00
6 KSP Residual norm 3.754170628391e+00
> 2) -mg_coarse_ksp_type gmres -mg_coarse_pc_type jacobi -mg_coarse_ksp_max_it 100 -mg_coarse_ksp_gmres_modifiedgramschmidt
1 process:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.030455192067e+01
2 KSP Residual norm 4.628068378242e-01
3 KSP Residual norm 1.965313019262e-02
4 KSP Residual norm 1.204109484597e-03
5 KSP Residual norm 5.812650812813e-05
6 KSP Residual norm 3.161780444565e-06
2 processes:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.324768309183e+01
2 KSP Residual norm 1.225921121405e+01
3 KSP Residual norm 1.173286143250e+01
4 KSP Residual norm 7.033886488294e+00
5 KSP Residual norm 4.825036058054e+00
6 KSP Residual norm 4.265434976636e+00
> 3) -mg_coarse_ksp_type cg -mg_coarse_pc_type jacobi -mg_coarse_ksp_max_it 100
1 process:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.030455192067e+01
2 KSP Residual norm 4.628068378242e-01
3 KSP Residual norm 1.965313019262e-02
4 KSP Residual norm 1.204109484597e-03
5 KSP Residual norm 5.812650812814e-05
6 KSP Residual norm 3.161780444567e-06
2 processes:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.324768309183e+01
2 KSP Residual norm 1.225921121405e+01
3 KSP Residual norm 1.173286143250e+01
4 KSP Residual norm 7.033886488294e+00
5 KSP Residual norm 4.825036058053e+00
6 KSP Residual norm 4.265434976635e+00
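(As an aside, if I've read the man pages correctly, the coarse-grid settings in options 2/3 above correspond to something like the following programmatic setup. This is only a sketch against a recent PETSc: the helper name is made up, the PCMG is assumed to have been created already via -pc_type mg, and error checking is omitted.)

#include <petscksp.h>

/* Sketch: configure the coarse solve of an existing PCMG as GMRES
 * (with modified Gram-Schmidt orthogonalization) preconditioned by
 * Jacobi, capped at 100 iterations -- roughly option 2) above. */
static void ConfigureCoarseSolve(PC mgpc)
{
  KSP cksp;
  PC  cpc;

  PCMGGetCoarseSolve(mgpc, &cksp);
  KSPSetType(cksp, KSPGMRES);
  KSPGMRESSetOrthogonalization(cksp,
                               KSPGMRESModifiedGramSchmidtOrthogonalization);
  KSPSetTolerances(cksp, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT, 100);
  KSPGetPC(cksp, &cpc);
  PCSetType(cpc, PCJACOBI);
}

(One would call this with the PC obtained via KSPGetPC on the outer solver, after setting its type to mg.)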
> Sure - this wasn't a convergence test. I just wanted to see that the methods which should be identical in serial and parallel are in fact behaving as expected. Seems they are. So I'm inclined to think the problem is associated with having nested Krylov solves.
My observation is that if I use unpreconditioned Chebyshev as the smoother, then convergence in serial and parallel is identical and good. As soon as I turn on SOR preconditioning for the smoother, the parallel convergence falls to pieces (and the preconditioner becomes indefinite):
e.g. with
-pc_type mg -ksp_rtol 1e-8 -ksp_max_it 6 -pc_mg_levels 2 -ksp_monitor -ksp_type fgmres -mg_coarse_ksp_type cg -mg_coarse_pc_type jacobi -mg_coarse_ksp_max_it 100 -mg_levels_pc_type none
1 process:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.530397174638e+01
2 KSP Residual norm 1.027554200472e+00
3 KSP Residual norm 3.809236982955e-02
4 KSP Residual norm 2.445633720099e-03
5 KSP Residual norm 1.192136916270e-04
6 KSP Residual norm 7.067629143105e-06
2 processes:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.530397174638e+01
2 KSP Residual norm 1.027554200472e+00
3 KSP Residual norm 3.809236982955e-02
4 KSP Residual norm 2.445633720099e-03
5 KSP Residual norm 1.192136916270e-04
6 KSP Residual norm 7.067629143079e-06
and with SOR as the smoother's preconditioner:
-pc_type mg -ksp_rtol 1e-8 -ksp_max_it 6 -pc_mg_levels 2 -ksp_monitor -ksp_type fgmres -mg_coarse_ksp_type cg -mg_coarse_pc_type jacobi -mg_coarse_ksp_max_it 100 -mg_levels_pc_type sor
1 process:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.030455192067e+01
2 KSP Residual norm 4.628068378242e-01
3 KSP Residual norm 1.965313019262e-02
4 KSP Residual norm 1.204109484597e-03
5 KSP Residual norm 5.812650812814e-05
6 KSP Residual norm 3.161780444567e-06
2 processes:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.324768309183e+01
2 KSP Residual norm 1.225921121405e+01
3 KSP Residual norm 1.173286143250e+01
4 KSP Residual norm 7.033886488294e+00
5 KSP Residual norm 4.825036058053e+00
6 KSP Residual norm 4.265434976635e+00
Maybe it's just that I shouldn't be expecting this to work, but it seems odd to me.
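(For reference, the only thing changing between the last two runs is the smoother's preconditioner; programmatically that toggle would look roughly like the sketch below, again with a made-up helper name, a two-level PCMG assumed to be set up already, and error checking omitted.)

#include <petscksp.h>

/* Sketch: the level-1 smoother configuration being toggled above --
 * Chebyshev with either no preconditioner or SOR. */
static void ConfigureSmoother(PC mgpc, PetscBool use_sor)
{
  KSP smoother;
  PC  spc;

  PCMGGetSmoother(mgpc, 1, &smoother);       /* level 1 = fine level here */
  KSPSetType(smoother, KSPCHEBYSHEV);
  KSPGetPC(smoother, &spc);
  PCSetType(spc, use_sor ? PCSOR : PCNONE);  /* -mg_levels_pc_type sor | none */
}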
Cheers,
Lawrence