[petsc-users] GAMG parallel convergence sensitivity
Mark Lohry
mlohry at gmail.com
Tue Mar 12 16:55:17 CDT 2019
Hi all, I've run into an unexpected issue with GAMG stagnating under a
certain condition. I'm running a 3D high-order DG discretization of the
compressible Navier-Stokes equations, using matrix-free GMRES+AMG, with the
relevant PETSc configuration:
-pc_type gamg
-ksp_type fgmres
-pc_gamg_agg_nsmooths 0
-mg_levels_ksp_type gmres
-mg_levels_pc_type bjacobi
-mg_levels_ksp_max_it 20
-mg_levels_ksp_rtol 0.0001
-pc_mg_cycle_type v
-pc_mg_type full
So FGMRES on top, with AMG using block-Jacobi ILU + GMRES as the smoother on
each level. The -ksp_view output is pasted at the bottom. This setup has been
working fairly robustly.
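To be explicit about the layout: the outer Krylov solve uses a matrix-free
(MFFD) operator, and a separately assembled Jacobian is handed to SNES only so
GAMG can build its hierarchy (that's the mffd/mpiaij pair in the -ksp_view
below). Wiring that up looks roughly like the following simplified sketch,
with placeholder stubs and sizes rather than the real application code:

/* simplified sketch, placeholder stubs: matrix-free operator for the Krylov
   solve, assembled block-sparse Jacobian used only to build the GAMG PC    */
#include <petscsnes.h>

static PetscErrorCode FormFunctionStub(SNES snes, Vec x, Vec f, void *ctx)
{
  VecCopy(x, f);                     /* the DG residual would go here */
  return 0;
}

static PetscErrorCode FormJacobianStub(SNES snes, Vec x, Mat J, Mat Jpre, void *ctx)
{
  /* fill Jpre with the block-sparse Jacobian; J (MFFD) needs no entries */
  MatAssemblyBegin(Jpre, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(Jpre, MAT_FINAL_ASSEMBLY);
  return 0;
}

int main(int argc, char **argv)
{
  SNES     snes;
  Mat      J, Jpre;
  Vec      x, r;
  PetscInt nloc = 4800;              /* placeholder local size */

  PetscInitialize(&argc, &argv, NULL, NULL);
  VecCreateMPI(PETSC_COMM_WORLD, nloc, PETSC_DETERMINE, &x);
  VecDuplicate(x, &r);
  MatCreateAIJ(PETSC_COMM_WORLD, nloc, nloc, PETSC_DETERMINE, PETSC_DETERMINE,
               250, NULL, 100, NULL, &Jpre);

  SNESCreate(PETSC_COMM_WORLD, &snes);
  SNESSetFunction(snes, r, FormFunctionStub, NULL);
  MatCreateSNESMF(snes, &J);         /* finite-difference J*v action */
  SNESSetJacobian(snes, J, Jpre, FormJacobianStub, NULL);
  SNESSetFromOptions(snes);          /* picks up -pc_type gamg, -mg_levels_* etc. */
  SNESSolve(snes, NULL, x);

  MatDestroy(&J);  MatDestroy(&Jpre);
  VecDestroy(&x);  VecDestroy(&r);
  SNESDestroy(&snes);
  return PetscFinalize();
}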
I'm testing two small mesh resolutions, with 1,536 cells and 6,144 cells,
where each cell contributes a 50x50 dense block to the Jacobian, with 4
off-diagonal block neighbors each. With that, I'm testing 2 configurations of
the same problem, one at Mach 0.1 and the other at Mach 0.01 (where the latter
makes the system much worse conditioned, a kind of stress test).
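(For concreteness, preallocating that block structure looks roughly like the
sketch below; illustrative only, with made-up local sizes. The 250 nonzeros
per row are the diagonal block plus the 4 neighbor blocks.)

/* illustrative sketch, not the actual assembly code: each cell owns a 50x50
   dense diagonal block plus up to 4 neighbor blocks, so at most 5*50 = 250
   nonzeros per row; block size 5 for the coupled variables as in -ksp_view */
#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat      A;
  PetscInt ncell_local = 96;         /* made up, e.g. 1,536 cells on 16 ranks */
  PetscInt ndof_cell   = 50;         /* unknowns per cell */
  PetscInt nloc        = ncell_local * ndof_cell;

  PetscInitialize(&argc, &argv, NULL, NULL);

  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetSizes(A, nloc, nloc, PETSC_DETERMINE, PETSC_DETERMINE);
  MatSetType(A, MATAIJ);
  MatSetBlockSize(A, 5);
  MatSeqAIJSetPreallocation(A, 5 * ndof_cell, NULL);
  MatMPIAIJSetPreallocation(A, 5 * ndof_cell, NULL,   /* on-rank: self + neighbors   */
                            2 * ndof_cell, NULL);     /* off-rank: rough upper bound */

  /* ... MatSetValues() with each cell's dense blocks would go here ... */
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  MatDestroy(&A);
  return PetscFinalize();
}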
In serial everything converges well to relative tolerance 0.01:
1,536 cells, Mach 0.1: 2 iterations
6,144 cells, Mach 0.1: 2 iterations
1,536 cells, Mach 0.01: 5 iterations
6,144 cells, Mach 0.01: 5 iterations
In parallel most things converge well, here on 16 cores (-np 16):
1,536 cells, Mach 0.1: 3 iterations
6,144 cells, Mach 0.1: 4 iterations
1,536 cells, Mach 0.01: 11 iterations
but for the 6,144 cell Mach 0.01 case, it's catastrophically worse:
0 SNES Function norm 6.934657276072e+05
0 KSP Residual norm 6.934657276072e+05
1 KSP Residual norm 6.934440650708e+05
2 KSP Residual norm 6.934157525695e+05
3 KSP Residual norm 6.934145135179e+05
...
48 KSP Residual norm 6.830785654915e+05
49 KSP Residual norm 6.821332742917e+05
50 KSP Residual norm 6.807807049444e+05
and it quickly stalls entirely, failing to converge in hundreds of iterations.
The exact same case in serial converges nicely:
0 SNES Function norm 6.934657276072e+05
0 KSP Residual norm 6.934657276072e+05
1 KSP Residual norm 1.705989154365e+05
2 KSP Residual norm 3.183292610749e+04
3 KSP Residual norm 1.568738082749e+04
4 KSP Residual norm 9.875297457387e+03
5 KSP Residual norm 6.489083537720e+03
Linear solve converged due to CONVERGED_RTOL iterations 5
And the marginally coarser 1,536-cell case with the same physics is also
healthy in parallel with -np 16:
0 SNES Function norm 2.400990060398e+05
0 KSP Residual norm 2.400990060398e+05
1 KSP Residual norm 2.391625967890e+05
2 KSP Residual norm 1.388195699805e+05
3 KSP Residual norm 3.072388366914e+04
4 KSP Residual norm 2.151010198865e+04
5 KSP Residual norm 1.305330349765e+04
6 KSP Residual norm 8.126579575968e+03
7 KSP Residual norm 6.186198840355e+03
8 KSP Residual norm 4.673764041449e+03
9 KSP Residual norm 3.332141521573e+03
10 KSP Residual norm 2.811481187948e+03
11 KSP Residual norm 2.189632613389e+03
Linear solve converged due to CONVERGED_RTOL iterations 11
Any thoughts here? Is there anything obviously wrong with my setup? Is there
any way to reduce the dependence of the iteration count on the parallelism?
Obviously I expect the iteration count to be higher in parallel, but I didn't
expect such catastrophic failure.
Thanks as always,
Mark
-ksp_view:
0 TS dt 30. time 0.
0 SNES Function norm 2.856641938332e+04
0 KSP Residual norm 2.856641938332e+04
1 KSP Residual norm 1.562096645358e+03
2 KSP Residual norm 3.008746074553e+02
3 KSP Residual norm 1.463990835793e+02
Linear solve converged due to CONVERGED_RTOL iterations 3
KSP Object: 16 MPI processes
type: fgmres
restart=100, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=100, initial guess is zero
tolerances: relative=0.01, absolute=1e-06, divergence=10.
right preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 16 MPI processes
type: gamg
type is FULL, levels=5 cycles=v
Using externally compute Galerkin coarse grid matrices
GAMG specific options
Threshold for dropping small values in graph on each level = 0. 0. 0.
Threshold scaling factor for each level not specified = 1.
AGG specific options
Symmetric graph false
Number of levels to square graph 1
Number smoothing steps 0
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 16 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 16 MPI processes
type: bjacobi
number of blocks = 16
Local solve is same for all blocks, in the following KSP and PC
objects:
KSP Object: (mg_coarse_sub_) 1 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_sub_) 1 MPI processes
type: lu
out-of-place factorization
tolerance for zero pivot 2.22045e-14
using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
matrix ordering: nd
factor fill ratio given 5., needed 1.10526
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=25, cols=25, bs=5
package used to perform factorization: petsc
total: nonzeros=525, allocated nonzeros=525
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 5 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=25, cols=25, bs=5
total: nonzeros=475, allocated nonzeros=475
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 5 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 16 MPI processes
type: mpiaij
rows=25, cols=25, bs=5
total: nonzeros=475, allocated nonzeros=475
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 5 nodes, limit used
is 5
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 16 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=20, nonzero initial guess
tolerances: relative=0.0001, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 16 MPI processes
type: bjacobi
number of blocks = 16
Local solve is same for all blocks, in the following KSP and PC
objects:
KSP Object: (mg_levels_1_sub_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_1_sub_) 1 MPI processes
type: ilu
out-of-place factorization
0 levels of fill
tolerance for zero pivot 2.22045e-14
matrix ordering: natural
factor fill ratio given 1., needed 1.
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=75, cols=75, bs=5
package used to perform factorization: petsc
total: nonzeros=1925, allocated nonzeros=1925
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 15 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=75, cols=75, bs=5
total: nonzeros=1925, allocated nonzeros=1925
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 15 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 16 MPI processes
type: mpiaij
rows=75, cols=75, bs=5
total: nonzeros=1925, allocated nonzeros=1925
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 15 nodes, limit used
is 5
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 16 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=20, nonzero initial guess
tolerances: relative=0.0001, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 16 MPI processes
type: bjacobi
number of blocks = 16
Local solve is same for all blocks, in the following KSP and PC
objects:
KSP Object: (mg_levels_2_sub_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_2_sub_) 1 MPI processes
type: ilu
out-of-place factorization
0 levels of fill
tolerance for zero pivot 2.22045e-14
matrix ordering: natural
factor fill ratio given 1., needed 1.
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=35, cols=35, bs=5
package used to perform factorization: petsc
total: nonzeros=675, allocated nonzeros=675
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 7 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=35, cols=35, bs=5
total: nonzeros=675, allocated nonzeros=675
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 7 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 16 MPI processes
type: mpiaij
rows=305, cols=305, bs=5
total: nonzeros=8675, allocated nonzeros=8675
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 7 nodes, limit used
is 5
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 16 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=20, nonzero initial guess
tolerances: relative=0.0001, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 16 MPI processes
type: bjacobi
number of blocks = 16
Local solve is same for all blocks, in the following KSP and PC
objects:
KSP Object: (mg_levels_3_sub_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_3_sub_) 1 MPI processes
type: ilu
out-of-place factorization
0 levels of fill
tolerance for zero pivot 2.22045e-14
matrix ordering: natural
factor fill ratio given 1., needed 1.
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=50, cols=50, bs=5
package used to perform factorization: petsc
total: nonzeros=1050, allocated nonzeros=1050
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 10 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=50, cols=50, bs=5
total: nonzeros=1050, allocated nonzeros=1050
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 10 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 16 MPI processes
type: mpiaij
rows=1090, cols=1090, bs=5
total: nonzeros=32050, allocated nonzeros=32050
total number of mallocs used during MatSetValues calls =0
using nonscalable MatPtAP() implementation
using I-node (on process 0) routines: found 10 nodes, limit used
is 5
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 16 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=20, nonzero initial guess
tolerances: relative=0.0001, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 16 MPI processes
type: bjacobi
number of blocks = 16
Local solve is same for all blocks, in the following KSP and PC
objects:
KSP Object: (mg_levels_4_sub_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_4_sub_) 1 MPI processes
type: ilu
out-of-place factorization
0 levels of fill
tolerance for zero pivot 2.22045e-14
matrix ordering: natural
factor fill ratio given 1., needed 1.
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=4850, cols=4850, bs=5
package used to perform factorization: petsc
total: nonzeros=1117500, allocated nonzeros=1117500
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 970 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=4850, cols=4850, bs=5
total: nonzeros=1117500, allocated nonzeros=1117500
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 970 nodes, limit used is 5
linear system matrix followed by preconditioner matrix:
Mat Object: 16 MPI processes
type: mffd
rows=76800, cols=76800
Matrix-free approximation:
err=1.49012e-08 (relative error in function evaluation)
Using wp compute h routine
Does not compute normU
Mat Object: 16 MPI processes
type: mpiaij
rows=76800, cols=76800, bs=5
total: nonzeros=18880000, allocated nonzeros=18880000
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 970 nodes, limit used
is 5
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix followed by preconditioner matrix:
Mat Object: 16 MPI processes
type: mffd
rows=76800, cols=76800
Matrix-free approximation:
err=1.49012e-08 (relative error in function evaluation)
Using wp compute h routine
Does not compute normU
Mat Object: 16 MPI processes
type: mpiaij
rows=76800, cols=76800, bs=5
total: nonzeros=18880000, allocated nonzeros=18880000
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 970 nodes, limit used is 5
Line search: Using full step: fnorm 2.856641938332e+04 gnorm 3.868815397561e+03
1 SNES Function norm 3.868815397561e+03