<div dir="ltr">I've now got gamg running on matrix-free newton-krylov with the jacobian provided by coloring finite differences (thanks again for the help). 3D Poisson with 4th order DG or higher (35^2 blocks), gamg with default settings is giving textbook convergence, which is great. Of course coupled compressible navier-stokes is harder, and convergence is bad-to-nonexistent.<div><br></div><div><div><div><br></div><div>1) Doc says "Equations must be ordered in
2) With default settings, and also with the -pc_gamg_square_graph, -pc_gamg_sym_graph, and -pc_gamg_agg_nsmooths 0 options mentioned in the manual, the eigenvalue estimates explode on the finest level (level 4), which I don't see with Poisson:

  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object: (mg_levels_1_) 32 MPI processes
      type: chebyshev
        eigenvalue estimates used:  min = 0.18994, max = 2.08935
        eigenvalues estimate via gmres min 0.00933256, max 1.8994
  Down solver (pre-smoother) on level 2 -------------------------------
    KSP Object: (mg_levels_2_) 32 MPI processes
      type: chebyshev
        eigenvalue estimates used:  min = 0.165969, max = 1.82566
        eigenvalues estimate via gmres min 0.0290728, max 1.65969
  Down solver (pre-smoother) on level 3 -------------------------------
    KSP Object: (mg_levels_3_) 32 MPI processes
      type: chebyshev
        eigenvalue estimates used:  min = 0.146479, max = 1.61126
        eigenvalues estimate via gmres min 0.204673, max 1.46479
  Down solver (pre-smoother) on level 4 -------------------------------
    KSP Object: (mg_levels_4_) 32 MPI processes
      type: chebyshev
        eigenvalue estimates used:  min = 6.81977e+09, max = 7.50175e+10
        eigenvalues estimate via gmres min -2.76436e+12, max 6.81977e+10

What's happening here? (The full -ksp_view output is at the end of this mail; the exact options I used are listed after question 3.)

3) I'm not very familiar with Chebyshev smoothers, but as I understand it they are only for SPD systems(?). Is this an inappropriate smoother for this problem? (A possible alternative is sketched below.)
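For completeness, these are the options behind the -ksp_view output below (reconstructed from my run script; I believe they match what the view reports):

    -ksp_type fgmres -ksp_max_it 50 -ksp_rtol 1e-10 -ksp_atol 1e-7
    -pc_type gamg
    -pc_gamg_threshold 0.1
    -pc_gamg_square_graph 10
    -pc_gamg_sym_graph true
    -pc_gamg_agg_nsmooths 0

And if Chebyshev really is inappropriate here, would switching the level smoothers along these lines be the right experiment?

    -mg_levels_ksp_type gmres
    -mg_levels_pc_type sor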
4) With GMRES, the preconditioned residual is ~10 orders of magnitude larger than the true residual, and the preconditioned residual drops while the true residual rises. I'm assuming this means something is very wrong?
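In case I'm misreading the output for (4): I'm comparing the two norms with the standard monitors, nothing custom on my side:

    -ksp_monitor                  # preconditioned residual norm
    -ksp_monitor_true_residual    # preconditioned and true residual norms together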
5) -pc_type hypre -pc_hypre_type boomeramg also works perfectly for the Poisson case, but hits NaN on the first cycle for NS.

Full -ksp_view output:

KSP Object: 32 MPI processes
  type: fgmres
    restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    happy breakdown tolerance 1e-30
  maximum iterations=50, initial guess is zero
  tolerances:  relative=1e-10, absolute=1e-07, divergence=10.
  right preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 32 MPI processes
  type: gamg
    type is MULTIPLICATIVE, levels=5 cycles=v
      Cycles per PCApply=1
      Using externally compute Galerkin coarse grid matrices
    GAMG specific options
      Threshold for dropping small values in graph on each level = 0.1 0.1 0.1
      Threshold scaling factor for each level not specified = 1.
      AGG specific options
        Symmetric graph true
        Number of levels to square graph 10
        Number smoothing steps 0
  Coarse grid solver -- level -------------------------------
    KSP Object: (mg_coarse_) 32 MPI processes
      type: preonly
      maximum iterations=10000, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_coarse_) 32 MPI processes
      type: bjacobi
        number of blocks = 32
        Local solve is same for all blocks, in the following KSP and PC objects:
      KSP Object: (mg_coarse_sub_) 1 MPI processes
        type: preonly
        maximum iterations=1, initial guess is zero
        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using NONE norm type for convergence test
      PC Object: (mg_coarse_sub_) 1 MPI processes
        type: lu
          out-of-place factorization
          tolerance for zero pivot 2.22045e-14
          using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
          matrix ordering: nd
          factor fill ratio given 5., needed 0.
            Factored matrix follows:
              Mat Object: 1 MPI processes
                type: seqaij
                rows=0, cols=0, bs=5
                package used to perform factorization: petsc
                total: nonzeros=1, allocated nonzeros=1
                total number of mallocs used during MatSetValues calls =0
                  not using I-node routines
        linear system matrix = precond matrix:
        Mat Object: 1 MPI processes
          type: seqaij
          rows=0, cols=0, bs=5
          total: nonzeros=0, allocated nonzeros=0
          total number of mallocs used during MatSetValues calls =0
            not using I-node routines
      linear system matrix = precond matrix:
      Mat Object: 32 MPI processes
        type: mpiaij
        rows=5, cols=5, bs=5
        total: nonzeros=25, allocated nonzeros=25
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object: (mg_levels_1_) 32 MPI processes
      type: chebyshev
        eigenvalue estimates used:  min = 0.18994, max = 2.08935
        eigenvalues estimate via gmres min 0.00933256, max 1.8994
        eigenvalues estimated using gmres with translations  [0. 0.1; 0. 1.1]
        KSP Object: (mg_levels_1_esteig_) 32 MPI processes
          type: gmres
            restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
            happy breakdown tolerance 1e-30
          maximum iterations=10, initial guess is zero
          tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
          left preconditioning
          using PRECONDITIONED norm type for convergence test
        estimating eigenvalues using noisy right hand side
      maximum iterations=2, nonzero initial guess
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_levels_1_) 32 MPI processes
      type: sor
        type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object: 32 MPI processes
        type: mpiaij
        rows=70, cols=70, bs=5
        total: nonzeros=1550, allocated nonzeros=1550
        total number of mallocs used during MatSetValues calls =0
          using nonscalable MatPtAP() implementation
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 -------------------------------
    KSP Object: (mg_levels_2_) 32 MPI processes
      type: chebyshev
        eigenvalue estimates used:  min = 0.165969, max = 1.82566
        eigenvalues estimate via gmres min 0.0290728, max 1.65969
        eigenvalues estimated using gmres with translations  [0. 0.1; 0. 1.1]
        KSP Object: (mg_levels_2_esteig_) 32 MPI processes
          type: gmres
            restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
            happy breakdown tolerance 1e-30
          maximum iterations=10, initial guess is zero
          tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
          left preconditioning
          using PRECONDITIONED norm type for convergence test
        estimating eigenvalues using noisy right hand side
      maximum iterations=2, nonzero initial guess
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_levels_2_) 32 MPI processes
      type: sor
        type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object: 32 MPI processes
        type: mpiaij
        rows=270, cols=270, bs=5
        total: nonzeros=7550, allocated nonzeros=7550
        total number of mallocs used during MatSetValues calls =0
          using nonscalable MatPtAP() implementation
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 3 -------------------------------
    KSP Object: (mg_levels_3_) 32 MPI processes
      type: chebyshev
        eigenvalue estimates used:  min = 0.146479, max = 1.61126
        eigenvalues estimate via gmres min 0.204673, max 1.46479
        eigenvalues estimated using gmres with translations  [0. 0.1; 0. 1.1]
        KSP Object: (mg_levels_3_esteig_) 32 MPI processes
          type: gmres
            restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
            happy breakdown tolerance 1e-30
          maximum iterations=10, initial guess is zero
          tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
          left preconditioning
          using PRECONDITIONED norm type for convergence test
        estimating eigenvalues using noisy right hand side
      maximum iterations=2, nonzero initial guess
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_levels_3_) 32 MPI processes
      type: sor
        type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object: 32 MPI processes
        type: mpiaij
        rows=1610, cols=1610, bs=5
        total: nonzeros=55550, allocated nonzeros=55550
        total number of mallocs used during MatSetValues calls =0
          using nonscalable MatPtAP() implementation
          using I-node (on process 0) routines: found 6 nodes, limit used is 5
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 4 -------------------------------
    KSP Object: (mg_levels_4_) 32 MPI processes
      type: chebyshev
        eigenvalue estimates used:  min = 6.81977e+09, max = 7.50175e+10
        eigenvalues estimate via gmres min -2.76436e+12, max 6.81977e+10
        eigenvalues estimated using gmres with translations  [0. 0.1; 0. 1.1]
        KSP Object: (mg_levels_4_esteig_) 32 MPI processes
          type: gmres
            restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
            happy breakdown tolerance 1e-30
          maximum iterations=10, initial guess is zero
          tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
          left preconditioning
          using PRECONDITIONED norm type for convergence test
        estimating eigenvalues using noisy right hand side
      maximum iterations=2, nonzero initial guess
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_levels_4_) 32 MPI processes
      type: sor
        type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
      linear system matrix followed by preconditioner matrix:
      Mat Object: 32 MPI processes
        type: mffd
        rows=153600, cols=153600
          Matrix-free approximation:
            err=1.49012e-08 (relative error in function evaluation)
            Using wp compute h routine
                Does not compute normU
      Mat Object: 32 MPI processes
        type: mpiaij
        rows=153600, cols=153600, bs=5
        total: nonzeros=65280000, allocated nonzeros=65280000
        total number of mallocs used during MatSetValues calls =0
          using I-node (on process 0) routines: found 960 nodes, limit used is 5
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix followed by preconditioner matrix:
  Mat Object: 32 MPI processes
    type: mffd
    rows=153600, cols=153600
      Matrix-free approximation:
        err=1.49012e-08 (relative error in function evaluation)
        Using wp compute h routine
            Does not compute normU
  Mat Object: 32 MPI processes
    type: mpiaij
    rows=153600, cols=153600, bs=5
    total: nonzeros=65280000, allocated nonzeros=65280000
    total number of mallocs used during MatSetValues calls =0
      using I-node (on process 0) routines: found 960 nodes, limit used is 5
 1 SNES Function norm 5.917486103148e+05