[petsc-users] [KSP] PETSc not reporting a KSP fail when true residual is NaN
Giovane Avancini
giavancini at usp.br
Fri Feb 25 10:06:01 CST 2022
Dear PETSc users,
I'm working on an in-house code that solves the Navier-Stokes equations in a
Lagrangian fashion for free-surface flows. Because of the large distortions
and pressure gradients, the iterative solvers quite often run into trouble
at some time steps, so I implemented a function that changes the solver
type based on the KSPConvergedReason flag. If this flag is negative after a
call to KSPSolve, I solve the same linear system again using a direct method.
The problem is that KSP sometimes reports convergence even though the true
residual is NaN. Because of that, I cannot detect the failure and switch
solvers, which leads to a solution vector equal to INF, and the code then
crashes. Is this kind of behaviour expected?
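One possible guard, sketched here under assumptions: treat a solve as failed not only when the reason is non-positive but also when a norm of the solution vector is NaN or INF. The helper `needsDirectFallback` is hypothetical (it is not a PETSc function); it takes the integer value of KSPConvergedReason and a solution norm that the caller would obtain, for instance, with VecNorm after KSPSolve.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not part of PETSc. Decides whether the direct-solver
// fallback should run.
//   reason       : integer value of KSPConvergedReason
//                  (positive = converged, zero or negative = not converged)
//   solutionNorm : a norm of the solution vector computed by the caller
bool needsDirectFallback(int reason, double solutionNorm)
{
    // A finite norm can never be negative; NaN is allowed through so that it
    // is handled by the finiteness check below.
    assert(std::isnan(solutionNorm) || solutionNorm >= 0.0);

    // The solver itself reported a failure (or is still iterating).
    if (reason <= 0)
        return true;

    // The solver reported success, but the solution contains NaN or INF.
    if (!std::isfinite(solutionNorm))
        return true;

    return false;
}
```

In the attached function, the check `if (reason > 0)` could then become `if (!needsDirectFallback((int)reason, solnorm))`, where `solnorm` is an assumed extra variable filled by `VecNorm(solution, NORM_2, &solnorm)`.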
Please find attached the log produced with the options
-ksp_monitor_lg_residualnorm -ksp_log -ksp_view -ksp_monitor_true_residual
-ksp_converged_reason, together with the function that changes the solver. I'm
currently using FGMRES with a BJACOBI preconditioner and LU on each block.
The problem also occurs with ILU, for example. The log file shows that at
time step 921 the true residual is NaN and the solver fails immediately with
the reason DIVERGED_PC_FAILED. I simply changed the solver to MUMPS and it
converged for that time step. However, when solving time step 922, FGMRES
converges while the true residual is NaN. How is that possible? I would
appreciate it if someone could clarify this issue for me.
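For reference, the CONVERGED_RTOL verdict at time step 922 can be reproduced from the monitored unpreconditioned residual norms alone. The sketch below assumes a default-style test of the form rnorm <= max(rtol * rnorm0, atol), with rnorm0 taken as the iteration-0 norm (the log reports a zero initial guess); `passesRelativeTest` is a hypothetical stand-in, not PETSc's own routine. The true residual printed by -ksp_monitor_true_residual does not enter this comparison, which would be consistent with a NaN there going unnoticed.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for a default-style convergence test (an assumption,
// not PETSc source code): the solve counts as converged once the monitored
// residual norm drops below max(rtol * rnorm0, atol).
bool passesRelativeTest(double rnorm, double rnorm0, double rtol, double atol)
{
    assert(rtol > 0.0 && atol >= 0.0);
    return rnorm <= std::max(rtol * rnorm0, atol);
}
```

With rtol = 1e-08 and rnorm0 = 1.466597558465e+04 from the log, the threshold is about 1.4666e-04: iteration 52 (1.513573312329e-04) fails the test and iteration 53 (8.825285013709e-05) passes it, matching the reported 53 iterations.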
Kind regards,
Giovane
--
Giovane Avancini
Doutorando em Engenharia de Estruturas - Escola de Engenharia de São
Carlos, USP
PhD researcher in Structural Engineering - School of Engineering of São
Carlos, USP
-------------- next part --------------
#include <chrono>
#include <petscksp.h>

// Solves mat * solution = rhs with the configured iterative KSP and, if that
// solve does not report convergence, retries with a direct LU solve.
void FluidDomain::solveLinearSystem(KSP& ksp, Mat& mat, Vec& rhs, Vec& solution)
{
    auto start_timer = std::chrono::high_resolution_clock::now();

    KSPReset(ksp);
    KSPSetOperators(ksp, mat, mat);

    PC pc;
    KSPGetPC(ksp, &pc);
    PetscBool isbjacobi;
    PetscObjectTypeCompare((PetscObject)pc, PCBJACOBI, &isbjacobi);
    if (isbjacobi)
    {
        // Use LU as the solver on every local block of the block Jacobi preconditioner.
        PetscInt nlocal;
        KSP *subksp;
        PC subpc;
        KSPSetUp(ksp);
        PCBJacobiGetSubKSP(pc, &nlocal, NULL, &subksp);
        for (PetscInt i = 0; i < nlocal; i++)
        {
            KSPGetPC(subksp[i], &subpc);
            PCSetType(subpc, PCLU);
            // PCFactorReorderForNonzeroDiagonal(subpc, 1.0e-10);
            // PCFactorSetShiftType(subpc, MAT_SHIFT_NONZERO);
            // PCFactorSetShiftAmount(subpc, 1.0e-10);
        }
    }
    else
    {
        // PCFactorReorderForNonzeroDiagonal(pc, 1.0e-10);
        // PCFactorSetShiftType(pc, MAT_SHIFT_NONZERO);
        // PCFactorSetShiftAmount(pc, 1.0e-10);
    }

    // Print the infinity norms of the matrix and the right-hand side for diagnostics.
    PetscReal matnorm, vecnorm;
    VecNorm(rhs, NORM_INFINITY, &vecnorm);
    MatNorm(mat, NORM_INFINITY, &matnorm);
    PetscPrintf(PETSC_COMM_WORLD, "MatNorm: %g VecNorm: %g\n", (double)matnorm, (double)vecnorm);

    // First attempt: iterative solve.
    KSPSolve(ksp, rhs, solution);
    KSPView(ksp, PETSC_VIEWER_STDOUT_WORLD);

    KSPConvergedReason reason;
    KSPGetConvergedReason(ksp, &reason);
    PetscInt nit;
    KSPGetIterationNumber(ksp, &nit);
    PetscReal norm;
    KSPGetResidualNorm(ksp, &norm);

    auto end_timer = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = end_timer - start_timer;

    if (reason > 0)
    {
        // PetscInt may be 64-bit, so cast it before printing with %d.
        PetscPrintf(PETSC_COMM_WORLD, "Solver converged within %d iterations. Elapsed time: %f\n", (int)nit, elapsed.count());
    }
    else
    {
        if (reason == KSP_DIVERGED_ITS) // maximum number of iterations reached (value -3)
            PetscPrintf(PETSC_COMM_WORLD, "Solver convergence is very slow. Modifying the solver in order to improve the convergence...\n");
        else
            PetscPrintf(PETSC_COMM_WORLD, "Solver diverged, reason %d. Modifying the solver in order to improve the convergence...\n", (int)reason);

        // Second attempt: a single application of an LU preconditioner, i.e. a direct solve.
        KSP ksp2;
        KSPCreate(PETSC_COMM_WORLD, &ksp2);
        KSPSetType(ksp2, KSPPREONLY);
        KSPSetTolerances(ksp2, 1.0e-8, PETSC_DEFAULT, PETSC_DEFAULT, 5000);
        // Only meaningful for GMRES-type solvers; ignored for KSPPREONLY.
        KSPGMRESSetRestart(ksp2, 30);

        PC pc2;
        KSPGetPC(ksp2, &pc2);
        PCSetType(pc2, PCLU);
        KSPSetOperators(ksp2, mat, mat);

        // pc2 is PCLU here, so this branch only runs if the PC type above is changed to PCBJACOBI.
        PetscObjectTypeCompare((PetscObject)pc2, PCBJACOBI, &isbjacobi);
        if (isbjacobi)
        {
            PetscInt nlocal;
            KSP *subksp;
            PC subpc;
            KSPSetUp(ksp2);
            PCBJacobiGetSubKSP(pc2, &nlocal, NULL, &subksp);
            for (PetscInt i = 0; i < nlocal; i++)
            {
                KSPGetPC(subksp[i], &subpc);
                // PCFactorSetShiftType(subpc, MAT_SHIFT_NONZERO);
            }
        }
        else
        {
            // PCFactorSetShiftType(pc2, MAT_SHIFT_NONZERO);
        }

        VecNorm(rhs, NORM_INFINITY, &vecnorm);
        MatNorm(mat, NORM_INFINITY, &matnorm);
        PetscPrintf(PETSC_COMM_WORLD, "MatNorm: %g VecNorm: %g\n", (double)matnorm, (double)vecnorm);

        KSPSolve(ksp2, rhs, solution);
        KSPGetConvergedReason(ksp2, &reason);
        KSPGetIterationNumber(ksp2, &nit);
        if (reason > 0)
        {
            PetscPrintf(PETSC_COMM_WORLD, "Solver converged within %d iterations.\n", (int)nit);
        }
        else
        {
            PetscPrintf(PETSC_COMM_WORLD, "Changing the solver did not improve the convergence.\n");
        }
        KSPDestroy(&ksp2);
    }
}
-------------- next part --------------
----------------------- TIME STEP = 921, time = 0.184200 -----------------------
Mesh Regenerated. Elapsed time: 0.011534
Isolated nodes: 0
Assemble Linear System. Elapsed time: 0.023297
MatNorm: 3.04644e+06 VecNorm: 1305.
0 KSP unpreconditioned resid norm 1.466259843490e+04 true resid norm -nan ||r(i)||/||b|| -nan
Linear solve did not converge due to DIVERGED_PC_FAILED iterations 0
PC failed due to SUBPC_ERROR
KSP Object: 4 MPI processes
type: fgmres
restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=500, initial guess is zero
tolerances: relative=1e-08, absolute=1e-50, divergence=10000.
right preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
type: bjacobi
number of blocks = 4
Local solver information for first block is in the following KSP and PC objects on rank 0:
Use -ksp_view ::ascii_info_detail to display information for all blocks
KSP Object: (sub_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (sub_) 1 MPI processes
type: lu
out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: nd
factor fill ratio given 5., needed 2.90995
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=1091, cols=1091
package used to perform factorization: petsc
total: nonzeros=58039, allocated nonzeros=58039
using I-node routines: found 364 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: (sub_) 1 MPI processes
type: seqaij
rows=1091, cols=1091
total: nonzeros=19945, allocated nonzeros=19945
total number of mallocs used during MatSetValues calls=0
using I-node routines: found 364 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpiaij
rows=4362, cols=4362
total: nonzeros=88470, allocated nonzeros=88470
total number of mallocs used during MatSetValues calls=0
using I-node (on process 0) routines: found 364 nodes, limit used is 5
Solver diverged, reason -11. Modifying the solver in order to improve the convergence...
MatNorm: 3.04644e+06 VecNorm: 1305.
Linear solve converged due to CONVERGED_ITS iterations 1
Solver converged within 1 iterations.
Newton iteration: 0 - L2 Position Norm: 1.203626E-03 - L2 Pressure Norm: 2.537266E-01
Memory used by each processor: 36.636719 Mb
Assemble Linear System. Elapsed time: 0.016010
MatNorm: 3.04644e+06 VecNorm: 0.0239994
0 KSP unpreconditioned resid norm 6.218477255232e-02 true resid norm -nan ||r(i)||/||b|| -nan
Linear solve did not converge due to DIVERGED_PC_FAILED iterations 0
PC failed due to SUBPC_ERROR
KSP Object: 4 MPI processes
type: fgmres
restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=500, initial guess is zero
tolerances: relative=1e-08, absolute=1e-50, divergence=10000.
right preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
type: bjacobi
number of blocks = 4
Local solver information for first block is in the following KSP and PC objects on rank 0:
Use -ksp_view ::ascii_info_detail to display information for all blocks
KSP Object: (sub_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (sub_) 1 MPI processes
type: lu
out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: nd
factor fill ratio given 5., needed 2.90995
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=1091, cols=1091
package used to perform factorization: petsc
total: nonzeros=58039, allocated nonzeros=58039
using I-node routines: found 364 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: (sub_) 1 MPI processes
type: seqaij
rows=1091, cols=1091
total: nonzeros=19945, allocated nonzeros=19945
total number of mallocs used during MatSetValues calls=0
using I-node routines: found 364 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpiaij
rows=4362, cols=4362
total: nonzeros=88470, allocated nonzeros=88470
total number of mallocs used during MatSetValues calls=0
using I-node (on process 0) routines: found 364 nodes, limit used is 5
Solver diverged, reason -11. Modifying the solver in order to improve the convergence...
MatNorm: 3.04644e+06 VecNorm: 0.0239994
Linear solve converged due to CONVERGED_ITS iterations 1
Solver converged within 1 iterations.
Newton iteration: 1 - L2 Position Norm: 1.796085E-07 - L2 Pressure Norm: 9.187252E-02
Memory used by each processor: 36.695312 Mb
Assemble Linear System. Elapsed time: 0.020556
MatNorm: 3.04644e+06 VecNorm: 2.81116e-06
0 KSP unpreconditioned resid norm 1.136884066004e-05 true resid norm -nan ||r(i)||/||b|| -nan
Linear solve did not converge due to DIVERGED_PC_FAILED iterations 0
PC failed due to SUBPC_ERROR
KSP Object: 4 MPI processes
type: fgmres
restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=500, initial guess is zero
tolerances: relative=1e-08, absolute=1e-50, divergence=10000.
right preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
type: bjacobi
number of blocks = 4
Local solver information for first block is in the following KSP and PC objects on rank 0:
Use -ksp_view ::ascii_info_detail to display information for all blocks
KSP Object: (sub_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (sub_) 1 MPI processes
type: lu
out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: nd
factor fill ratio given 5., needed 2.90995
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=1091, cols=1091
package used to perform factorization: petsc
total: nonzeros=58039, allocated nonzeros=58039
using I-node routines: found 364 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: (sub_) 1 MPI processes
type: seqaij
rows=1091, cols=1091
total: nonzeros=19945, allocated nonzeros=19945
total number of mallocs used during MatSetValues calls=0
using I-node routines: found 364 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpiaij
rows=4362, cols=4362
total: nonzeros=88470, allocated nonzeros=88470
total number of mallocs used during MatSetValues calls=0
using I-node (on process 0) routines: found 364 nodes, limit used is 5
Solver diverged, reason -11. Modifying the solver in order to improve the convergence...
MatNorm: 3.04644e+06 VecNorm: 2.81116e-06
Linear solve converged due to CONVERGED_ITS iterations 1
Solver converged within 1 iterations.
Newton iteration: 2 - L2 Position Norm: 1.868159E-12 - L2 Pressure Norm: 2.037029E-07
Memory used by each processor: 36.808594 Mb
----------------------- TIME STEP = 922, time = 0.184400 -----------------------
Mesh Regenerated. Elapsed time: 0.019474
Isolated nodes: 1
Assemble Linear System. Elapsed time: 0.030308
MatNorm: 3.04642e+06 VecNorm: 1305.09
0 KSP unpreconditioned resid norm 1.466597558465e+04 true resid norm -nan ||r(i)||/||b|| -nan
1 KSP unpreconditioned resid norm 3.992657613692e+02 true resid norm -nan ||r(i)||/||b|| -nan
2 KSP unpreconditioned resid norm 6.865492930467e+01 true resid norm -nan ||r(i)||/||b|| -nan
3 KSP unpreconditioned resid norm 1.488490448891e+01 true resid norm -nan ||r(i)||/||b|| -nan
4 KSP unpreconditioned resid norm 6.459160528254e+00 true resid norm -nan ||r(i)||/||b|| -nan
5 KSP unpreconditioned resid norm 2.684190657780e+00 true resid norm -nan ||r(i)||/||b|| -nan
6 KSP unpreconditioned resid norm 1.583730558735e+00 true resid norm -nan ||r(i)||/||b|| -nan
7 KSP unpreconditioned resid norm 7.857636392042e-01 true resid norm -nan ||r(i)||/||b|| -nan
8 KSP unpreconditioned resid norm 5.609287021479e-01 true resid norm -nan ||r(i)||/||b|| -nan
9 KSP unpreconditioned resid norm 4.240869629805e-01 true resid norm -nan ||r(i)||/||b|| -nan
10 KSP unpreconditioned resid norm 3.545861070917e-01 true resid norm -nan ||r(i)||/||b|| -nan
11 KSP unpreconditioned resid norm 2.796829041968e-01 true resid norm -nan ||r(i)||/||b|| -nan
12 KSP unpreconditioned resid norm 2.415853017221e-01 true resid norm -nan ||r(i)||/||b|| -nan
13 KSP unpreconditioned resid norm 1.933876557197e-01 true resid norm -nan ||r(i)||/||b|| -nan
14 KSP unpreconditioned resid norm 1.820288353613e-01 true resid norm -nan ||r(i)||/||b|| -nan
15 KSP unpreconditioned resid norm 1.657259644747e-01 true resid norm -nan ||r(i)||/||b|| -nan
16 KSP unpreconditioned resid norm 1.563463788745e-01 true resid norm -nan ||r(i)||/||b|| -nan
17 KSP unpreconditioned resid norm 1.272726963049e-01 true resid norm -nan ||r(i)||/||b|| -nan
18 KSP unpreconditioned resid norm 1.137797079759e-01 true resid norm -nan ||r(i)||/||b|| -nan
19 KSP unpreconditioned resid norm 8.582335118209e-02 true resid norm -nan ||r(i)||/||b|| -nan
20 KSP unpreconditioned resid norm 7.628931493998e-02 true resid norm -nan ||r(i)||/||b|| -nan
21 KSP unpreconditioned resid norm 5.901409359786e-02 true resid norm -nan ||r(i)||/||b|| -nan
22 KSP unpreconditioned resid norm 5.496262106550e-02 true resid norm -nan ||r(i)||/||b|| -nan
23 KSP unpreconditioned resid norm 4.367683601600e-02 true resid norm -nan ||r(i)||/||b|| -nan
24 KSP unpreconditioned resid norm 3.767769610963e-02 true resid norm -nan ||r(i)||/||b|| -nan
25 KSP unpreconditioned resid norm 2.758466841864e-02 true resid norm -nan ||r(i)||/||b|| -nan
26 KSP unpreconditioned resid norm 2.401068925144e-02 true resid norm -nan ||r(i)||/||b|| -nan
27 KSP unpreconditioned resid norm 1.918366114227e-02 true resid norm -nan ||r(i)||/||b|| -nan
28 KSP unpreconditioned resid norm 1.796891532704e-02 true resid norm -nan ||r(i)||/||b|| -nan
29 KSP unpreconditioned resid norm 1.646774691070e-02 true resid norm -nan ||r(i)||/||b|| -nan
30 KSP unpreconditioned resid norm 1.581043087339e-02 true resid norm -nan ||r(i)||/||b|| -nan
31 KSP unpreconditioned resid norm 1.451402784393e-02 true resid norm -nan ||r(i)||/||b|| -nan
32 KSP unpreconditioned resid norm 1.365719226793e-02 true resid norm -nan ||r(i)||/||b|| -nan
33 KSP unpreconditioned resid norm 1.221815466293e-02 true resid norm -nan ||r(i)||/||b|| -nan
34 KSP unpreconditioned resid norm 1.170507483612e-02 true resid norm -nan ||r(i)||/||b|| -nan
35 KSP unpreconditioned resid norm 1.112121419983e-02 true resid norm -nan ||r(i)||/||b|| -nan
36 KSP unpreconditioned resid norm 1.041368299534e-02 true resid norm -nan ||r(i)||/||b|| -nan
37 KSP unpreconditioned resid norm 8.898468360233e-03 true resid norm -nan ||r(i)||/||b|| -nan
38 KSP unpreconditioned resid norm 7.828540090048e-03 true resid norm -nan ||r(i)||/||b|| -nan
39 KSP unpreconditioned resid norm 6.804894322652e-03 true resid norm -nan ||r(i)||/||b|| -nan
40 KSP unpreconditioned resid norm 5.932441731922e-03 true resid norm -nan ||r(i)||/||b|| -nan
41 KSP unpreconditioned resid norm 5.038590720204e-03 true resid norm -nan ||r(i)||/||b|| -nan
42 KSP unpreconditioned resid norm 4.352003569050e-03 true resid norm -nan ||r(i)||/||b|| -nan
43 KSP unpreconditioned resid norm 3.340851172402e-03 true resid norm -nan ||r(i)||/||b|| -nan
44 KSP unpreconditioned resid norm 2.489084471832e-03 true resid norm -nan ||r(i)||/||b|| -nan
45 KSP unpreconditioned resid norm 1.982062096221e-03 true resid norm -nan ||r(i)||/||b|| -nan
46 KSP unpreconditioned resid norm 1.543532665899e-03 true resid norm -nan ||r(i)||/||b|| -nan
47 KSP unpreconditioned resid norm 1.041250067680e-03 true resid norm -nan ||r(i)||/||b|| -nan
48 KSP unpreconditioned resid norm 7.072998665082e-04 true resid norm -nan ||r(i)||/||b|| -nan
49 KSP unpreconditioned resid norm 4.326826499956e-04 true resid norm -nan ||r(i)||/||b|| -nan
50 KSP unpreconditioned resid norm 3.114665876716e-04 true resid norm -nan ||r(i)||/||b|| -nan
51 KSP unpreconditioned resid norm 1.971230239174e-04 true resid norm -nan ||r(i)||/||b|| -nan
52 KSP unpreconditioned resid norm 1.513573312329e-04 true resid norm -nan ||r(i)||/||b|| -nan
53 KSP unpreconditioned resid norm 8.825285013709e-05 true resid norm -nan ||r(i)||/||b|| -nan
Linear solve converged due to CONVERGED_RTOL iterations 53
KSP Object: 4 MPI processes
type: fgmres
restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=500, initial guess is zero
tolerances: relative=1e-08, absolute=1e-50, divergence=10000.
right preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
type: bjacobi
number of blocks = 4
Local solver information for first block is in the following KSP and PC objects on rank 0:
Use -ksp_view ::ascii_info_detail to display information for all blocks
KSP Object: (sub_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (sub_) 1 MPI processes
type: lu
out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: nd
factor fill ratio given 5., needed 3.8053
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=1089, cols=1089
package used to perform factorization: petsc
total: nonzeros=77571, allocated nonzeros=77571
using I-node routines: found 363 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: (sub_) 1 MPI processes
type: seqaij
rows=1089, cols=1089
total: nonzeros=20385, allocated nonzeros=20385
total number of mallocs used during MatSetValues calls=0
using I-node routines: found 363 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpiaij
rows=4353, cols=4353
total: nonzeros=88389, allocated nonzeros=88389
total number of mallocs used during MatSetValues calls=0
using I-node (on process 0) routines: found 363 nodes, limit used is 5
Solver converged within 53 iterations. Elapsed time: 0.112512
Newton iteration: 0 - L2 Position Norm: INF - L2 Pressure Norm: INF