[petsc-users] 'Preconditioning' with lower-order method
Zou, Ling
lzou at anl.gov
Mon Mar 4 19:38:41 CST 2024
From: Zhang, Hong <hongzhang at anl.gov>
Date: Monday, March 4, 2024 at 6:34 PM
To: Zou, Ling <lzou at anl.gov>
Cc: Jed Brown <jed at jedbrown.org>, Barry Smith <bsmith at petsc.dev>, petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] 'Preconditioning' with lower-order method
Ling,
Are you using PETSc TS? If so, it may worth trying Crank-Nicolson first to see if the nonlinear solve becomes faster.
>>>>> No, I’m not using TS, and I don’t plan to use CN. From my experience, when dealing with (nearly) incompressible flow problems, CN often cause (very large) pressure temporal oscillations, and to avoid that, the pressure is often using fully implicit method, so that would cause quite some code implementation issue.
For the pressure oscillation issue, also see page 7 of INL/EXT-12-27197. Notes on Newton-Krylov Based Incompressible Flow Projection Solver
In addition, you can try to improve the performance by pruning the Jacobian matrix.
TSPruneIJacobianColor()<https://urldefense.us/v3/__https://petsc.org/main/manualpages/TS/TSPruneIJacobianColor/__;!!G_uCfscf7eWS!aXVa0uz1LUIOdvZEPlRJOhRzz9h8MSM4vhl93kknxKGb8hkTyjCFJmSZIGr0fYx90rrqotBGdw-NwiF27Ec$ > sometimes can reduce the number of colors especially for high-order methods and make your Jacobian matrix more compact. An example of usage can be found here<https://urldefense.us/v3/__https://petsc.org/release/src/ts/tests/ex5.c.html__;!!G_uCfscf7eWS!aXVa0uz1LUIOdvZEPlRJOhRzz9h8MSM4vhl93kknxKGb8hkTyjCFJmSZIGr0fYx90rrqotBGdw-NmisGfPQ$ >. If you are not using TS, there is a SNES version SNESPruneJacobianColor()<https://urldefense.us/v3/__https://petsc.org/main/manualpages/SNES/SNESPruneJacobianColor/__;!!G_uCfscf7eWS!aXVa0uz1LUIOdvZEPlRJOhRzz9h8MSM4vhl93kknxKGb8hkTyjCFJmSZIGr0fYx90rrqotBGdw-NfYBUJqI$ > for the same functionality.
>>>>> The following code is how I setup the coloring.
{
// Create Matrix-free context
MatCreateSNESMF(snes, &J_MatrixFree);
// Let the problem setup Jacobian matrix sparsity
p_sim->FillJacobianMatrixNonZeroEntry(P_Mat);
// See PETSc examples:
// https://urldefense.us/v3/__https://petsc.org/release/src/snes/tutorials/ex14.c.html__;!!G_uCfscf7eWS!aXVa0uz1LUIOdvZEPlRJOhRzz9h8MSM4vhl93kknxKGb8hkTyjCFJmSZIGr0fYx90rrqotBGdw-N3ZHE6Qw$
// https://urldefense.us/v3/__https://petsc.org/release/src/mat/tutorials/ex16.c.html__;!!G_uCfscf7eWS!aXVa0uz1LUIOdvZEPlRJOhRzz9h8MSM4vhl93kknxKGb8hkTyjCFJmSZIGr0fYx90rrqotBGdw-NcpSrhMM$
ISColoring iscoloring;
MatColoring mc;
MatColoringCreate(P_Mat, &mc);
MatColoringSetType(mc, MATCOLORINGSL);
MatColoringSetFromOptions(mc);
MatColoringApply(mc, &iscoloring);
MatColoringDestroy(&mc);
MatFDColoringCreate(P_Mat, iscoloring, &fdcoloring);
MatFDColoringSetFunction(
fdcoloring, (PetscErrorCode(*)(void))(void (*)(void))snesFormFunction, this);
MatFDColoringSetFromOptions(fdcoloring);
MatFDColoringSetUp(P_Mat, iscoloring, fdcoloring);
ISColoringDestroy(&iscoloring);
// Should I prune here? Like
SNESPruneJacobianColor<https://urldefense.us/v3/__https://petsc.org/main/manualpages/SNES/SNESPruneJacobianColor/__;!!G_uCfscf7eWS!aXVa0uz1LUIOdvZEPlRJOhRzz9h8MSM4vhl93kknxKGb8hkTyjCFJmSZIGr0fYx90rrqotBGdw-NfYBUJqI$ >(snes, P_Mat, P_Mat);
SNESSetJacobian(snes, // snes
J_MatrixFree, // Jacobian-free
P_Mat, // Preconditioning matrix
SNESComputeJacobianDefaultColor, // Use finite differencing and coloring
fdcoloring); // fdcoloring
}
Thanks,
-Ling
Hong (Mr.)
On Mar 3, 2024, at 11:48 PM, Zou, Ling via petsc-users <petsc-users at mcs.anl.gov> wrote:
From: Jed Brown <jed at jedbrown.org<mailto:jed at jedbrown.org>>
Date: Sunday, March 3, 2024 at 11:35 PM
To: Zou, Ling <lzou at anl.gov<mailto:lzou at anl.gov>>, Barry Smith <bsmith at petsc.dev<mailto:bsmith at petsc.dev>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] 'Preconditioning' with lower-order method
If you're having PETSc use coloring and have confirmed that the stencil is sufficient, then it would be nonsmoothness (again, consider the limiter you've chosen) preventing quadratic convergence (assuming that doesn't kick in eventually). Note
ZjQcmQRYFpfptBannerStart
This Message Is From an External Sender
This message came from outside your organization.
ZjQcmQRYFpfptBannerEnd
If you're having PETSc use coloring and have confirmed that the stencil is sufficient, then it would be nonsmoothness (again, consider the limiter you've chosen) preventing quadratic convergence (assuming that doesn't kick in eventually).
• Yes, I do use coloring, and I do provide sufficient stencil, i.e., neighbor’s neighbor. The sufficiency is confirmed by PETSc’s -snes_test_jacobian and -snes_test_jacobian_view options.
Note that assembling a Jacobian of a second order TVD operator requires at least second neighbors while the first order needs only first neighbors, thus is much sparser and needs fewer colors to compute.
• In my code implementation, when marking the Jacobian nonzero pattern, I don’t differentiate FV1 or FV2, I always use the FV2 stencil, so it’s a bit ‘fat’ for the FV1 method, but worked just fine.
I expect you're either not exploiting that in the timings or something else is amiss. You can run with `-log_view -snes_view -ksp_converged_reason` to get a bit more information about what's happening.
• The attached is screen output as you suggest. The linear and nonlinear performance of FV2 is both worse from the output.
FV2:
Time Step 149, time = 13229.7, dt = 100
NL Step = 0, fnorm = 7.80968E-03
Linear solve converged due to CONVERGED_RTOL iterations 26
NL Step = 1, fnorm = 7.65731E-03
Linear solve converged due to CONVERGED_RTOL iterations 24
NL Step = 2, fnorm = 6.85034E-03
Linear solve converged due to CONVERGED_RTOL iterations 27
NL Step = 3, fnorm = 6.11873E-03
Linear solve converged due to CONVERGED_RTOL iterations 25
NL Step = 4, fnorm = 1.57347E-03
Linear solve converged due to CONVERGED_RTOL iterations 27
NL Step = 5, fnorm = 9.03536E-04
SNES Object: 1 MPI process
type: newtonls
maximum iterations=20, maximum function evaluations=10000
tolerances: relative=1e-08, absolute=1e-06, solution=1e-08
total number of linear solver iterations=129
total number of function evaluations=144
norm schedule ALWAYS
Jacobian is applied matrix-free with differencing
Preconditioning Jacobian is built using finite differences with coloring
SNESLineSearch Object: 1 MPI process
type: bt
interpolation: cubic
alpha=1.000000e-04
maxstep=1.000000e+08, minlambda=1.000000e-12
tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
maximum iterations=40
KSP Object: 1 MPI process
type: gmres
restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=100, initial guess is zero
tolerances: relative=0.0001, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI process
type: ilu
out-of-place factorization
0 levels of fill
tolerance for zero pivot 2.22045e-14
using diagonal shift to prevent zero pivot [NONZERO]
matrix ordering: rcm
factor fill ratio given 1., needed 1.
Factored matrix follows:
Mat Object: 1 MPI process
type: seqaij
rows=8715, cols=8715
package used to perform factorization: petsc
total: nonzeros=38485, allocated nonzeros=38485
not using I-node routines
linear system matrix followed by preconditioner matrix:
Mat Object: 1 MPI process
type: mffd
rows=8715, cols=8715
Matrix-free approximation:
err=1.49012e-08 (relative error in function evaluation)
Using wp compute h routine
Does not compute normU
Mat Object: 1 MPI process
type: seqaij
rows=8715, cols=8715
total: nonzeros=38485, allocated nonzeros=38485
total number of mallocs used during MatSetValues calls=0
not using I-node routines
Solve Converged!
FV1:
Time Step 149, time = 13229.7, dt = 100
NL Step = 0, fnorm = 7.90072E-03
Linear solve converged due to CONVERGED_RTOL iterations 12
NL Step = 1, fnorm = 2.01919E-04
Linear solve converged due to CONVERGED_RTOL iterations 17
NL Step = 2, fnorm = 1.06960E-05
Linear solve converged due to CONVERGED_RTOL iterations 15
NL Step = 3, fnorm = 2.41683E-09
SNES Object: 1 MPI process
type: newtonls
maximum iterations=20, maximum function evaluations=10000
tolerances: relative=1e-08, absolute=1e-06, solution=1e-08
total number of linear solver iterations=44
total number of function evaluations=51
norm schedule ALWAYS
Jacobian is applied matrix-free with differencing
Preconditioning Jacobian is built using finite differences with coloring
SNESLineSearch Object: 1 MPI process
type: bt
interpolation: cubic
alpha=1.000000e-04
maxstep=1.000000e+08, minlambda=1.000000e-12
tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
maximum iterations=40
KSP Object: 1 MPI process
type: gmres
restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=100, initial guess is zero
tolerances: relative=0.0001, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI process
type: ilu
out-of-place factorization
0 levels of fill
tolerance for zero pivot 2.22045e-14
using diagonal shift to prevent zero pivot [NONZERO]
matrix ordering: rcm
factor fill ratio given 1., needed 1.
Factored matrix follows:
Mat Object: 1 MPI process
type: seqaij
rows=8715, cols=8715
package used to perform factorization: petsc
total: nonzeros=38485, allocated nonzeros=38485
not using I-node routines
linear system matrix followed by preconditioner matrix:
Mat Object: 1 MPI process
type: mffd
rows=8715, cols=8715
Matrix-free approximation:
err=1.49012e-08 (relative error in function evaluation)
Using wp compute h routine
Does not compute normU
Mat Object: 1 MPI process
type: seqaij
rows=8715, cols=8715
total: nonzeros=38485, allocated nonzeros=38485
total number of mallocs used during MatSetValues calls=0
not using I-node routines
Solve Converged!
"Zou, Ling via petsc-users" <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>> writes:
> Barry, thank you.
> I am not sure if I exactly follow you on this:
> “Are you forming the Jacobian for the first and second order cases inside of Newton?”
>
> The problem that we deal with, heat/mass transfer in heterogeneous systems (reactor system), is generally small in terms of size, i.e., # of DOFs (several k to maybe 100k level), so for now, I completely rely on PETSc to compute Jacobian, i.e., finite-differencing.
>
> That’s a good suggestion to see the time spent during various events.
> What motivated me to try the options are the following observations.
>
> 2nd order FVM:
>
> Time Step 149, time = 13229.7, dt = 100
>
> NL Step = 0, fnorm = 7.80968E-03
>
> NL Step = 1, fnorm = 7.65731E-03
>
> NL Step = 2, fnorm = 6.85034E-03
>
> NL Step = 3, fnorm = 6.11873E-03
>
> NL Step = 4, fnorm = 1.57347E-03
>
> NL Step = 5, fnorm = 9.03536E-04
>
> Solve Converged!
>
> 1st order FVM:
>
> Time Step 149, time = 13229.7, dt = 100
>
> NL Step = 0, fnorm = 7.90072E-03
>
> NL Step = 1, fnorm = 2.01919E-04
>
> NL Step = 2, fnorm = 1.06960E-05
>
> NL Step = 3, fnorm = 2.41683E-09
>
> Solve Converged!
>
> Notice the obvious ‘stagnant’ in residual for the 2nd order method while not in the 1st order.
> For the same problem, the wall time is 10 sec vs 6 sec. I would be happy if I can reduce 2 sec for the 2nd order method.
>
> -Ling
>
> From: Barry Smith <bsmith at petsc.dev<mailto:bsmith at petsc.dev>>
> Date: Sunday, March 3, 2024 at 12:06 PM
> To: Zou, Ling <lzou at anl.gov<mailto:lzou at anl.gov>>
> Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
> Subject: Re: [petsc-users] 'Preconditioning' with lower-order method
> Are you forming the Jacobian for the first and second order cases inside of Newton? You can run both with -log_view to see how much time is spent in the various events (compute function, compute Jacobian, linear solve, .. . ) for the two cases
> ZjQcmQRYFpfptBannerStart
> This Message Is From an External Sender
> This message came from outside your organization.
>
> ZjQcmQRYFpfptBannerEnd
>
> Are you forming the Jacobian for the first and second order cases inside of Newton?
>
> You can run both with -log_view to see how much time is spent in the various events (compute function, compute Jacobian, linear solve, ...) for the two cases and compare them.
>
>
>
>
> On Mar 3, 2024, at 11:42 AM, Zou, Ling via petsc-users <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>> wrote:
>
> Original email may have been sent to the incorrect place.
> See below.
>
> -Ling
>
> From: Zou, Ling <lzou at anl.gov<mailto:lzou at anl.gov><mailto:lzou at anl.gov>>
> Date: Sunday, March 3, 2024 at 10:34 AM
> To: petsc-users <petsc-users-bounces at mcs.anl.gov<mailto:petsc-users-bounces at mcs.anl.gov><mailto:petsc-users-bounces at mcs.anl.gov>>
> Subject: 'Preconditioning' with lower-order method
> Hi all,
>
> I am solving a PDE system over a spatial domain. Numerical methods are:
>
> * Finite volume method (both 1st and 2nd order implemented)
> * BDF1 and BDF2 for time integration.
> What I have noticed is that 1st order FVM converges much faster than 2nd order FVM, regardless the time integration scheme. Well, not surprising since 2nd order FVM introduces additional non-linearity.
>
> I’m thinking about two possible ways to speed up 2nd order FVM, and would like to get some thoughts or community knowledge before jumping into code implementation.
>
> Say, let the 2nd order FVM residual function be F2(x) = 0; and the 1st order FVM residual function be F1(x) = 0.
>
> 1. Option – 1, multi-step for each time step
> Step 1: solving F1(x) = 0 to obtain a temporary solution x1
> Step 2: feed x1 as an initial guess to solve F2(x) = 0 to obtain the final solution.
> [Not sure if gain any saving at all]
>
>
> 1. Option -2, dynamically changing residual function F(x)
> In pseudo code, would be something like.
>
> snesFormFunction(SNES snes, Vec u, Vec f, void *)
> {
> if (snes.nl_it_no < 4) // 4 being arbitrary here
> f = F1(u);
> else
> f = F2(u);
> }
>
> I know this might be a bit crazy since it may crash after switching residual function, still, any thoughts?
>
> Best,
>
> -Ling
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20240305/3e11c80b/attachment-0001.html>
More information about the petsc-users
mailing list