[petsc-users] 'Preconditioning' with lower-order method

Mon Mar 4 18:34:45 CST 2024

Ling,

Are you using PETSc TS? If so, it may worth trying Crank-Nicolson first to see if the nonlinear solve becomes faster.

In addition, you can try to improve the performance by pruning the Jacobian matrix.

TSPruneIJacobianColor()<https://urldefense.us/v3/__https://petsc.org/main/manualpages/TS/TSPruneIJacobianColor/__;!!G_uCfscf7eWS!aAgMOjDD6ZgyEaxqc_vH-Yl5P1uNvFWa8_Lc5TWgoXfSz9x7kNgCi4jr10b2PmgzFwIyeCZGCGLUYU6hNJdYW8zcNQ$ > sometimes can reduce the number of colors especially for high-order methods and make your Jacobian matrix more compact. An example of usage can be found here<https://urldefense.us/v3/__https://petsc.org/release/src/ts/tests/ex5.c.html__;!!G_uCfscf7eWS!aAgMOjDD6ZgyEaxqc_vH-Yl5P1uNvFWa8_Lc5TWgoXfSz9x7kNgCi4jr10b2PmgzFwIyeCZGCGLUYU6hNJd4iC6hZw$ >. If you are not using TS, there is a SNES version SNESPruneJacobianColor()<https://urldefense.us/v3/__https://petsc.org/main/manualpages/SNES/SNESPruneJacobianColor/__;!!G_uCfscf7eWS!aAgMOjDD6ZgyEaxqc_vH-Yl5P1uNvFWa8_Lc5TWgoXfSz9x7kNgCi4jr10b2PmgzFwIyeCZGCGLUYU6hNJfq48Io5Q$ > for the same functionality.

Hong (Mr.)

On Mar 3, 2024, at 11:48 PM, Zou, Ling via petsc-users <petsc-users at mcs.anl.gov> wrote:

From: Jed Brown <jed at jedbrown.org<mailto:jed at jedbrown.org>>
Date: Sunday, March 3, 2024 at 11:35 PM
To: Zou, Ling <lzou at anl.gov<mailto:lzou at anl.gov>>, Barry Smith <bsmith at petsc.dev<mailto:bsmith at petsc.dev>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] 'Preconditioning' with lower-order method
If you're having PETSc use coloring and have confirmed that the stencil is sufficient, then it would be nonsmoothness (again, consider the limiter you've chosen) preventing quadratic convergence (assuming that doesn't kick in eventually). Note
ZjQcmQRYFpfptBannerStart
This Message Is From an External Sender
This message came from outside your organization.

ZjQcmQRYFpfptBannerEnd

If you're having PETSc use coloring and have confirmed that the stencil is sufficient, then it would be nonsmoothness (again, consider the limiter you've chosen) preventing quadratic convergence (assuming that doesn't kick in eventually).

•         Yes, I do use coloring, and I do provide sufficient stencil, i.e., neighbor’s neighbor. The sufficiency is confirmed by PETSc’s -snes_test_jacobian and -snes_test_jacobian_view options.

Note that assembling a Jacobian of a second order TVD operator requires at least second neighbors while the first order needs only first neighbors, thus is much sparser and needs fewer colors to compute.

•         In my code implementation, when marking the Jacobian nonzero pattern, I don’t differentiate FV1 or FV2, I always use the FV2 stencil, so it’s a bit ‘fat’ for the FV1 method, but worked just fine.

I expect you're either not exploiting that in the timings or something else is amiss. You can run with `-log_view -snes_view -ksp_converged_reason` to get a bit more information about what's happening.

•         The attached is screen output as you suggest. The linear and nonlinear performance of FV2 is both worse from the output.

FV2:

Time Step 149, time = 13229.7, dt = 100
    NL Step =  0, fnorm =  7.80968E-03
  Linear solve converged due to CONVERGED_RTOL iterations 26
    NL Step =  1, fnorm =  7.65731E-03
  Linear solve converged due to CONVERGED_RTOL iterations 24
    NL Step =  2, fnorm =  6.85034E-03
  Linear solve converged due to CONVERGED_RTOL iterations 27
    NL Step =  3, fnorm =  6.11873E-03
  Linear solve converged due to CONVERGED_RTOL iterations 25
    NL Step =  4, fnorm =  1.57347E-03
  Linear solve converged due to CONVERGED_RTOL iterations 27
    NL Step =  5, fnorm =  9.03536E-04
SNES Object: 1 MPI process
  type: newtonls
  maximum iterations=20, maximum function evaluations=10000
  tolerances: relative=1e-08, absolute=1e-06, solution=1e-08
  total number of linear solver iterations=129
  total number of function evaluations=144
  norm schedule ALWAYS
  Jacobian is applied matrix-free with differencing
  Preconditioning Jacobian is built using finite differences with coloring
  SNESLineSearch Object: 1 MPI process
    type: bt
      interpolation: cubic
      alpha=1.000000e-04
    maxstep=1.000000e+08, minlambda=1.000000e-12
    tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
    maximum iterations=40
  KSP Object: 1 MPI process
    type: gmres
      restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
      happy breakdown tolerance 1e-30
    maximum iterations=100, initial guess is zero
    tolerances:  relative=0.0001, absolute=1e-50, divergence=10000.
    left preconditioning
    using PRECONDITIONED norm type for convergence test
  PC Object: 1 MPI process
    type: ilu
      out-of-place factorization
      0 levels of fill
      tolerance for zero pivot 2.22045e-14
      using diagonal shift to prevent zero pivot [NONZERO]
      matrix ordering: rcm
      factor fill ratio given 1., needed 1.
        Factored matrix follows:
          Mat Object: 1 MPI process
            type: seqaij
            rows=8715, cols=8715
            package used to perform factorization: petsc
            total: nonzeros=38485, allocated nonzeros=38485
              not using I-node routines
    linear system matrix followed by preconditioner matrix:
    Mat Object: 1 MPI process
      type: mffd
      rows=8715, cols=8715
        Matrix-free approximation:
          err=1.49012e-08 (relative error in function evaluation)
          Using wp compute h routine
              Does not compute normU
    Mat Object: 1 MPI process
      type: seqaij
      rows=8715, cols=8715
      total: nonzeros=38485, allocated nonzeros=38485
      total number of mallocs used during MatSetValues calls=0
        not using I-node routines
 Solve Converged!

FV1:

Time Step 149, time = 13229.7, dt = 100
    NL Step =  0, fnorm =  7.90072E-03
  Linear solve converged due to CONVERGED_RTOL iterations 12
    NL Step =  1, fnorm =  2.01919E-04
  Linear solve converged due to CONVERGED_RTOL iterations 17
    NL Step =  2, fnorm =  1.06960E-05
  Linear solve converged due to CONVERGED_RTOL iterations 15
    NL Step =  3, fnorm =  2.41683E-09
SNES Object: 1 MPI process
  type: newtonls
  maximum iterations=20, maximum function evaluations=10000
  tolerances: relative=1e-08, absolute=1e-06, solution=1e-08
  total number of linear solver iterations=44
  total number of function evaluations=51
  norm schedule ALWAYS
  Jacobian is applied matrix-free with differencing
  Preconditioning Jacobian is built using finite differences with coloring
  SNESLineSearch Object: 1 MPI process
    type: bt
      interpolation: cubic
      alpha=1.000000e-04
    maxstep=1.000000e+08, minlambda=1.000000e-12
    tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
    maximum iterations=40
  KSP Object: 1 MPI process
    type: gmres
      restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
      happy breakdown tolerance 1e-30
    maximum iterations=100, initial guess is zero
    tolerances:  relative=0.0001, absolute=1e-50, divergence=10000.
    left preconditioning
    using PRECONDITIONED norm type for convergence test
  PC Object: 1 MPI process
    type: ilu
      out-of-place factorization
      0 levels of fill
      tolerance for zero pivot 2.22045e-14
      using diagonal shift to prevent zero pivot [NONZERO]
      matrix ordering: rcm
      factor fill ratio given 1., needed 1.
        Factored matrix follows:
          Mat Object: 1 MPI process
            type: seqaij
            rows=8715, cols=8715
            package used to perform factorization: petsc
            total: nonzeros=38485, allocated nonzeros=38485
              not using I-node routines
    linear system matrix followed by preconditioner matrix:
    Mat Object: 1 MPI process
      type: mffd
      rows=8715, cols=8715
        Matrix-free approximation:
          err=1.49012e-08 (relative error in function evaluation)
          Using wp compute h routine
              Does not compute normU
    Mat Object: 1 MPI process
      type: seqaij
      rows=8715, cols=8715
      total: nonzeros=38485, allocated nonzeros=38485
      total number of mallocs used during MatSetValues calls=0
        not using I-node routines
 Solve Converged!

"Zou, Ling via petsc-users" <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>> writes:

> Barry, thank you.

> I am not sure if I exactly follow you on this:

> “Are you forming the Jacobian for the first and second order cases inside of Newton?”

>

> The problem that we deal with, heat/mass transfer in heterogeneous systems (reactor system), is generally small in terms of size, i.e., # of DOFs (several k to maybe 100k level), so for now, I completely rely on PETSc to compute Jacobian, i.e., finite-differencing.

>

> That’s a good suggestion to see the time spent during various events.

> What motivated me to try the options are the following observations.

>

> 2nd order FVM:

>

> Time Step 149, time = 13229.7, dt = 100

>

>     NL Step =  0, fnorm =  7.80968E-03

>

>     NL Step =  1, fnorm =  7.65731E-03

>

>     NL Step =  2, fnorm =  6.85034E-03

>

>     NL Step =  3, fnorm =  6.11873E-03

>

>     NL Step =  4, fnorm =  1.57347E-03

>

>     NL Step =  5, fnorm =  9.03536E-04

>

>  Solve Converged!

>

> 1st order FVM:

>

> Time Step 149, time = 13229.7, dt = 100

>

>     NL Step =  0, fnorm =  7.90072E-03

>

>     NL Step =  1, fnorm =  2.01919E-04

>

>     NL Step =  2, fnorm =  1.06960E-05

>

>     NL Step =  3, fnorm =  2.41683E-09

>

>  Solve Converged!

>

> Notice the obvious ‘stagnant’ in residual for the 2nd order method while not in the 1st order.

> For the same problem, the wall time is 10 sec vs 6 sec. I would be happy if I can reduce 2 sec for the 2nd order method.

>

> -Ling

>

> From: Barry Smith <bsmith at petsc.dev<mailto:bsmith at petsc.dev>>

> Date: Sunday, March 3, 2024 at 12:06 PM

> To: Zou, Ling <lzou at anl.gov<mailto:lzou at anl.gov>>

> Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>

> Subject: Re: [petsc-users] 'Preconditioning' with lower-order method

> Are you forming the Jacobian for the first and second order cases inside of Newton? You can run both with -log_view to see how much time is spent in the various events (compute function, compute Jacobian, linear solve, .. . ) for the two cases

> ZjQcmQRYFpfptBannerStart

> This Message Is From an External Sender

> This message came from outside your organization.

>

> ZjQcmQRYFpfptBannerEnd

>

>    Are you forming the Jacobian for the first and second order cases inside of Newton?

>

>    You can run both with -log_view to see how much time is spent in the various events (compute function, compute Jacobian, linear solve, ...) for the two cases and compare them.

>

>

>

>

> On Mar 3, 2024, at 11:42 AM, Zou, Ling via petsc-users <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>> wrote:

>

> Original email may have been sent to the incorrect place.

> See below.

>

> -Ling

>

> From: Zou, Ling <lzou at anl.gov<mailto:lzou at anl.gov><mailto:lzou at anl.gov>>

> Date: Sunday, March 3, 2024 at 10:34 AM

> To: petsc-users <petsc-users-bounces at mcs.anl.gov<mailto:petsc-users-bounces at mcs.anl.gov><mailto:petsc-users-bounces at mcs.anl.gov>>

> Subject: 'Preconditioning' with lower-order method

> Hi all,

>

> I am solving a PDE system over a spatial domain. Numerical methods are:

>

>   *   Finite volume method (both 1st and 2nd order implemented)

>   *   BDF1 and BDF2 for time integration.

> What I have noticed is that 1st order FVM converges much faster than 2nd order FVM, regardless the time integration scheme. Well, not surprising since 2nd order FVM introduces additional non-linearity.

>

> I’m thinking about two possible ways to speed up 2nd order FVM, and would like to get some thoughts or community knowledge before jumping into code implementation.

>

> Say, let the 2nd order FVM residual function be F2(x) = 0; and the 1st order FVM residual function be F1(x) = 0.

>

>   1.  Option – 1, multi-step for each time step

> Step 1: solving F1(x) = 0 to obtain a temporary solution x1

> Step 2: feed x1 as an initial guess to solve F2(x) = 0 to obtain the final solution.

> [Not sure if gain any saving at all]

>

>

>   1.  Option -2, dynamically changing residual function F(x)

> In pseudo code, would be something like.

>

> snesFormFunction(SNES snes, Vec u, Vec f, void *)

> {

>   if (snes.nl_it_no < 4) // 4 being arbitrary here

>     f = F1(u);

>   else

>     f = F2(u);

> }

>

> I know this might be a bit crazy since it may crash after switching residual function, still, any thoughts?

>

> Best,

>

> -Ling

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20240305/d1435089/attachment-0001.html>