[petsc-users] preconditioning matrix-free newton-krylov

Mark Lohry mlohry at gmail.com
Mon Oct 30 18:38:07 CDT 2017


The sparsity pattern (binary AIJ, gzipped) is here; it should have 100^2
blocks of all 1's indicating the non-zero positions:

https://github.com/mlohry/petsc_miscellany/blob/master/jacobian_sparsity.dat.gz

Using 32 processes on 2x16-core AMD 6274, MatColoringApply takes ~877
seconds (~15 minutes) for MATCOLORINGJP and ~30 seconds for
MATCOLORINGGREEDY. Let me know if you can't reproduce this and I'll put
together an MWE.

MATCOLORINGID and MATCOLORINGSL both seem to work in 9 seconds, though the
documentation says they're both serial?
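
For reference, a minimal sketch of choosing the coloring algorithm
explicitly when driving the coloring yourself (J standing in for the
preconditioning matrix); with the default SNESComputeJacobianDefaultColor
path the same choice should be available at runtime via
-mat_coloring_type greedy:

    MatColoring    mc;
    ISColoring     iscoloring;
    PetscErrorCode ierr;

    ierr = MatColoringCreate(J, &mc);CHKERRQ(ierr);
    ierr = MatColoringSetType(mc, MATCOLORINGGREEDY);CHKERRQ(ierr);
    ierr = MatColoringSetFromOptions(mc);CHKERRQ(ierr); /* allow -mat_coloring_type to override */
    ierr = MatColoringApply(mc, &iscoloring);CHKERRQ(ierr);
    ierr = MatColoringDestroy(&mc);CHKERRQ(ierr);
    /* hand iscoloring to MatFDColoringCreate(), then destroy it */
    ierr = ISColoringDestroy(&iscoloring);CHKERRQ(ierr);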


Regarding BAIJ, neither JP nor GREEDY works, throwing "Matrix must be AIJ
for greedy coloring", so am I out of luck with MPIBAIJ?
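
If BAIJ turns out to be a dead end for the parallel colorings, one possible
workaround, sketched under the assumption that a coloring computed on an
AIJ copy can be reused for the original matrix (the nonzero pattern is the
same):

    Mat            Jaij;
    MatColoring    mc;
    ISColoring     iscoloring;
    PetscErrorCode ierr;

    /* color an AIJ copy of the BAIJ preconditioning matrix J */
    ierr = MatConvert(J, MATAIJ, MAT_INITIAL_MATRIX, &Jaij);CHKERRQ(ierr);
    ierr = MatColoringCreate(Jaij, &mc);CHKERRQ(ierr);
    ierr = MatColoringSetType(mc, MATCOLORINGGREEDY);CHKERRQ(ierr);
    ierr = MatColoringApply(mc, &iscoloring);CHKERRQ(ierr);
    ierr = MatColoringDestroy(&mc);CHKERRQ(ierr);
    ierr = MatDestroy(&Jaij);CHKERRQ(ierr);
    /* use iscoloring with the original J, e.g. via MatFDColoringCreate() */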

On Mon, Oct 30, 2017 at 3:28 PM, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:

>
> > On Oct 30, 2017, at 2:23 PM, Mark Lohry <mlohry at gmail.com> wrote:
> >
> >
> >   Hmm, are those blocks dense? If so you could benefit enormously from
> using BAIJ format.
> >
> >
> > Yes, they're dense blocks. Usually coupled compressible 3D NS with DG
> elements: 5 equations x (N+1)*(N+2)*(N+3)/3 (order N) basis functions per
> block, so block sizes of 50^2 to 175^2 are typical. I'll try BAIJ; I
> initially set it up with AIJ since it seemed better supported in parallel
> on the linear solver table, but I suppose these are rather large blocks...
> still, the performance is surprising, as this is overall a pretty small
> system (1,536 elements/diagonal 100^2 blocks).
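
A rough sketch of what the BAIJ creation might look like for the
connectivity described above; bs and nlocal are placeholders (the local row
count must be a multiple of bs), and the preallocation counts, given in
blocks per block row, are a generous over-estimate:

    Mat            J;
    PetscErrorCode ierr;

    /* bs = dense block size (e.g. 100), nlocal = locally owned rows */
    ierr = MatCreateBAIJ(PETSC_COMM_WORLD, bs, nlocal, nlocal,
                         PETSC_DETERMINE, PETSC_DETERMINE,
                         5, NULL, 4, NULL, &J);CHKERRQ(ierr);
    /* 5: diagonal block plus up to 4 on-process neighbor blocks per block row;
       4: off-process neighbor blocks per block row */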
>
>   Something is really wrong to get those huge times.
> >
> >
> > Could you run with -ksp_view_mat binary and send the resulting file
> called binaryoutput, and we can run the coloring codes locally to debug
> the performance.
> >
> >
> > Will send this evening.
>
>   Thanks, but send it for the AIJ case, not BAIJ.
>
> >
> >
> > On Mon, Oct 30, 2017 at 3:02 PM, Smith, Barry F. <bsmith at mcs.anl.gov>
> wrote:
> >
> > > On Oct 30, 2017, at 1:58 PM, Mark Lohry <mlohry at gmail.com> wrote:
> > >
> > > Hmm, metis doesn't really have anything to do with the sparsity of the
> Jacobian, does it?
> > >
> > > No, I just mean I'm doing the initial partitioning and parallel
> communication for the residual evaluations independently of PETSc, then
> doing a 1-to-1 mapping to the PETSc solution vector, along with manually
> setting the non-zero structure of the MPIAIJ system as in the user manual.
> I don't think there's anything wrong with the system structure, as it
> gives the same correct answer as the un-preconditioned matrix-free
> approach.
> > >
> > > The exact system those MatColoring times came from has (100x100)
> blocks on the diagonal corresponding to the tetrahedral cells, each with 4
> neighbor blocks on the same row (or fewer for elements on boundaries).
> >
> >   Hmm, are those blocks dense? If so you could benefit enormously from
> using BAIJ format.
> >
> >   Matt,
> >
> >     Sounds like performance bugs in the parallel coloring apply
> algorithms with big "diagonal blocks".
> >
> >   Mark,
> >
> >      Could you run with -ksp_view_mat binary and send the resulting file
> called binaryoutput, and we can run the coloring codes locally to debug
> the performance.
> >
> >
> >   Barry
> >
> > >
> > > On Mon, Oct 30, 2017 at 1:55 PM, Smith, Barry F. <bsmith at mcs.anl.gov>
> wrote:
> > >
> > > > On Oct 30, 2017, at 12:39 PM, Mark Lohry <mlohry at gmail.com> wrote:
> > > >
> > > >
> > > > >
> > > > > 3) Are there any hooks analogous to KSPSetPreSolve/PostSolve for
> the FD computation of the jacobians, or for the computation of the
> preconditioner? I'd like to get a handle on the relative costs of these.
> > > >
> > > >   No, do you just want the time? You can get that from the logging;
> for example -log_view
> > > >
> > > > Yes, I was just thinking in regard to your suggestion of recomputing
> when the number of linear iterations gets too high; I assume it's the
> ratio of preconditioner cost to linear solver cost at runtime that's the
> metric of interest, not the absolute value of either. But I'll cross that
> bridge when I come to it.
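
One way to make those relative costs visible without extra hooks is the
-log_view event table (SNESJacobianEval, PCSetUp, KSPSolve), optionally
bracketed by a user-defined logging stage so the solve is reported
separately from setup. A minimal sketch, with an arbitrary stage name and
ts/x standing in for the TS and solution vector:

    PetscLogStage  stage;
    PetscErrorCode ierr;

    ierr = PetscLogStageRegister("ImplicitSolve", &stage);CHKERRQ(ierr);
    ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
    ierr = TSSolve(ts, x);CHKERRQ(ierr);   /* or SNESSolve(snes, NULL, x) */
    ierr = PetscLogStagePop();CHKERRQ(ierr);
    /* run with -log_view and compare SNESJacobianEval, PCSetUp, KSPSolve */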
> > > >
> > > > When I had asked, I was looking to see where a long pause was
> happening, thinking it was the FD Jacobian; it turned out to be before
> that, in MatColoringApply, which seems surprisingly expensive.
> MATCOLORINGJP took ~15 minutes on 32 cores on a small 153,000^2 system,
> with MATCOLORINGGREEDY taking 30 seconds. Any guidance there, or is this
> expected? I'm not using a DM, just manually entering the sparsity
> resulting from a metis decomposition of a tetrahedral mesh.
> > >
> > >    Hmm, metis doesn't really have anything to do with the sparsity of
> the Jacobian, does it?
> > >
> > >   Matt,
> > >
> > >    These times are huge. What is going on?
> > >
> > >    Barry
> > >
> > > >
> > > >
> > > > Thanks for the info on the lag logic; I'll play with the TS pre/post
> calls for the time-accurate problems and only use LagJacobian.
> > > >
> > > > On Mon, Oct 30, 2017 at 11:29 AM, Smith, Barry F. <
> bsmith at mcs.anl.gov> wrote:
> > > >
> > > > > On Oct 29, 2017, at 11:50 AM, Mark Lohry <mlohry at gmail.com> wrote:
> > > > >
> > > > > Thanks again, Barry. I've got the preconditioners hooked up with
> -snes_mf_operator, and at least AMG looks to be working great on a
> high-order unstructured DG problem.
> > > > >
> > > > > Couple questions on the SNESSetLagJacobian +
> SNESSetLagPreconditioner code flow:
> > > > >
> > > > > 1) With -snes_mf_operator, and given SNESSetLagJacobian(snes, 1)
> (default) and SNESSetLagPreconditioner(snes, 2), after the first KSP solve
> in a Newton iteration, will it do the finite difference Jacobian
> calculation? Or will the Jacobian only be computed when the preconditioner
> lag setting demands it on the 3rd Newton step? I suspect it's the latter
> based on where I see the code pause.
> > > >
> > > >    SNES with -snes_mf_operator will ALWAYS use the matrix-free
> finite difference (F(x+h*v) - F(x))/h to apply the matrix-vector product.
> > > >
> > > >    The LagJacobian and LagPreconditioner are not coordinated. The
> first determines how often the Jacobian used for preconditioning is
> recomputed and the second determines how often the preconditioner is
> recomputed.
> > > >
> > > >    If you are using -snes_mf_operator then it never makes sense to
> have lagJacobian < lagPreconditioner since it would recompute the Jacobian
> but not actually use it. It also makes no sense for lagPreconditioner <
> lagJacobian because you'd be recomputing the preconditioner on the same
> Jacobian.
> > > >
> > > > But actually, if you don't change the Jacobian used in building the
> preconditioner, then when it tries to recompute the preconditioner it
> determines the matrix has not changed and skips rebuilding the
> preconditioner. So when using -snes_mf_operator there is generally no
> reason to set the preconditioner lag.
> > > > >
> > > > > 2) How do implicit TS and SNESSetLagPreconditioner/Persists
> interact? Does the counter since-last-preconditioner-compute reset with
> time steps, or does that lag counter just increment with every SNES solve
> regardless of how many nonlinear solves might have happened in a given
> timestep? Say lag preconditioner is 2, and a time stepper uses 3, 2, and 3
> nonlinear solves on 3 steps, is the flow
> > > > >
> > > > > (time step 1)->(update preconditioner)->(snes solve)->(snes
> solve)->(update preconditioner)->(snes solve)
> > > > > (time step 2)->(snes solve)->(update preconditioner)->(snes solve)
> > > > > (time step 3)->(snes solve)->(update preconditioner)->(snes
> solve)->(snes solve)
> > > > >
> > > > > or
> > > > >
> > > > > (time step 1)->(update preconditioner)->(snes solve)->(snes
> solve)->(update preconditioner)->(snes solve)
> > > > > (time step 2)->(update preconditioner)->(snes solve)->(snes solve)
> > > > > (time step 3)->(update preconditioner)->(snes solve)->(snes
> solve)->(update preconditioner)->(snes solve)
> > > > >
> > > > > ?
> > > > >
> > > > > I think for implicit time stepping I'd probably want the
> preconditioner to be recomputed just once at the beginning of each time
> step, or some multiple of that. Does that sound reasonable?
> > > >
> > > >   Yes, what you want to do is completely reasonable.
> > > >
> > > >   You can use SNESSetLagJacobian() and
> SNESSetLagJacobianPersists() in combination to have the Jacobian
> recomputed every fixed number of times; if you set the persists flag and
> set LagJacobian to 10 it will recompute the Jacobian used in the
> preconditioner every 10th time a new Jacobian is needed.
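
A minimal sketch of that combination, assuming snes is the already
configured SNES:

    ierr = SNESSetLagJacobian(snes, 10);CHKERRQ(ierr);                 /* every 10th request */
    ierr = SNESSetLagJacobianPersists(snes, PETSC_TRUE);CHKERRQ(ierr); /* keep the policy across solves */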
> > > >
> > > >    If you want to compute the new Jacobian used to build the
> preconditioner once at the beginning of each new TS stage, you can set
> SNESSetLagJacobian() to -2 in the TS prestage call. There are possibly
> other tricks you can do by setting the two flags at different locations.
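
A minimal sketch of the prestage approach, assuming a TS drives the SNES;
the callback name is arbitrary:

    static PetscErrorCode RebuildJacobianPreStage(TS ts, PetscReal stagetime)
    {
      SNES           snes;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = TSGetSNES(ts, &snes);CHKERRQ(ierr);
      /* -2: rebuild the Jacobian used for the preconditioner at the next Newton step only */
      ierr = SNESSetLagJacobian(snes, -2);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

    /* during setup */
    ierr = TSSetPreStage(ts, RebuildJacobianPreStage);CHKERRQ(ierr);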
> > > >
> > > >    An alternative to hardwiring how often the Jacobian used to build
> the preconditioner is rebuilt is to rebuild it when the preconditioner
> starts "working less well". Here you could add a KSPMonitor or SNESMonitor
> that detects when the number of linear iterations exceeds a certain amount
> and then sets the recompute-Jacobian flag to -2, so that for the next
> solve it recreates the Jacobian used in building the preconditioner.
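
A minimal sketch of such a monitor; the function name and the iteration
threshold are arbitrary:

    static PetscErrorCode RebuildOnSlowKSP(SNES snes, PetscInt it, PetscReal fnorm, void *ctx)
    {
      KSP            ksp;
      PetscInt       lits;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = SNESGetKSP(snes, &ksp);CHKERRQ(ierr);
      ierr = KSPGetIterationNumber(ksp, &lits);CHKERRQ(ierr); /* iterations of the last linear solve */
      if (lits > 50) { /* threshold is problem dependent */
        ierr = SNESSetLagJacobian(snes, -2);CHKERRQ(ierr);    /* rebuild at the next Newton step */
      }
      PetscFunctionReturn(0);
    }

    /* during setup */
    ierr = SNESMonitorSet(snes, RebuildOnSlowKSP, NULL, NULL);CHKERRQ(ierr);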
> > > >
> > > >
> > > > >
> > > > > 3) Are there any hooks analogous to KSPSetPreSolve/PostSolve for
> the FD computation of the jacobians, or for the computation of the
> preconditioner? I'd like to get a handle on the relative costs of these.
> > > >
> > > >   No, do you just want the time? You can get that from the logging;
> for example -log_view
> > > >
> > > > >
> > > > >
> > > > > Best,
> > > > > Mark
> > > > >
> > > > > On Sat, Sep 23, 2017 at 3:28 PM, Mark Lohry <mlohry at gmail.com>
> wrote:
> > > > > Great, thanks Barry.
> > > > >
> > > > > On Sat, Sep 23, 2017 at 3:12 PM, Barry Smith <bsmith at mcs.anl.gov>
> wrote:
> > > > >
> > > > > > On Sep 23, 2017, at 12:48 PM, Mark W. Lohry <
> mlohry at princeton.edu> wrote:
> > > > > >
> > > > > > I'm currently using JFNK in an application where I don't have a
> hand-coded Jacobian, and it's working well enough, but as expected the
> scaling isn't great.
> > > > > >
> > > > > > What is the general process for using a PC with
> MatMFFDComputeJacobian? Does it make sense to occasionally have PETSc
> re-compute the Jacobian via finite differences, and then recompute the
> preconditioner? Are there any that just need the sparsity structure?
> > > > >
> > > > >  Mark
> > > > >
> > > > >    Yes, this is a common approach. SNESSetLagJacobian
> -snes_lag_jacobian
> > > > >
> > > > >     The normal approach in SNES to use matrix-free for the
> operator and finite differences to compute an approximate Jacobian used
> to construct preconditioners is to create a sparse matrix with the
> sparsity of the approximate Jacobian (yes, you need a way to figure out
> the sparsity; if you use DMDA it will figure out the sparsity for you).
> Then you use
> > > > >
> > > > >    SNESSetJacobian(snes,J,J, SNESComputeJacobianDefaultColor,
> NULL);
> > > > >
> > > > > and use the options database option -snes_mf_operator
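
Put together, a minimal sketch of that setup; snes is assumed already
created, and nlocal, d_nnz, o_nnz are placeholders for the locally owned
row count and the per-row preallocation derived from the mesh connectivity:

    Mat            J;
    PetscErrorCode ierr;

    /* preconditioning matrix with the approximate Jacobian's sparsity */
    ierr = MatCreateAIJ(PETSC_COMM_WORLD, nlocal, nlocal,
                        PETSC_DETERMINE, PETSC_DETERMINE,
                        0, d_nnz, 0, o_nnz, &J);CHKERRQ(ierr);
    ierr = SNESSetJacobian(snes, J, J, SNESComputeJacobianDefaultColor, NULL);CHKERRQ(ierr);
    ierr = SNESSetFromOptions(snes);CHKERRQ(ierr);
    /* run with -snes_mf_operator: matrix-free operator, colored FD Jacobian for the PC */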
> > > > >
> > > > >
> > > > > > Are there any PCs that don't work in the matrix-free context?
> > > > >
> > > > >   If you do the above you can use almost all the PCs, since you
> are providing an explicit matrix from which to build the preconditioner.
> > > > >
> > > > > > Are there any example codes I overlooked?
> > > > > >
> > > > > > Last but not least... can the Boomer-AMG preconditioner work
> with JFNK? To really show my ignorance of AMG, can it actually be written
> as a matrix P^-1(Ax-b)=0, or is it just a linear operator?
> > > > >
> > > > >   Again, if you provide an approximate Jacobian like the above you
> can use it with BoomerAMG; if you provide NO explicit matrix you cannot
> use BoomerAMG or almost any other preconditioner.
> > > > >
> > > > >    Barry
> > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Mark
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> >
> >
>
>