> Hmm, are those blocks dense? If so you could benefit enormously from using BAIJ format.

Yes, they're dense blocks. Usually coupled compressible 3D NS with DG elements: 5 equations x (N+1)(N+2)(N+3)/6 modes per element at order N, so block sizes of 50^2 to 175^2 are typical. I'll try BAIJ; I initially set it up with AIJ as it seemed better supported in parallel according to the linear solver table, but I suppose these are rather large blocks... The performance is still surprising, though, as this was overall a pretty small system (1,536 elements / diagonal 100^2 blocks).

> Could you run with -ksp_view_mat binary and send the resulting file (called binaryoutput), so we can run the coloring codes locally to debug the performance.

Will send this evening.
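For reference, the BAIJ setup I have in mind looks roughly like this (a sketch only; the block size and preallocation counts are illustrative, with bs = 100 matching the N=3 case):

  #include <petscmat.h>

  Mat      J;
  PetscInt bs     = 100;  /* dense block size: 5 equations x 20 modes (N=3) */
  PetscInt nlocal = 48;   /* elements owned by this rank (illustrative) */

  MatCreate(PETSC_COMM_WORLD, &J);
  MatSetSizes(J, nlocal*bs, nlocal*bs, PETSC_DETERMINE, PETSC_DETERMINE);
  MatSetType(J, MATMPIBAIJ);
  /* preallocation is per block row, in units of blocks: the diagonal
     block plus up to 4 face neighbors on a tetrahedral mesh */
  MatMPIBAIJSetPreallocation(J, bs, 5, NULL, 4, NULL);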
On Mon, Oct 30, 2017 at 3:02 PM, Smith, Barry F. <bsmith@mcs.anl.gov> wrote:

> On Oct 30, 2017, at 1:58 PM, Mark Lohry <mlohry@gmail.com> wrote:
>
> Hmm, metis doesn't really have anything to do with the sparsity of the Jacobian does it?
>
> No, I just mean I'm doing the initial partitioning and parallel communication for the residual evaluations independently of PETSc, and then doing a 1-to-1 mapping to the PETSc solution vector, along with manually setting the nonzero structure of the MPIAIJ system as in the user manual. I don't think there's anything wrong with the system structure, as it gives the same correct answer as the un-preconditioned matrix-free approach.
>
> The exact system those MatColoring times came from has size (100x100) blocks on the diagonals corresponding to the tetrahedral cells, with each having 4 neighbor blocks on the same row (or fewer for elements on boundaries).
Hmm, are those blocks dense? If so you could benefit enormously from using BAIJ format.

Matt,

Sounds like there are performance bugs in the parallel coloring apply algorithms with big "diagonal blocks".

Mark,

Could you run with -ksp_view_mat binary and send the resulting file (called binaryoutput), so we can run the coloring codes locally to debug the performance.

Barry

>
> On Mon, Oct 30, 2017 at 1:55 PM, Smith, Barry F. <bsmith@mcs.anl.gov> wrote:
>
> > On Oct 30, 2017, at 12:39 PM, Mark Lohry <mlohry@gmail.com> wrote:
> >
> >
> > >
> > > 3) Are there any hooks analogous to KSPSetPreSolve/PostSolve for the FD computation of the Jacobians, or for the computation of the preconditioner? I'd like to get a handle on the relative costs of these.
> >
> > No, do you just want the time? You can get that from the logging; for example -log_view.
> >
> > Yes, I was just thinking in regard to your suggestion of recomputing when the number of linear iterations gets too high; I assume it's the ratio of preconditioner cost vs. linear solver cost at runtime that's the metric of interest, not the absolute value of either. But I'll cross that bridge when I come to it.
> >
> > When I had asked, I was looking to see where a long pause was happening, thinking it was the FD Jacobian; it turned out to be before that, in MatColoringApply, which seems surprisingly expensive. MATCOLORINGJP took ~15 minutes on 32 cores for a small 153,000^2 system, while MATCOLORINGGREEDY took 30 seconds. Any guidance there, or is this expected? I'm not using DM, just manually entering the sparsity resulting from a metis decomposition of a tetrahedral mesh.
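> > (For reference, the coloring call being timed is roughly this -- a sketch, not my exact code:)
> >
> > #include <petscmat.h>
> > #include <petsctime.h>
> >
> > MatColoring    mc;
> > ISColoring     iscoloring;
> > PetscLogDouble t0, t1;
> >
> > MatColoringCreate(J, &mc);                 /* J: matrix with assembled sparsity */
> > MatColoringSetType(mc, MATCOLORINGGREEDY); /* or MATCOLORINGJP */
> > MatColoringSetFromOptions(mc);             /* honors -mat_coloring_type */
> > PetscTime(&t0);
> > MatColoringApply(mc, &iscoloring);         /* the expensive step */
> > PetscTime(&t1);
> > PetscPrintf(PETSC_COMM_WORLD, "coloring: %g s\n", (double)(t1 - t0));
> > ISColoringDestroy(&iscoloring);
> > MatColoringDestroy(&mc);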
>
> Hmm, metis doesn't really have anything to do with the sparsity of the Jacobian does it?
>
> Matt,
>
> These times are huge. What is going on?
>
> Barry
>
> >
> >
> > Thanks for the info on the lag logic, I'll play with the TS pre/post calls for the time-accurate problems and only use LagJacobian.
> >
> > On Mon, Oct 30, 2017 at 11:29 AM, Smith, Barry F. <bsmith@mcs.anl.gov> wrote:
> >
> > > On Oct 29, 2017, at 11:50 AM, Mark Lohry <mlohry@gmail.com> wrote:
> > >
> > > Thanks again Barry, I've got the preconditioners hooked up with -snes_mf_operator, and at least AMG looks to be working great on a high-order unstructured DG problem.
> > >
> > > A couple of questions on the SNESSetLagJacobian + SNESSetLagPreconditioner code flow:
> > >
> > > 1) With -snes_mf_operator, and given SNESSetLagJacobian(snes, 1) (the default) and SNESSetLagPreconditioner(snes, 2), after the first KSP solve in a Newton iteration, will it do the finite difference Jacobian calculation? Or will the Jacobian only be computed when the preconditioner lag setting demands it, on the 3rd Newton step? I suspect it's the latter, based on where I see the code pause.
> >
> > SNES with -snes_mf_operator will ALWAYS use the matrix-free finite difference (f(x+h*v) - f(x))/h to apply the matrix-vector product.
> >
> > The LagJacobian and LagPreconditioner settings are not coordinated. The first determines how often the Jacobian used for preconditioning is recomputed, and the second determines how often the preconditioner is recomputed.
> >
> > If you are using -snes_mf_operator then it never makes sense to have lagJacobian < lagPreconditioner, since you would recompute the Jacobian but not actually use it. It also makes no sense to have lagPreconditioner < lagJacobian, because you'd be recomputing the preconditioner on the same Jacobian.
> >
> > But actually, if you don't change the Jacobian used in building the preconditioner, then when it tries to recompute the preconditioner it determines the matrix has not changed and skips rebuilding the preconditioner. So when using -snes_mf_operator there is really no reason generally to set the preconditioner lag.
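> > (In code, a minimal sketch of that setup -- the lag value is illustrative:)
> >
> > SNESSetLagJacobian(snes, 2); /* refresh the preconditioning Jacobian
> >                                 every other Newton step */
> > /* no SNESSetLagPreconditioner() call needed: the preconditioner is
> >    rebuilt only when the lagged Jacobian actually changes */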
> > >
> > > 2) How do implicit TS and SNESSetLagPreconditioner/Persists interact? Does the counter since the last preconditioner compute reset with time steps, or does that lag counter just increment with every SNES solve, regardless of how many nonlinear solves might have happened in a given timestep? Say the preconditioner lag is 2, and a time stepper uses 3, 2, and 3 nonlinear solves on 3 steps; is the flow
> > >
> > > (time step 1)->(update preconditioner)->(snes solve)->(snes solve)->(update preconditioner)->(snes solve)
> > > (time step 2)->(snes solve)->(update preconditioner)->(snes solve)
> > > (time step 3)->(snes solve)->(update preconditioner)->(snes solve)->(snes solve)
> > >
> > > or
> > >
> > > (time step 1)->(update preconditioner)->(snes solve)->(snes solve)->(update preconditioner)->(snes solve)
> > > (time step 2)->(update preconditioner)->(snes solve)->(snes solve)
> > > (time step 3)->(update preconditioner)->(snes solve)->(snes solve)->(update preconditioner)->(snes solve)
> > >
> > > ?
> > >
> > > I think for implicit time stepping I'd probably want the preconditioner to be recomputed just once at the beginning of each time step, or at some multiple of that. Does that sound reasonable?
> >
> > Yes, what you want to do is completely reasonable.
> >
> > You can use SNESSetLagJacobian() and SNESSetLagJacobianPersists() in combination to have the Jacobian recomputed every fixed number of times: if you set the persists flag and set LagJacobian to 10, it will recompute the Jacobian used in the preconditioner every 10th time a new Jacobian is needed.
> >
> > If you want to compute the new Jacobian used to build the preconditioner once at the beginning of each new TS stage, you can set SNESSetLagJacobian() to -2 in the TS prestage call, as in the sketch below. There are possibly other tricks you can do by setting the two flags at different locations.
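> > For example (a sketch; error checking omitted and the hook name is made up):
> >
> > #include <petscts.h>
> >
> > static PetscErrorCode PreStage(TS ts, PetscReal stagetime)
> > {
> >   SNES snes;
> >   TSGetSNES(ts, &snes);
> >   SNESSetLagJacobian(snes, -2); /* rebuild at the next Newton step,
> >                                    then never again until reset here */
> >   return 0;
> > }
> >
> > /* during setup: */
> > TSSetPreStage(ts, PreStage);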
> >
> > An alternative to hardwiring how often the Jacobian used to build the preconditioner is rebuilt is to rebuild it based on when the preconditioner starts working less well. Here you could add a KSPMonitor or SNESMonitor that detects whether the number of linear iterations is above a certain amount, and if so sets the recompute-Jacobian flag to -2 so that the next solve recreates the Jacobian used in building the preconditioner.
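> > Roughly (a sketch; the iteration threshold of 50 is an arbitrary example):
> >
> > static PetscErrorCode RebuildCheck(SNES snes, PetscInt it, PetscReal fnorm, void *ctx)
> > {
> >   PetscInt lits;
> >   SNESGetLinearSolveIterations(snes, &lits); /* linear its so far this solve */
> >   if (lits > 50) SNESSetLagJacobian(snes, -2); /* force a fresh Jacobian */
> >   return 0;
> > }
> >
> > SNESMonitorSet(snes, RebuildCheck, NULL, NULL);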
> >
> >
> > >
> > > 3) Are there any hooks analogous to KSPSetPreSolve/PostSolve for the FD computation of the Jacobians, or for the computation of the preconditioner? I'd like to get a handle on the relative costs of these.
> >
> > No, do you just want the time? You can get that from the logging; for example -log_view.
> >
> > >
> > >
> > > Best,
> > > Mark
> > >
> > > On Sat, Sep 23, 2017 at 3:28 PM, Mark Lohry <mlohry@gmail.com> wrote:
> > > Great, thanks Barry.
> > >
> > > On Sat, Sep 23, 2017 at 3:12 PM, Barry Smith <bsmith@mcs.anl.gov> wrote:
> > >
> > > > On Sep 23, 2017, at 12:48 PM, Mark W. Lohry <mlohry@princeton.edu> wrote:
> > > >
> > > > I'm currently using JFNK in an application where I don't have a hand-coded Jacobian, and it's working well enough, but as expected the scaling isn't great.
> > > >
> > > > What is the general process for using a PC with MatMFFDComputeJacobian? Does it make sense to occasionally have PETSc re-compute the Jacobian via finite differences, and then recompute the preconditioner? Are there any that just need the sparsity structure?
> > >
> > > Mark,
> > >
> > > Yes, this is a common approach. See SNESSetLagJacobian / -snes_lag_jacobian.
> > >
> > > The normal approach in SNES, to use matrix-free for the operator and finite differences to compute an approximate Jacobian used to construct preconditioners, is to create a sparse matrix with the sparsity of the approximate Jacobian (yes, you need a way to figure out the sparsity; if you use DMDA it will figure that out for you). Then you use
> > >
> > > SNESSetJacobian(snes, J, J, SNESComputeJacobianDefaultColor, NULL);
> > >
> > > and use the options database option -snes_mf_operator.
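> > > Spelled out a bit more (a sketch; nlocal and maxnnz are placeholder sizes for illustration):
> > >
> > > Mat J;
> > > MatCreateAIJ(PETSC_COMM_WORLD, nlocal, nlocal, PETSC_DETERMINE,
> > >              PETSC_DETERMINE, maxnnz, NULL, maxnnz, NULL, &J);
> > > /* ... insert (zero) entries at every potentially nonzero location with
> > >    MatSetValues(), then MatAssemblyBegin/End, so the coloring sees the
> > >    full pattern ... */
> > > SNESSetJacobian(snes, J, J, SNESComputeJacobianDefaultColor, NULL);
> > > /* and run with -snes_mf_operator so the operator itself stays matrix-free */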
> > >
> > >
> > > > Are there any PCs that don't work in the matrix-free context?
> > >
> > > If you do the above, you can use almost all the PCs, since you are providing an explicit matrix from which to build the preconditioner.
> > >
> > > > Are there any example codes I overlooked?
> > > >
> > > > Last but not least... can the BoomerAMG preconditioner work with JFNK? To really show my ignorance of AMG: can it actually be written as a matrix, P^-1(Ax-b)=0, or is it just a linear operator?
> > >
> > > Again, if you provide an approximate Jacobian as above you can use it with BoomerAMG; if you provide NO explicit matrix you cannot use BoomerAMG or almost any other preconditioner.
> > >
> > > Barry
> > >
> > > >
> > > > Thanks,
> > > > Mark