I assemble 12 matrices (for each dimension: viscous and inviscid fluxes (2 matrices, for the dependency on the flow field and on the stresses), viscous stresses, and inviscid BCs (x2, as they depend on the flow field and its derivative)). There is no linear or non-linear solve.<br>
<br>I want an assembled matrix for two reasons. First, it allows me to linearize the operators automatically, without doing it "on the fly". I perturb the variable by ~1e-8; since all the matrices correspond to functions that are local to each discretization point, this takes only as many function evaluations as there are DOFs per discretization point. Second, it allows me to take the adjoint of the linear operator.<br>
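<br>To illustrate the idea, here is a minimal NumPy sketch, not my actual PETSc code. The function pointwise_flux and the helper names are hypothetical stand-ins, and the dense per-point blocks stand in for the block-diagonal MPIAIJ matrices. It perturbs one DOF at a time at all points simultaneously, then applies the resulting blocks and their transpose.<br>

```python
import numpy as np

def pointwise_flux(q):
    """Hypothetical local nonlinear function; q has shape (npoints, ndof)."""
    rho, m = q[:, 0], q[:, 1]
    return np.stack([m, m**2 / rho + rho**1.4], axis=1)

def linearize_pointwise(func, q0, eps=1e-8):
    """Forward-difference Jacobian blocks of a pointwise function.

    Because func is local to each point, perturbing DOF j at every point
    at once gives column j of every block: ndof perturbed evaluations,
    plus one evaluation at the base state.
    """
    npoints, ndof = q0.shape
    f0 = func(q0)
    blocks = np.empty((npoints, f0.shape[1], ndof))
    for j in range(ndof):
        dq = np.zeros_like(q0)
        dq[:, j] = eps
        blocks[:, :, j] = (func(q0 + dq) - f0) / eps
    return blocks

def apply_blocks(blocks, dq):
    """Action of the block-diagonal linearized operator on dq."""
    return np.einsum("pij,pj->pi", blocks, dq)

def apply_adjoint(blocks, df):
    """Action of the transposed (adjoint) operator on df."""
    return np.einsum("pij,pi->pj", blocks, df)
```

With the blocks stored explicitly, the adjoint is just a transposed product, which is the second reason given above.<br>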
<br><br><br><div class="gmail_quote">On Sat, Mar 10, 2012 at 6:21 PM, Jed Brown <span dir="ltr"><<a href="mailto:jedbrown@mcs.anl.gov">jedbrown@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="gmail_quote"><div class="im">On Sat, Mar 10, 2012 at 11:05, Xavier Garnaud <span dir="ltr"><<a href="mailto:xavier.garnaud@ladhyx.polytechnique.fr" target="_blank">xavier.garnaud@ladhyx.polytechnique.fr</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
I am using an explicit time stepper. The matrices are assembled only once, and then I use the linear operator, for example, to compute the least stable eigenmode(s). I attached the output of log_summary for performing the same number of time steps using the linear and nonlinear operators.</blockquote>
<div><br></div></div><div>Do you assemble more than one matrix as part of defining its action? I ask because there are about 3 times as many VecScatterBegin/End calls for the linear version (although they send the same amount of data, so some calls don't do any communication).</div>
<div><br></div><div>I don't see anything here indicating an implicit solve, just TSFunctionEval. If TS did an implicit solve, there should be SNES/KSP/PC events.</div><div><br></div><div>Why do you want an assembled matrix? The matrix uses more memory, so if your nonlinear function evaluation is efficient, it may well be faster to evaluate than to multiply by the matrix.</div>
<div class="im">
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div></div><div><br>
<br><br><div class="gmail_quote">On Sat, Mar 10, 2012 at 5:10 PM, Jed Brown <span dir="ltr"><<a href="mailto:jedbrown@mcs.anl.gov" target="_blank">jedbrown@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div><div><div class="gmail_quote">On Sat, Mar 10, 2012 at 09:59, Xavier Garnaud <span dir="ltr"><<a href="mailto:xavier.garnaud@ladhyx.polytechnique.fr" target="_blank">xavier.garnaud@ladhyx.polytechnique.fr</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>I am solving the compressible Navier--Stokes equations in compressible form, so in order to apply the operator, I<br><ol><li>apply BCs on the flow field</li><li>compute the flux</li><li>take the derivative using finite differences</li>
<li>apply BCs on the derivatives of the flux<br></li></ol><br>In order to apply the linearized operator, I wish to linearize steps 2 and 4 (the others are linear). For this I assemble sparse matrices (MPIAIJ). The matrices should be block diagonal -- with square or rectangular blocks -- so I preallocate the whole diagonal blocks (but I only use MatSetValues for nonzero entries). When I do this, the linearized code runs approximately 50% slower (the computation of derivatives takes more than 70% of the time in the non-linear code), so steps 2 and 4 are much slower for the linear operator although the number of operations is very similar. Could this be due to poor preallocation? Is there a way to improve the performance?</div>
</blockquote></div><br></div></div><div>It's not clear to me from this description if you are even using an implicit method. Is the linearization for use in a Newton iteration? How often do you have to reassemble? Please always send -log_summary output with performance questions.</div>
</blockquote></div><br>
</div></div></blockquote></div></div><br>
</blockquote></div><br>