[petsc-users] performance issue

Jed Brown jedbrown at mcs.anl.gov
Sat Mar 10 11:21:48 CST 2012


On Sat, Mar 10, 2012 at 11:05, Xavier Garnaud <xavier.garnaud at ladhyx.polytechnique.fr> wrote:

> I am using an explicit time stepper. The matrices are assembled only once,
> and then I use the linear operator for example to compute the least stable
> eigenmode(s). I attached the output of log_summary for performing the same
> number of time steps using the linear and nonlinear operators.


Do you assemble more than one matrix as part of defining its action? I ask
because there are about 3 times as many VecScatterBegin/Ends in the linear
version (although they send the same amount of data, so some calls don't do
any communication).

I don't see anything here indicating an implicit solve, just
TSFunctionEval. If TS were doing an implicit solve, there would be
SNES/KSP/PC events.

Why do you want an assembled matrix? The matrix uses more memory, and sparse
matrix multiplication is memory-bandwidth limited, so if your nonlinear
function evaluation is efficient, applying the operator by evaluating the
function may well be faster than multiplying by the assembled matrix.
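
If you do want the linearized action without storing a matrix, one option is
to wrap the linearized kernels in a MATSHELL. A minimal sketch, where AppCtx
and LinearizedApply are placeholders for your own context and linearized
steps, not PETSc API:

  #include <petscmat.h>

  typedef struct {
    Vec ubase;                /* base state for the linearization (placeholder) */
  } AppCtx;

  /* hypothetical user kernel: y = J(ubase)*x, reusing the nonlinear code paths */
  extern PetscErrorCode LinearizedApply(AppCtx*,Vec,Vec);

  static PetscErrorCode ShellMult(Mat A,Vec x,Vec y)
  {
    AppCtx        *user;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = MatShellGetContext(A,(void**)&user);CHKERRQ(ierr);
    ierr = LinearizedApply(user,x,y);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

  static PetscErrorCode CreateLinearOperator(MPI_Comm comm,PetscInt nlocal,
                                             AppCtx *user,Mat *J)
  {
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = MatCreateShell(comm,nlocal,nlocal,PETSC_DETERMINE,PETSC_DETERMINE,
                          user,J);CHKERRQ(ierr);
    ierr = MatShellSetOperation(*J,MATOP_MULT,(void(*)(void))ShellMult);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

MatMult() on such a shell applies the linearization with no assembled matrix,
so it can be passed anywhere a Mat is expected (e.g. to an eigensolver).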


> On Sat, Mar 10, 2012 at 5:10 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
>
>> On Sat, Mar 10, 2012 at 09:59, Xavier Garnaud <xavier.garnaud at ladhyx.polytechnique.fr> wrote:
>>
>>> I am solving the compressible Navier--Stokes equations in conservative
>>> form, so in order to apply the operator, I
>>>
>>>    1. apply BCs on the flow field
>>>    2. compute the flux
>>>    3. take the derivative using finite differences
>>>    4. apply BCs on the derivatives of the flux (this pipeline is sketched below)
>>>
>>>
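
A sketch of this four-step pipeline as a TS right-hand-side function
(ApplyFieldBCs, ComputeFlux, DifferentiateFlux, ApplyFluxBCs, and the AppCtx
layout are hypothetical user kernels, not PETSc calls):

  #include <petscts.h>

  typedef struct {
    Vec flux;                 /* work vector for the flux (placeholder) */
  } AppCtx;

  extern PetscErrorCode ApplyFieldBCs(AppCtx*,Vec);
  extern PetscErrorCode ComputeFlux(AppCtx*,Vec,Vec);
  extern PetscErrorCode DifferentiateFlux(AppCtx*,Vec,Vec);
  extern PetscErrorCode ApplyFluxBCs(AppCtx*,Vec);

  static PetscErrorCode RHSFunction(TS ts,PetscReal t,Vec u,Vec f,void *ctx)
  {
    AppCtx        *user = (AppCtx*)ctx;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = ApplyFieldBCs(user,u);CHKERRQ(ierr);                  /* step 1 */
    ierr = ComputeFlux(user,u,user->flux);CHKERRQ(ierr);         /* step 2 */
    ierr = DifferentiateFlux(user,user->flux,f);CHKERRQ(ierr);   /* step 3 */
    ierr = ApplyFluxBCs(user,f);CHKERRQ(ierr);                   /* step 4 */
    PetscFunctionReturn(0);
  }

It would be registered once with TSSetRHSFunction(ts,PETSC_NULL,RHSFunction,&user).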
>>> In order to apply the linearized operator, I wish to linearize steps 2
>>> and 4 (the others are linear). For this I assemble sparse matrices (MPIAIJ).
>>> The matrices should be block diagonal -- with square or rectangular blocks
>>> -- so I preallocate the full diagonal blocks (but I only use MatSetValues
>>> for nonzero entries). When I do this, the linearized code runs
>>> approximately 50% slower (the computation of derivatives takes more than
>>> 70% of the time in the non-linear code), so steps 2 and 4 are much slower
>>> for the linear operator although the number of operations is very similar.
>>> Could this be due to poor preallocation? Is there a way to improve the
>>> performance?
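
For reference, exact per-row preallocation for such a block-diagonal MPIAIJ
matrix might look like the sketch below, assuming each block is owned
entirely by one process and each local row has at most bs nonzeros (both
assumptions, to be adapted to the real sparsity):

  #include <petscmat.h>

  static PetscErrorCode CreateBlockDiagonal(MPI_Comm comm,PetscInt mlocal,
                                            PetscInt nlocal,PetscInt bs,Mat *A)
  {
    PetscInt      *d_nnz,i;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = PetscMalloc(mlocal*sizeof(PetscInt),&d_nnz);CHKERRQ(ierr);
    for (i=0; i<mlocal; i++) d_nnz[i] = bs;   /* exact per-row counts are best */
    ierr = MatCreate(comm,A);CHKERRQ(ierr);
    ierr = MatSetSizes(*A,mlocal,nlocal,PETSC_DETERMINE,PETSC_DETERMINE);CHKERRQ(ierr);
    ierr = MatSetType(*A,MATMPIAIJ);CHKERRQ(ierr);
    /* with blocks in the on-process (diagonal) part, o_nnz can stay zero */
    ierr = MatMPIAIJSetPreallocation(*A,0,d_nnz,0,PETSC_NULL);CHKERRQ(ierr);
    ierr = PetscFree(d_nnz);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

Since the matrices are assembled only once, preallocation affects assembly
time rather than the per-step MatMult; running with -info and looking at the
reported number of mallocs during MatSetValues() shows whether assembly had
to allocate extra memory.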
>>>
>>
>> It's not clear to me from this description if you are even using an
>> implicit method. Is the linearization for use in a Newton iteration? How
>> often do you have to reassemble? Please always send -log_summary output
>> with performance questions.
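>>
For example (ns_solver is a hypothetical executable name):

  mpiexec -n 8 ./ns_solver -log_summary

The profiling table is printed to stdout during PetscFinalize().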