[petsc-dev] An important optimization/generalization of TS

Tue Apr 27 08:23:17 CDT 2010

TS tutorial ex10 illustrates a challenge that TS doesn't currently
address in a reasonable way.  This example has two degrees of freedom
per node, radiation energy E and material temperature T.  The system is

  E_t - div(D(T,E,grad E) grad E) - (T^4 - E) = 0
  E_m(T)_t + T^4 - E = 0

The second equation is just an ODE (no spatial derivatives), but it has
the difficulty that the material energy E_m is a nontrivial function of
T (an ionization model).  We can differentiate through E_m to get

  E_m'(T) T_t + T^4 - E = 0

which is the standard implicit form, but this contributes a lot of
stiffness and seems to produce positivity problems.  Clearly the first
form is desirable.  One solution is to turn the system into a DAE by
introducing new variables Y and perhaps Z in which the equation becomes

  Y - E_m(T) = 0
  Y_t + T^4 - E = 0

or

  Y_t     - Z = 0
  E_m(T)  - Y = 0
  Z + T^4 - E = 0

These new variables can be algebraically eliminated from the resulting
system, thus it's introduction does not necessarily change the spectral
properties or storage costs of the resulting operator.  I'm debating how
to expose this functionality to the user.

Sundials' IDA does not seem to have any special support for such
systems.  Adrian, could you explain what KPP does here?  The user's
manual (section 4.1.10) indicates that the system is reduced to a
standard-form ODE x_t = f(t,x).  Maybe this goes with the product always
being a linear combination of specias, but how would the method
generalize to a case where the products are a nonlinear combination?

The RADAU code from Hairer & Wanner requires all systems to have a
linear term associated with the time derivative (i.e. a linear mass
matrix), but has special treatment for the case where some equations
have the form x[i]_t = y[i] (exploiting this is the reason for
introducing the second form above).  I asked Ernst Hairer about use of
the linear mass matrix, here is his response:

  You ask me why I require the matrix M to be constant.

  1. The formulations MY_t = F(t,Y) and f(t,x,x_t) = 0 are equivalent by
      considering x_t as a new variable and be letting Y= (x,x_t).  If
      implemented correctly, also the numerical result will be the same.
      In conclusion, everything that you can do with one formulation of
      the problem, you can do it also in the other formulation.

  2. My personal preference is the first form. Most of the problems that I
     encountered are of this form. The fact that M is constant, makes the
     implemention easier (at least for me). In this formulation it is also
     easier to distinguish between index 1, index 2, and index 3
     components, and it is easy to make an appropriate scaling for an
     improved convergence of the Newton iterations. Also theoretical
     investigations (convergence of methods, ...) seem to me more natural
     in the concept of the first formulation.

  3. The same answer applies to your comment "... without any explicit
     presence of z, which I think produces a more natural formulation, at
     least in the context of parallel iterative solvers". Both
     formulations are equivalent, and also the implementation can be done
     in the same way. The introduction of z is just giving a name to a
     certain expression. Exploiting the structure of the augmented
     equation, this introduction does not produce any additional overhead.

Note that we currently make no assumption of a linear mass matrix, I do
not think this complicates the implementation at all.  In our context,
it's common for people to need sophisticated preconditioners for
transformed problems.  If the integrator is doing algebraic
manipulations to eliminate inessential variables, then the actual
structure of the resulting system becomes somewhat opaque to the user
and it would appear to be difficult to use unassembled representations
or user-defined preconditioners.

At this point I'm favoring the modest generalization

  f(t,x,z(x)_t) = 0

where the user provides the function z(.), with the identity used by
default.  The range of z(.) may be of different size than the domain and
is not even required to be local.  The stage equations are essentially a
set of equations defining z(x_i)_t in terms of z(x_j), their solution
produces a value of x_i.  The moments z(x), h z(x)_t, h^2 z(x)_{tt},
etc, are propagated.  Unless I'm mistaken, only the zeroth moment of the
state (x) is needed, though the first moment is convenient if attempting
to start the Newton iteration with extrapolation.  The stage equations
have the form

  g(x) = f(t,x,z_0 + a z(x)) = 0

for some z_0 and a determined by the method and step size.  The Jacobian
of g is actually equivalent to the Jacobian that would be required if
the method was converted to the standard form f(t,x,x_t)=0.  Such
transparency in the structure of the Jacobian is a very high priority to
me.

Comments?  Am I losing some important analysis tools (especially for
higher index DAE) by performing the nonlinear elimination too early?

Jed