[petsc-dev] Proposed changes to TS API
hongzhang at anl.gov
Fri May 11 10:29:49 CDT 2018
Before we go down the rabbit hole, let me reiterate the primary point: an unfriendly API breaks the deal in the first place. Perhaps we should reflect on why many other software use PETSc just as a nonlinear/linear solver and implement their own time stepper instead of using TS. FWIW I think the weird IJacobian with shift is not user-friendly.
On May 11, 2018, at 7:20 AM, Jed Brown <jed at jedbrown.org<mailto:jed at jedbrown.org>> wrote:
"Smith, Barry F." <bsmith at mcs.anl.gov<mailto:bsmith at mcs.anl.gov>> writes:
On May 10, 2018, at 4:12 PM, Jed Brown <jed at jedbrown.org<mailto:jed at jedbrown.org>> wrote:
"Zhang, Hong" <hongzhang at anl.gov<mailto:hongzhang at anl.gov>> writes:
Dear PETSc folks,
Current TS APIs (IFunction/IJacobian+RHSFunction/RHSJacobian) were designed for the fully implicit formulation F(t,U,Udot) = G(t,U).
Shampine's paper (https://www.sciencedirect.com/science/article/pii/S0377042706004110?via%3Dihub<https://www.sciencedirect.com/science/article/pii/S0377042706004110?via=ihub>) explains some reasoning behind it.
Our formulation is general enough to cover all the following common cases
* Udot = G(t,U) (classic ODE)
* M Udot = G(t,U) (ODEs/DAEs for mechanical and electronic systems)
* M(t,U) Udot = G(t,U) (PDEs)
Yet the TS APIs provide the capability to solve both simple problems and complicated problems. However, we are not doing well to make TS easy to use and efficient especially for simple problems. Over the years, we have clearly seen the main drawbacks including:
1. The shift parameter exposed in IJacobian is terribly confusing, especially to new users. Also it is not conceptually straightforward when using AD or finite differences on IFunction to approximate IJacobian.
What isn't straightforward about AD or FD on the IFunction? That one
bit of chain rule?
2. It is difficult to switch from fully implicit to fully explicit. Users cannot use explicit methods when they provide IFunction/IJacobian.
This is a real issue, but it's extremely common for PDE to have boundary
conditions enforced as algebraic constraints, thus yielding a DAE.
3. The structure of mass matrix is completely invisible to TS. This means giving up all the opportunities to improve efficiency. For example, when M is constant or weekly dependent on U, we might not want to evaluate/update it every time IJacobian is called. If M is diagonal, the Jacobian can be shifted more efficiently than just using MatAXPY().
I don't understand
4. Reshifting the Jacobian is unnecessarily expensive and sometimes buggy.
Why is "reshifting" needed? After a step is rejected and when the step
size changes for a linear constant operator?
Consider the scenario below.
shift = a;
shift = b;
TSComputeIJacobian() // with the same U and Udot as last call
Changing the shift parameter requires the Jacobian function to be evaluated again. If users provide only RHSJacobian, the Jacobian will not be updated/reshifted in the second call because TSComputeRHSJacobian() finds out that U has not been changed. This issue is fixable by adding more logic into the already convoluted implementation of TSComputeIJacobian(), but the intention here is to illustrate the cumbersomeness of current IJacobian and the growing complications in TSComputeIJacobian() that IJacobian causes.
So I propose that we have two separate matrices dF/dUdot and dF/dU, and remove the shift parameter from IJacobian. dF/dU will be calculated by IJacobian; dF/dUdot will be calculated by a new callback function and default to an identity matrix if it is not provided by users. Then the users do not need to assemble the shifted Jacobian since TS will handle the shifting whenever needed. And knowing the structure of dF/dUdot (the mass matrix), TS will become more flexible. In particular, we will have
* easy access to the unshifted Jacobian dF/dU (this simplifies the adjoint implementation a lot),
How does this simplify the adjoint?
* plenty of opportunities to optimize TS when the mass matrix is diagonal or constant or weekly dependent on U (which accounts for almost all use cases in practice),
But longer critical path,
What do you mean by longer critical path?
Create Mass (dF/dUdot) matrix, call MatAssembly, create dF/dU, call
MatAssembly, call MatAXPY (involves another MatAssembly unless
SAME_NONZERO_PATTERN). That's a long sequence for what could be one
MatAssembly. Also note that geometric setup for elements is usually
done in each element loop. For simple physics, this is way more
expensive than the physics (certainly the case for LibMesh and Deal.II).
So the benefit of asking users to provide the shifted Jacobian is that they could use less MatAssembly inside IJacobian. But what I proposed aims for the benefit of reducing the number of Jacobian or Mass matrix evaluations.
Consider how we handle the following simple cases in the new approach:
1. Udot = G(t,U) -- users do not need to provide dF/dUdot, no assembly is needed for Mass matrix
2. M Udot = G(t,U) -- the constant Mass matrix can be evaluated just once, we do not assemble or evaluate M in each call to IJacobian
For cases M(t,U) Udot = G(t,U) where Mass matrix strongly depends on the state U, we still have opportunities to optimize based on the feature of M(t,U). If M is diagonal (99% of the cases in practice), we can use MatShift or friends, which could be much faster than MatAXPY.
more storage required, and more data motion.
The extra storage needed is related to the size of the mass matrix correct? And the extra data motion is related to the size of the mass matrix correct?
Yes, which is the same as the stiffness matrix for finite element methods.
Is the extra work coming from a needed call to MatAXPY (to combine the scaled mass matrix with the Jacobian) in Hong's approach? While, in theory, the user can avoid the MatAXPY in the current code if they have custom code that assembles directly the scaled mass matrix and Jacobian? But surely most users would not write such custom code and would themselves keep a copy of the mass matrix (likely constant) and use MatAXPY() to combine the copy of the mass matrix with the Jacobian they compute at each timestep/stage? Or am I missing something?
It's better (and at least as convenient) to write code that assembles
into the mass matrix (at the element scale if you want to amortize mesh
traversal, but can also be a new traversal without needing extra
MatAssembly). Then instead of MatAXPY(), you call the code that
ADD_VALUES the mass part. I think storing the mass matrix away
somewhere is a false economy in almost all cases.
Mass matrix can be made matrix-free.
There is also the issue that matrix-free preconditioning is much more
confusing with the new proposed scheme. As it is now, the matrix needed
by the solver is specified and the user can choose how to approximate
it. If only the pieces are specified, then a PCShell will need to
understand the result of a MatAXPY with shell matrices.
The current IJacobian is more oriented for SNES, instead of TS. Conceptually it is the Jacobian/preconditioner for SNES, which causes grief to users who are new to TS. I think we should pay more attention to TS-level concerns such as taking advantage of the structure Mass matrix, reusing Jacobian across stages or timesteps and adapting stepsize more efficiently, which cannot be easily achieved with the cumbersome IJacobian.
And if the mass matrix is simple, won't it take a very small fraction of
time, thus have little gains from "optimizing it"?
Is the Function approach only theoretically much more efficient than Hong's approach when the mass matrix is nontrivial? That is the mass matrix has a nonzero structure similar to the Jacobian?
The extra MatAssembly is called even for MatAXPY with
SUBSET_NONZERO_PATTERN. But apart from strong scaling concerns like
that extra communication (which could be optimized, but there are
several formats to optimize) any system should be sufficiently fast if
the mass matrix is trivial because that means it has much less work than
the dF/dU matrix.
As I mentioned above, if the Mass matrix is trivial (identity or constant or diagonal), we clearly have better ways to address it than hiding it in IJacobian and expecting a smart user custom code.
* easy conversion from fully implicit to fully explicit,
* an IJacobian without shift parameter that is easy to understand and easy to port to other software.
Note that even CVODE has an interface similar to PETSc; e.g., gamma
parameter in CVSpilsPrecSetupFn.
Regarding the changes on the user side, most IJacobian users should not have problem splitting the old Jacobian if they compute dF/dUdot and dF/dU explicitly. If dF/dUdot is too complicated to build, matrix-free is an alternative option.
While this proposal is somehow related to Barry's idea of having a TSSetMassMatrix() last year, I hope it provides more details for your information. Any of your comments and feedback would be appreciated.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the petsc-dev