[petsc-users] Action of inverse of Cholesky factors in parallel

Johannes Haubner haubnerj at simula.no
Mon Dec 5 01:26:35 CST 2022


We are trying to write a PETSc-based interface to general-purpose optimization solvers for PDE-constrained optimal control problems in Hilbert spaces. Ipopt and the SciPy implementation of L-BFGS work with the Euclidean inner product. For discretizations of PDE-constrained optimization problems in Hilbert spaces, however, one typically wants to work with inner products of the form x^T A x (A s.p.d., and typically not equal to the identity matrix).

Not working with the correct inner product typically leads to mesh-dependent behavior of the optimization algorithm, which can be illustrated by considering the optimization problem

min_{x in X} 1/2 ||x||_X^2 =: f(x)

where X denotes a Hilbert space. We choose x_0 != 0 and apply a gradient descent algorithm with step size 1.

Case 1: X = R^d, d in N, ||x||_X^2 = x^T x:
              Here nabla f(x) = x.
              Applying gradient descent with step size 1 gives x_1 = x_0 - x_0 = 0.
              Hence, we have convergence to the optimal solution in 1 iteration.
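
To make Case 1 concrete, here is a minimal NumPy sketch (purely illustrative, not part of the argument above):

import numpy as np

x0 = np.random.rand(10)       # arbitrary nonzero starting point
x1 = x0 - 1.0 * x0            # one gradient step: nabla f(x) = x, step size 1
print(np.linalg.norm(x1))     # 0.0, i.e. convergence in a single iteration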

Case 2: X = H^1(Omega), Omega subset R^d, d in {2, 3},
              (as a specific example of X being an infinite-dimensional Hilbert space)
              ||x||_X^2 = int x(xi) x(xi) dxi + int nabla x(xi) . nabla x(xi) dxi
              The derivative of f at x is an element of the dual space of X and is given by
              Df(x)(delta_x) = int x(xi) delta_x(xi) dxi + int nabla x(xi) . nabla delta_x(xi) dxi
              for delta_x in X.
              The gradient (the Riesz representation of the derivative with respect to the X-inner product) is nabla f(x) = x.
              Hence, gradient descent with step size 1 gives x_1 = x_0 - x_0 = 0.

Case 3: Discretization of Case 2 using finite elements:
              f(x) := 1/2 x^T (M + S) x, where M denotes the mass matrix, S the stiffness matrix,
              and x the vector of degrees of freedom of the discretization.
              Neglecting (by working with the Euclidean inner product) the fact that x encodes an H^1 function
              and simply solving the discretized optimization problem with gradient descent with step size 1 yields
              nabla f(x) = (M + S) x
              and x_1 = x_0 - (M + S) x_0,
              where the condition number of (M + S) depends on the mesh size,
              i.e., mesh-size-dependent convergence to the optimal solution (and not in 1 iteration).
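
To make this mesh dependence concrete, here is a small self-contained NumPy sketch. It assumes 1D P1 finite elements on [0, 1] with homogeneous Dirichlet boundary conditions (an assumption made only for this illustration) and shows how the condition number of M + S, and hence the behavior of a fixed-step gradient method, deteriorates under mesh refinement:

import numpy as np

def mass_stiffness_1d(n):
    # P1 mass and stiffness matrices on [0, 1] with n elements,
    # homogeneous Dirichlet conditions, interior nodes only (dense, for illustration)
    h = 1.0 / n
    m = n - 1
    I = np.eye(m)
    off = np.eye(m, k=1) + np.eye(m, k=-1)
    M = h / 6.0 * (4.0 * I + off)     # mass matrix
    S = (1.0 / h) * (2.0 * I - off)   # stiffness matrix
    return M, S

for n in (8, 16, 32, 64):
    M, S = mass_stiffness_1d(n)
    print(n, np.linalg.cond(M + S))   # grows roughly like n^2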

One can either respect the correct inner product (in the example, x^T (M + S) x) in the implementation of the optimization algorithm (see e.g. https://github.com/funsim/moola), or use a "discrete hack". The latter consists of introducing an auxiliary variable y, applying the linear transformation y --> (L^T)^{-1} y =: x, where (M + S) = L L^T, and optimizing for y instead of x. This is why we need the action of the inverses of the Cholesky factors: L^{-T} maps y back to x, and applying the chain rule when computing the derivative with respect to y requires the action of L^{-1}.

For the above example, this trick gives
tilde f(y) := f(L^{-T} y) = 1/2 y^T L^{-1} (M + S) L^{-T} y = 1/2 y^T y,
and gradient descent in y with step size 1 yields y_1 = y_0 - y_0 = 0.
Therefore, also x_1 = L^{-T} y_1 = 0, i.e., convergence in 1 iteration.
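
Here is a minimal dense sketch of this transformation (NumPy/SciPy, serial only, with a random s.p.d. matrix standing in for M + S; the petsc-cholesky main.py linked below performs the corresponding triangular solves with the factored matrix's solveForward/solveBackward):

import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(0)
n = 50
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)                # random s.p.d. stand-in for M + S

L = cholesky(A, lower=True)                # A = L L^T

def grad_f(x):                             # nabla f(x) = A x for f(x) = 1/2 x^T A x
    return A @ x

def grad_f_tilde(y):                       # chain rule: nabla tilde f(y) = L^{-1} nabla f(L^{-T} y)
    x = solve_triangular(L.T, y, lower=False)           # x = L^{-T} y
    return solve_triangular(L, grad_f(x), lower=True)   # apply L^{-1}

y0 = rng.standard_normal(n)
y1 = y0 - grad_f_tilde(y0)                 # one gradient step with step size 1
x1 = solve_triangular(L.T, y1, lower=False)
print(np.linalg.norm(x1))                  # ~ machine precision: convergence in one iteration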

Johannes

> On 2. Dec 2022, at 13:48, Jose E. Roman <jroman at dsic.upv.es> wrote:
> 
> As far as I know, the solveForward/Backward operations are not implemented for external packages.
> 
> What are you trying to do? One may be tempted to use the Cholesky decomposition to transform a symmetric-definite generalized eigenvalue problem into a standard eigenvalue problem, as is done e.g. in LAPACK https://netlib.org/lapack/lug/node54.html - But note that in SLEPc you do not need to do this, as EPSSolve() will take care of symmetry using Cholesky without having to apply solveForward/Backward separately.
> 
> Jose
> 
> 
> 
> 
>> On 2 Dec 2022, at 13:32, Johannes Haubner <haubnerj at simula.no> wrote:
>> 
>> Given a s.p.d. matrix A and a right-hand side b, we want to compute (L^T)^{-1} b and L^{-1} b, where A = L L^T is the Cholesky decomposition of A. In serial, we are able to do this using pc_factor_mat_solver_type petsc and the "solveForward" and "solveBackward" operations (see https://github.com/JohannesHaubner/petsc-cholesky/blob/main/main.py). In parallel, we are not able to do this yet. According to p. 52 of https://slepc.upv.es/documentation/slepc.pdf, PETSc's Cholesky factorization is not parallel. However, using "mumps" or "superlu_dist" prevents us from using "solveForward" or "solveBackward". Is there a way of using solveBackward in parallel with a distributed matrix (MPI)?
> 


