[petsc-users] Action of inverse of Cholesky factors in parallel

Jose E. Roman jroman at dsic.upv.es
Mon Dec 5 02:40:07 CST 2022


MUMPS does not support doing the forward or backward solve only. You should ask the MUMPS developers to add an option for this; then we would be able to interface it in PETSc. Regarding the other Cholesky options (see https://petsc.org/release/overview/linear_solve_table/), PaStiX has the same problem as MUMPS; CHOLMOD is sequential; SuperLU_DIST implements LU, so it would not be helpful for you even if the forward/backward solves were available.

If you want to go the inner-product route, SLEPc provides some support via its BV object, including basis orthogonalization, see https://slepc.upv.es/documentation/current/docs/manualpages/BV/BVSetMatrix.html
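
For example, something along these lines (an untested slepc4py sketch; it assumes A is an assembled s.p.d. PETSc Mat, x a nonzero Vec with a compatible parallel layout, and m >= 1 the number of basis vectors):

    from petsc4py import PETSc
    from slepc4py import SLEPc

    bv = SLEPc.BV().create(PETSc.COMM_WORLD)
    bv.setSizesFromVec(x, m)       # m columns with the parallel layout of x
    bv.setFromOptions()
    bv.setMatrix(A, False)         # subsequent inner products are (u, v)_A = v^T A u
    bv.insertVec(0, x)             # copy x into column 0
    nrm = bv.normColumn(0)         # A-norm of column 0, i.e. sqrt(x^T A x)
    bv.scaleColumn(0, 1.0 / nrm)   # normalize column 0 in the A-norm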

Jose

> On 5 Dec 2022, at 8:26, Johannes Haubner <haubnerj at simula.no> wrote:
> 
> We are trying to write an interface, based on PETSc, to general-purpose optimization solvers for PDE-constrained optimal control problems in Hilbert spaces. Ipopt and the SciPy implementation of L-BFGS work with the Euclidean inner product. For discretizations of PDE-constrained optimization problems in Hilbert spaces, one typically wants to work with inner products of the form x^T A x (A s.p.d., A typically not equal to the identity matrix).
> 
> Not working with the correct inner product typically leads to mesh-dependent behavior of the optimization algorithm, which can be motivated by considering the optimization problem
> 
> min_{x in X} 1/2 ||x||^2_X  =: f(x)
> 
> where X denotes a Hilbert space. We choose x_0 != 0 and apply a gradient descent algorithm with step size 1.
> 
> Case 1: X = R^d, d in N, || x ||_X ^2 =  x^T x:
>              It holds that nabla f(x) = x.
>              Applying gradient descent with step size 1: x_1 = x_0 - x_0 = 0.
>              Hence, we have convergence to the optimal solution in 1 iteration.
> 
> Case 2: X = H^1(Omega), Omega subset R^d, d in {2, 3},
>              (as a specific example for X being an infinite-dimensional Hilbert space)
>              || x ||_X ^2 = int x(xi) x(xi) dxi + int nabla x(xi) . nabla x(xi) dxi
>              The derivative of f(x) is an element of the dual space of X and is given by
>              Df(x)(delta_x) = int x(xi) delta_x(xi) dxi + int nabla x(xi) . nabla delta_x(xi) dxi
>              for delta_x in X
>              The gradient (Riesz representation of the derivative) is given by nabla f(x) = x
>              Hence, gradient descent with step size 1: x_1 = x_0 - x_0 = 0
> 
> Case 3: Discretization of Case 2 using finite elements:
>              f(x) := 1/2 x^T (M + S) x, where M denotes the mass matrix, S the stiffness matrix,
>              and x the degrees of freedom of the discretization
>              Neglecting (by working with the Euclidean inner product) that x encodes an H^1-function
>              and just solving the discretized optimization problem with gradient descent with step size 1 yields
>              nabla f(x) = (M + S) x
>              and x_1 = x_0 - (M + S) x_0
>              where the condition number of (M + S) depends on the mesh size,
>              i.e., mesh-size-dependent convergence to the optimal solution (and not in 1 iteration).
> 
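> As a quick numerical illustration of this mesh dependence (1D P1 finite elements on the unit interval with a uniform mesh; purely illustrative NumPy, not part of our actual code):
> 
>     import numpy as np
> 
>     def cond_M_plus_S(n):                   # n interior nodes, mesh size h = 1/(n+1)
>         h = 1.0 / (n + 1)
>         I = np.eye(n)
>         J = np.eye(n, k=1) + np.eye(n, k=-1)
>         M = h / 6.0 * (4.0 * I + J)         # P1 mass matrix
>         S = (1.0 / h) * (2.0 * I - J)       # P1 stiffness matrix
>         return np.linalg.cond(M + S)
> 
>     for n in (10, 100, 1000):
>         print(n, cond_M_plus_S(n))          # grows roughly like 1/h^2 as the mesh is refined
> 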
> One can either respect the correct inner product (in the example x^T (M + S) x) in the implementation of the optimization algorithms (see e.g. https://github.com/funsim/moola) or do a "discrete hack". The latter consists of introducing an auxiliary variable y, applying the linear transformation y --> (L^T)^{-1} y =: x, where (M + S) = L L^T (this is why we need the action of the inverses of the Cholesky factors: recovering x requires applying L^{-T}, and the chain rule when computing the derivative requires applying L^{-1}), and optimizing for y instead of x.
> 
> For the above example, this trick gives:
> tilde f(y) := f(L^{-T} y) = 1/2 y^T L^{-1} (M + S) L^{-T} y = 1/2 y^T y
> and gradient descent in y with step size 1 yields y_1 = y_0 - y_0 = 0.
> Therefore, also x_1 = L^{-T} y_1 = 0 (i.e., convergence in 1 iteration).
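> 
> In code, the trick looks roughly as follows (a small dense NumPy/SciPy sketch, not our actual distributed PETSc setting; the 2x2 matrix just stands in for M + S):
> 
>     import numpy as np
>     from scipy.linalg import solve_triangular
> 
>     A = np.array([[4.0, 1.0], [1.0, 3.0]])         # stands in for M + S (s.p.d.)
>     L = np.linalg.cholesky(A)                      # A = L L^T, L lower triangular
> 
>     def grad_f(x):                                 # nabla f(x) = (M + S) x
>         return A @ x
> 
>     def grad_tilde_f(y):                           # chain rule: L^{-1} nabla f(L^{-T} y)
>         x = solve_triangular(L.T, y, lower=False)  # x = L^{-T} y
>         return solve_triangular(L, grad_f(x), lower=True)
> 
>     y0 = np.array([1.0, 2.0])
>     y1 = y0 - grad_tilde_f(y0)                     # one gradient step, step size 1
>     x1 = solve_triangular(L.T, y1, lower=False)    # x_1 = L^{-T} y_1 = 0: one iteration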
> 
> Johannes
> 
>> On 2 Dec 2022, at 13:48, Jose E. Roman <jroman at dsic.upv.es> wrote:
>> 
>> As far as I know, the solveForward/Backward operations are not implemented for external packages.
>> 
>> What are you trying to do? One may be tempted to use the Cholesky decomposition to transform a symmetric-definite generalized eigenvalue problem into a standard eigenvalue problem, as is done e.g. in LAPACK (https://netlib.org/lapack/lug/node54.html). But note that in SLEPc you do not need to do this, as EPSSolve() will take care of symmetry using Cholesky without having to apply solveForward/Backward separately.
>> 
>> Jose
>> 
>> 
>> 
>> 
>>> On 2 Dec 2022, at 13:32, Johannes Haubner <haubnerj at simula.no> wrote:
>>> 
>>> Given an s.p.d. matrix A and a right-hand side b, we want to compute (L^T)^{-1} b and L^{-1} b, where A = L L^T is the Cholesky decomposition of A. In serial, we are able to do this using -pc_factor_mat_solver_type petsc together with "solveForward" and "solveBackward" (see https://github.com/JohannesHaubner/petsc-cholesky/blob/main/main.py). In parallel, we are not able to do this yet. According to p. 52 of https://slepc.upv.es/documentation/slepc.pdf, PETSc's Cholesky factorization is not parallel. However, using "mumps" or "superlu_dist" prevents us from using "solveForward" or "solveBackward". Is there a way of using solveBackward in parallel with a distributed matrix (MPI)?
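>>> 
>>> For reference, the serial workflow referred to above looks roughly like this (a petsc4py sketch, not verbatim from the linked main.py; A is an assembled s.p.d. Mat and b a Vec):
>>> 
>>>     from petsc4py import PETSc
>>> 
>>>     pc = PETSc.PC().create(PETSc.COMM_SELF)
>>>     pc.setOperators(A)
>>>     pc.setType(PETSc.PC.Type.CHOLESKY)
>>>     pc.setFactorSolverType("petsc")      # -pc_factor_mat_solver_type petsc
>>>     pc.setUp()                           # computes A = L L^T
>>>     F = pc.getFactorMatrix()             # factored matrix
>>> 
>>>     x = A.createVecLeft()
>>>     F.solveForward(b, x)                 # x = L^{-1} b
>>>     F.solveBackward(b, x)                # x = (L^T)^{-1} b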
>> 
> 


