[petsc-users] Python PETSc performance vs scipy ZVODE
Niclas Götting
ngoetting at itp.uni-bremen.de
Thu Aug 10 04:40:04 CDT 2023
Thank you both for the very quick answers!
So far I have compiled PETSc with debugging turned on, but I think it should
still be faster than standard scipy in both cases. Stefano's answer has
already got me very far; I now only define the RHS of the ODE and no
Jacobian (I do wonder why the documentation suggests otherwise, though).
I made the following four attempts at implementing the RHS (how I register
them with the TS is sketched after the list):
1. def rhsfunc1(ts, t, u, F):
       scale = 0.5 * (5 < t < 10)
       (l + scale * pump).mult(u, F)

2. def rhsfunc2(ts, t, u, F):
       l.mult(u, F)
       scale = 0.5 * (5 < t < 10)
       (scale * pump).multAdd(u, F, F)

3. def rhsfunc3(ts, t, u, F):
       l.mult(u, F)
       scale = 0.5 * (5 < t < 10)
       if scale != 0:
           pump.scale(scale)
           pump.multAdd(u, F, F)
           pump.scale(1/scale)

4. def rhsfunc4(ts, t, u, F):
       tmp_pump.zeroEntries()  # tmp_pump is pump.duplicate()
       l.mult(u, F)
       scale = 0.5 * (5 < t < 10)
       tmp_pump.axpy(scale, pump,
                     structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN)
       tmp_pump.multAdd(u, F, F)
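For context, hooking one of these callbacks into the TS looks roughly like
this (a minimal sketch, not the full script; the rest of the TS setup is the
same as in my original code quoted below, and there is no ts.setRHSJacobian
call anymore):

    tmp_pump = pump.duplicate()   # scratch matrix reused by rhsfunc4
    ts.setRHSFunction(rhsfunc4)   # explicit RHS callback instead of
                                  # computeRHSFunctionLinear + a Jacobian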
All four options yield the same results, but they run at 50 it/s, 800 it/s,
2300 it/s and 1900 it/s, respectively, which is a huge performance boost
(almost 7 times as fast as scipy, with PETSc debugging still turned on). As
the scale function will most likely be a Gaussian in the future, I think
option 3 will become numerically unstable (it rescales pump back by 1/scale),
so I'll go with option 4, which is already faster than I expected. If you
think it is possible to speed up the RHS calculation even more, I'd be happy
to hear your suggestions; the -log_view output is attached to this message.
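For concreteness, the Gaussian variant of option 4 would look roughly like
this (t0, sigma and the 0.5 amplitude are placeholders, not the actual values
I will use, and numpy is imported as np as in my scripts); since nothing is
ever divided by scale, a vanishingly small scale is harmless here:

    def rhsfunc_gaussian(ts, t, u, F):
        scale = 0.5 * np.exp(-((t - t0) / sigma)**2)  # placeholder envelope
        tmp_pump.zeroEntries()
        l.mult(u, F)
        tmp_pump.axpy(scale, pump,
                      structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN)
        tmp_pump.multAdd(u, F, F)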
One last point: if I didn't misunderstand the documentation at
https://petsc.org/release/manual/ts/#special-cases, should it perhaps be
updated, given that no Jacobian is needed here?
Best regards
Niclas
On 09.08.23 17:51, Stefano Zampini wrote:
> TSRK is an explicit solver. Unless you are changing the ts type from the
> command line, the explicit Jacobian should not be needed. On top of
> Barry's suggestion, I would suggest you write the explicit RHS
> instead of assembling a throw-away matrix every time that function needs
> to be sampled.
>
> On Wed, Aug 9, 2023, 17:09 Niclas Götting
> <ngoetting at itp.uni-bremen.de> wrote:
>
> Hi all,
>
>     I'm currently trying to convert a quantum simulation from scipy to
>     PETSc. The problem itself is extremely simple and of the form
>     \dot{u}(t) = (A_const + f(t)*B_const)*u(t), where f(t) in this simple
>     test case is a square function. The matrices A_const and B_const are
>     extremely sparse, and therefore I thought the problem would be well
>     suited for PETSc. Currently, I solve the ODE with the following
>     procedure in scipy (I can provide the necessary data files if needed,
>     but they are just some trace-preserving, very sparse matrices):
>
> import numpy as np
> import scipy.sparse
> import scipy.integrate
>
> from tqdm import tqdm
>
>
> l = np.load("../liouvillian.npy")
> pump = np.load("../pump_operator.npy")
> state = np.load("../initial_state.npy")
>
> l = scipy.sparse.csr_array(l)
> pump = scipy.sparse.csr_array(pump)
>
>     def f(t, y, *args):
>         return (l + 0.5 * (5 < t < 10) * pump) @ y
>         # return l @ y  # Uncomment for f(t) = 0
>
>     dt = 0.1
>     NUM_STEPS = 200
>     res = np.empty((NUM_STEPS, 4096), dtype=np.complex128)
>     solver = scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state)
>     times = []
>     for i in tqdm(range(NUM_STEPS)):
>         res[i, :] = solver.integrate(solver.t + dt)
>         times.append(solver.t)
>
>     Here, A_const = l, B_const = pump and f(t) = 0.5 * (5 < t < 10). tqdm
>     reports about 330 it/s on my machine. When converting the code to
>     PETSc, I came to the following result (following the chapter
>     https://petsc.org/main/manual/ts/#special-cases):
>
> import sys
> import petsc4py
> petsc4py.init(args=sys.argv)
> import numpy as np
> import scipy.sparse
>
> from tqdm import tqdm
> from petsc4py import PETSc
>
> comm = PETSc.COMM_WORLD
>
>
>     def mat_to_real(arr):
>         return np.block([[arr.real, -arr.imag],
>                          [arr.imag, arr.real]]).astype(np.float64)
>
>     def mat_to_petsc_aij(arr):
>         arr_sc_sp = scipy.sparse.csr_array(arr)
>         mat = PETSc.Mat().createAIJ(arr.shape[0], comm=comm)
>         rstart, rend = mat.getOwnershipRange()
>         print(rstart, rend)
>         print(arr.shape[0])
>         print(mat.sizes)
>         I = arr_sc_sp.indptr[rstart : rend + 1] - arr_sc_sp.indptr[rstart]
>         J = arr_sc_sp.indices[arr_sc_sp.indptr[rstart] : arr_sc_sp.indptr[rend]]
>         V = arr_sc_sp.data[arr_sc_sp.indptr[rstart] : arr_sc_sp.indptr[rend]]
>
>         print(I.shape, J.shape, V.shape)
>         mat.setValuesCSR(I, J, V)
>         mat.assemble()
>         return mat
>
>
> l = np.load("../liouvillian.npy")
> l = mat_to_real(l)
> pump = np.load("../pump_operator.npy")
> pump = mat_to_real(pump)
> state = np.load("../initial_state.npy")
> state = np.hstack([state.real, state.imag]).astype(np.float64)
>
> l = mat_to_petsc_aij(l)
> pump = mat_to_petsc_aij(pump)
>
>
>     jac = l.duplicate()
>     for i in range(8192):
>         jac.setValue(i, i, 0)
>     jac.assemble()
>     jac += l
>
> vec = l.createVecRight()
> vec.setValues(np.arange(state.shape[0], dtype=np.int32), state)
> vec.assemble()
>
>
> dt = 0.1
>
> ts = PETSc.TS().create(comm=comm)
> ts.setFromOptions()
> ts.setProblemType(ts.ProblemType.LINEAR)
> ts.setEquationType(ts.EquationType.ODE_EXPLICIT)
> ts.setType(ts.Type.RK)
> ts.setRKType(ts.RKType.RK3BS)
> ts.setTime(0)
> print("KSP:", ts.getKSP().getType())
> print("KSP PC:",ts.getKSP().getPC().getType())
> print("SNES :", ts.getSNES().getType())
>
>     def jacobian(ts, t, u, Amat, Pmat):
>         Amat.zeroEntries()
>         Amat.aypx(1, l,
>                   structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)
>         Amat.axpy(0.5 * (5 < t < 10), pump,
>                   structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)
>
>     ts.setRHSFunction(PETSc.TS.computeRHSFunctionLinear)
>     # ts.setRHSJacobian(PETSc.TS.computeRHSJacobianConstant, l, l)  # Uncomment for f(t) = 0
>     ts.setRHSJacobian(jacobian, jac)
>
>     NUM_STEPS = 200
>     res = np.empty((NUM_STEPS, 8192), dtype=np.float64)
>     times = []
>     rstart, rend = vec.getOwnershipRange()
>     for i in tqdm(range(NUM_STEPS)):
>         time = ts.getTime()
>         ts.setMaxTime(time + dt)
>         ts.solve(vec)
>         res[i, rstart:rend] = vec.getArray()[:]
>         times.append(time)
>
>     I decomposed the complex ODE into a larger real ODE, so that I can
>     more easily switch to GPU computation later on. The solutions of both
>     scripts are practically identical, but PETSc runs about 3 times
>     slower, at 120 it/s on my machine. I don't use MPI for PETSc yet.
>
>     I strongly suspect that the problem lies in the Jacobian definition,
>     as PETSc is about 3 times *faster* than scipy with f(t) = 0 and
>     therefore a constant Jacobian.
>
> Thank you in advance.
>
> All the best,
> Niclas
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: log.log
Type: text/x-log
Size: 10183 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20230810/17c79e0a/attachment-0001.bin>