[petsc-users] Python PETSc performance vs scipy ZVODE

Niclas Götting ngoetting at itp.uni-bremen.de
Thu Aug 10 05:12:47 CDT 2023


If I understood you right, this should be the resulting RHS:

def rhsfunc5(ts, t, u, F):
     l.mult(u, F)
     pump.mult(u, tmp_vec)
     scale = 0.5 * (5 < t < 10)
     F.axpy(scale, tmp_vec)

It is a little slower than option 3, but at about 2100 it/s it is
consistently ~10% faster than option 4.
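
For completeness, the setup this assumes would look roughly like the
sketch below: the work vector is created once outside the callback, and
the custom RHS replaces the computeRHSFunctionLinear registration from
my original script (tmp_vec and F_work are just placeholder names):

tmp_vec = l.createVecRight()  # work vector holding pump applied to u, allocated once
F_work = l.createVecRight()   # storage vector handed to the TS
ts.setRHSFunction(rhsfunc5, F_work)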

Thank you very much for the suggestion!

On 10.08.23 11:47, Stefano Zampini wrote:
> I would use option 3. Keep a work vector and do a vector summation
> instead of the repeated multiplications by scale and 1/scale.
>
> I agree with you the docs are a little misleading here.
>
> On Thu, Aug 10, 2023, 11:40 Niclas Götting 
> <ngoetting at itp.uni-bremen.de> wrote:
>
>     Thank you both for the very quick answer!
>
>     So far, I have compiled PETSc with debugging turned on, but I think
>     it should still be faster than standard scipy in both cases.
>     Actually, Stefano's answer has already got me very far; now I only
>     define the RHS of the ODE and no Jacobian (I wonder why the
>     documentation suggests otherwise, though). I had the following
>     four attempts at implementing the RHS:
>
>      1. def rhsfunc1(ts, t, u, F):
>             scale = 0.5 * (5 < t < 10)
>             (l + scale * pump).mult(u, F)
>      2. def rhsfunc2(ts, t, u, F):
>             l.mult(u, F)
>             scale = 0.5 * (5 < t < 10)
>             (scale * pump).multAdd(u, F, F)
>      3. def rhsfunc3(ts, t, u, F):
>             l.mult(u, F)
>             scale = 0.5 * (5 < t < 10)
>             if scale != 0:
>                 pump.scale(scale)
>                 pump.multAdd(u, F, F)
>                 pump.scale(1/scale)
>      4. def rhsfunc4(ts, t, u, F):
>             tmp_pump.zeroEntries() # tmp_pump is pump.duplicate()
>             l.mult(u, F)
>             scale = 0.5 * (5 < t < 10)
>             tmp_pump.axpy(scale, pump, structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN)
>             tmp_pump.multAdd(u, F, F)
>
>     They all yield the same results, but with 50 it/s, 800 it/s,
>     2300 it/s and 1900 it/s, respectively, which is a huge performance
>     boost (almost 7 times as fast as scipy, with PETSc debugging still
>     turned on). As the scale function will most likely be a Gaussian in
>     the future, I think that option 3 will become numerically unstable
>     and I'll have to go with option 4, which is already faster than I
>     expected. If you think it is possible to speed up the RHS
>     calculation even more, I'd be happy to hear your suggestions; the
>     -log_view is attached to this message.
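>
>     To illustrate that concern: with a Gaussian envelope (hypothetical
>     parameters amp, t0 and sigma, and numpy imported as np as in the
>     script below), option 4 would look roughly like this, since it
>     never divides by scale:
>
>     def rhsfunc_gauss(ts, t, u, F):
>         tmp_pump.zeroEntries()  # tmp_pump is pump.duplicate()
>         l.mult(u, F)
>         scale = amp * np.exp(-0.5 * ((t - t0) / sigma) ** 2)
>         tmp_pump.axpy(scale, pump, structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN)
>         tmp_pump.multAdd(u, F, F)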
>
>     One last point: If I didn't misunderstand the documentation at
>     https://petsc.org/release/manual/ts/#special-cases, should this
>     maybe be changed?
>
>     Best regards
>     Niclas
>
>     On 09.08.23 17:51, Stefano Zampini wrote:
>>     TSRK is an explicit solver. Unless you are changing the TS type
>>     from the command line, the explicit Jacobian should not be needed.
>>     On top of Barry's suggestion, I would suggest you write the
>>     explicit RHS instead of assembling a throwaway matrix every time
>>     that function needs to be sampled.
>>
>>     On Wed, Aug 9, 2023, 17:09 Niclas Götting
>>     <ngoetting at itp.uni-bremen.de> wrote:
>>
>>         Hi all,
>>
>>         I'm currently trying to convert a quantum simulation from scipy
>>         to PETSc. The problem itself is extremely simple and of the form
>>         \dot{u}(t) = (A_const + f(t)*B_const)*u(t), where f(t) in this
>>         simple test case is a square function. The matrices A_const and
>>         B_const are extremely sparse, and therefore I thought the
>>         problem would be well suited for PETSc. Currently, I solve the
>>         ODE with the following procedure in scipy (I can provide the
>>         necessary data files if needed, but they are just some
>>         trace-preserving, very sparse matrices):
>>
>>         import numpy as np
>>         import scipy.sparse
>>         import scipy.integrate
>>
>>         from tqdm import tqdm
>>
>>
>>         l = np.load("../liouvillian.npy")
>>         pump = np.load("../pump_operator.npy")
>>         state = np.load("../initial_state.npy")
>>
>>         l = scipy.sparse.csr_array(l)
>>         pump = scipy.sparse.csr_array(pump)
>>
>>         def f(t, y, *args):
>>              return (l + 0.5 * (5 < t < 10) * pump) @ y
>>              #return l @ y # Uncomment for f(t) = 0
>>
>>         dt = 0.1
>>         NUM_STEPS = 200
>>         res = np.empty((NUM_STEPS, 4096), dtype=np.complex128)
>>         solver = scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state)
>>         times = []
>>         for i in tqdm(range(NUM_STEPS)):
>>              res[i, :] = solver.integrate(solver.t + dt)
>>              times.append(solver.t)
>>
>>         Here, A_const = l, B_const = pump and f(t) = 5 < t < 10. tqdm
>>         reports about 330 it/s on my machine. When converting the code
>>         to PETSc, I came to the following result (according to the
>>         chapter https://petsc.org/main/manual/ts/#special-cases):
>>
>>         import sys
>>         import petsc4py
>>         petsc4py.init(args=sys.argv)
>>         import numpy as np
>>         import scipy.sparse
>>
>>         from tqdm import tqdm
>>         from petsc4py import PETSc
>>
>>         comm = PETSc.COMM_WORLD
>>
>>
>>         def mat_to_real(arr):
>>              return np.block([[arr.real, -arr.imag], [arr.imag, arr.real]]).astype(np.float64)
>>
>>         def mat_to_petsc_aij(arr):
>>              arr_sc_sp = scipy.sparse.csr_array(arr)
>>              mat = PETSc.Mat().createAIJ(arr.shape[0], comm=comm)
>>              rstart, rend = mat.getOwnershipRange()
>>              print(rstart, rend)
>>              print(arr.shape[0])
>>              print(mat.sizes)
>>              I = arr_sc_sp.indptr[rstart : rend + 1] - arr_sc_sp.indptr[rstart]
>>              J = arr_sc_sp.indices[arr_sc_sp.indptr[rstart] : arr_sc_sp.indptr[rend]]
>>              V = arr_sc_sp.data[arr_sc_sp.indptr[rstart] : arr_sc_sp.indptr[rend]]
>>
>>              print(I.shape, J.shape, V.shape)
>>              mat.setValuesCSR(I, J, V)
>>              mat.assemble()
>>              return mat
>>
>>
>>         l = np.load("../liouvillian.npy")
>>         l = mat_to_real(l)
>>         pump = np.load("../pump_operator.npy")
>>         pump = mat_to_real(pump)
>>         state = np.load("../initial_state.npy")
>>         state = np.hstack([state.real, state.imag]).astype(np.float64)
>>
>>         l = mat_to_petsc_aij(l)
>>         pump = mat_to_petsc_aij(pump)
>>
>>
>>         jac = l.duplicate()
>>         for i in range(8192):
>>              jac.setValue(i, i, 0)
>>         jac.assemble()
>>         jac += l
>>
>>         vec = l.createVecRight()
>>         vec.setValues(np.arange(state.shape[0], dtype=np.int32), state)
>>         vec.assemble()
>>
>>
>>         dt = 0.1
>>
>>         ts = PETSc.TS().create(comm=comm)
>>         ts.setFromOptions()
>>         ts.setProblemType(ts.ProblemType.LINEAR)
>>         ts.setEquationType(ts.EquationType.ODE_EXPLICIT)
>>         ts.setType(ts.Type.RK)
>>         ts.setRKType(ts.RKType.RK3BS)
>>         ts.setTime(0)
>>         print("KSP:", ts.getKSP().getType())
>>         print("KSP PC:",ts.getKSP().getPC().getType())
>>         print("SNES :", ts.getSNES().getType())
>>
>>         def jacobian(ts, t, u, Amat, Pmat):
>>              Amat.zeroEntries()
>>              Amat.aypx(1, l, structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)
>>              Amat.axpy(0.5 * (5 < t < 10), pump, structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)
>>
>>         ts.setRHSFunction(PETSc.TS.computeRHSFunctionLinear)
>>         #ts.setRHSJacobian(PETSc.TS.computeRHSJacobianConstant, l, l)  # Uncomment for f(t) = 0
>>         ts.setRHSJacobian(jacobian, jac)
>>
>>         NUM_STEPS = 200
>>         res = np.empty((NUM_STEPS, 8192), dtype=np.float64)
>>         times = []
>>         rstart, rend = vec.getOwnershipRange()
>>         for i in tqdm(range(NUM_STEPS)):
>>              time = ts.getTime()
>>              ts.setMaxTime(time + dt)
>>              ts.solve(vec)
>>              res[i, rstart:rend] = vec.getArray()[:]
>>              times.append(time)
>>
>>         I decomposed the complex ODE into a larger real ODE, so that I
>>         can easily switch to GPU computation later on. Now, the
>>         solutions of both scripts are essentially identical, but PETSc
>>         runs about 3 times slower at 120 it/s on my machine. I don't
>>         use MPI for PETSc yet.
>>
>>         I strongly suspect that the problem lies in the Jacobian
>>         definition, as PETSc is about 3 times *faster* than scipy with
>>         f(t) = 0 and therefore a constant Jacobian.
>>
>>         Thank you in advance.
>>
>>         All the best,
>>         Niclas
>>
>>