<div dir="auto">I would use option 3. Keep a work vector and do a vector summation instead of the multiple multiplication by scale and 1/scale. <div dir="auto"><br></div><div dir="auto">I agree with you the docs are a little misleading here. </div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Aug 10, 2023, 11:40 Niclas Götting <<a href="mailto:ngoetting@itp.uni-bremen.de">ngoetting@itp.uni-bremen.de</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><u></u>

  <div>

    <p>Thank you both for the very quick answer!</p>

    <p>So far, I compiled PETSc with debugging turned on, but I think it

      should still be faster than standard scipy in both cases.

      Actually, Stefano's answer has got me very far already; now I only

      define the RHS of the ODE and no Jacobian (I wonder, why the

      documentation suggests otherwise, though). I had the following

      four tries at implementing the RHS:</p>

    <ol>

      <li><font face="monospace">def rhsfunc1(ts, t, u, F):<br>

              scale = 0.5 * (5 < t < 10)<br>

              (l + scale * pump).mult(u, F)</font></li>

      <li><font face="monospace">def rhsfunc2(ts, t, u, F):<br>

              l.mult(u, F)<br>

              scale = 0.5 * (5 < t < 10)<br>

              (scale * pump).multAdd(u, F, F)</font></li>

      <li><font face="monospace">def rhsfunc3(ts, t, u, F):<br>

              l.mult(u, F)<br>

              scale = 0.5 * (5 < t < 10)<br>

              if scale != 0:<br>

                  pump.scale(scale)<br>

                  pump.multAdd(u, F, F)<br>

                  pump.scale(1/scale)</font></li>

      <li><font face="monospace">def rhsfunc4(ts, t, u, F):<br>

              tmp_pump.zeroEntries() # tmp_pump is pump.duplicate()<br>

              l.mult(u, F)<br>

              scale = 0.5 * (5 < t < 10)<br>

              tmp_pump.axpy(scale, pump,

          structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN)<br>

              tmp_pump.multAdd(u, F, F)<br>

        </font></li>

    </ol>

    <p>They all yield the same results, but with 50it/s, 800it/,

      2300it/s and 1900it/s, respectively, which is a huge performance

      boost (almost 7 times as fast as scipy, with PETSc debugging still

      turned on). As the scale function will most likely be a gaussian

      in the future, I think that option 3 will be become numerically

      unstable and I'll have to go with option 4, which is already

      faster than I expected. If you think it is possible to speed up

      the RHS calculation even more, I'd be happy to hear your

      suggestions; the -log_view is attached to this message.</p>

    <p>One last point: If I didn't misunderstand the documentation at

      <a href="https://petsc.org/release/manual/ts/#special-cases" target="_blank" rel="noreferrer">https://petsc.org/release/manual/ts/#special-cases</a>, should this

      maybe be changed?</p>

    <p>Best regards<br>

      Niclas<br>

    </p>

    <div>On 09.08.23 17:51, Stefano Zampini

      wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="auto">

        <div>TSRK is an explicit solver. Unless you are changing the ts

          type from command line,  the explicit  jacobian should not be

          needed. On top of Barry's suggestion, I would suggest you to

          write the explicit RHS instead of assembly a throw away matrix

          every time that function needs to be sampled.<br>

          <br>

          <div class="gmail_quote">

            <div dir="ltr" class="gmail_attr">On Wed, Aug 9, 2023, 17:09

              Niclas Götting <<a href="mailto:ngoetting@itp.uni-bremen.de" target="_blank" rel="noreferrer">ngoetting@itp.uni-bremen.de</a>>

              wrote:<br>

            </div>

            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi

              all,<br>

              <br>

              I'm currently trying to convert a quantum simulation from

              scipy to <br>

              PETSc. The problem itself is extremely simple and of the

              form \dot{u}(t) <br>

              = (A_const + f(t)*B_const)*u(t), where f(t) in this simple

              test case is <br>

              a square function. The matrices A_const and B_const are

              extremely sparse <br>

              and therefore I thought, the problem will be well suited

              for PETSc. <br>

              Currently, I solve the ODE with the following procedure in

              scipy (I can <br>

              provide the necessary data files, if needed, but they are

              just some <br>

              trace-preserving, very sparse matrices):<br>

              <br>

              import numpy as np<br>

              import scipy.sparse<br>

              import scipy.integrate<br>

              <br>

              from tqdm import tqdm<br>

              <br>

              <br>

              l = np.load("../liouvillian.npy")<br>

              pump = np.load("../pump_operator.npy")<br>

              state = np.load("../initial_state.npy")<br>

              <br>

              l = scipy.sparse.csr_array(l)<br>

              pump = scipy.sparse.csr_array(pump)<br>

              <br>

              def f(t, y, *args):<br>

                   return (l + 0.5 * (5 < t < 10) * pump) @ y<br>

                   #return l @ y # Uncomment for f(t) = 0<br>

              <br>

              dt = 0.1<br>

              NUM_STEPS = 200<br>

              res = np.empty((NUM_STEPS, 4096), dtype=np.complex128)<br>

              solver = <br>

scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state)<br>

              times = []<br>

              for i in tqdm(range(NUM_STEPS)):<br>

                   res[i, :] = solver.integrate(solver.t + dt)<br>

                   times.append(solver.t)<br>

              <br>

              Here, A_const = l, B_const = pump and f(t) = 5 < t <

              10. tqdm reports <br>

              about 330it/s on my machine. When converting the code to

              PETSc, I came <br>

              to the following result (according to the chapter <br>

              <a href="https://petsc.org/main/manual/ts/#special-cases" rel="noreferrer noreferrer noreferrer" target="_blank">https://petsc.org/main/manual/ts/#special-cases</a>)<br>

              <br>

              import sys<br>

              import petsc4py<br>

              petsc4py.init(args=sys.argv)<br>

              import numpy as np<br>

              import scipy.sparse<br>

              <br>

              from tqdm import tqdm<br>

              from petsc4py import PETSc<br>

              <br>

              comm = PETSc.COMM_WORLD<br>

              <br>

              <br>

              def mat_to_real(arr):<br>

                   return np.block([[arr.real, -arr.imag], [arr.imag, <br>

              arr.real]]).astype(np.float64)<br>

              <br>

              def mat_to_petsc_aij(arr):<br>

                   arr_sc_sp = scipy.sparse.csr_array(arr)<br>

                   mat = PETSc.Mat().createAIJ(arr.shape[0], comm=comm)<br>

                   rstart, rend = mat.getOwnershipRange()<br>

                   print(rstart, rend)<br>

                   print(arr.shape[0])<br>

                   print(mat.sizes)<br>

                   I = arr_sc_sp.indptr[rstart : rend + 1] -

              arr_sc_sp.indptr[rstart]<br>

                   J = arr_sc_sp.indices[arr_sc_sp.indptr[rstart] : <br>

              arr_sc_sp.indptr[rend]]<br>

                   V = arr_sc_sp.data[arr_sc_sp.indptr[rstart] :

              arr_sc_sp.indptr[rend]]<br>

              <br>

                   print(I.shape, J.shape, V.shape)<br>

                   mat.setValuesCSR(I, J, V)<br>

                   mat.assemble()<br>

                   return mat<br>

              <br>

              <br>

              l = np.load("../liouvillian.npy")<br>

              l = mat_to_real(l)<br>

              pump = np.load("../pump_operator.npy")<br>

              pump = mat_to_real(pump)<br>

              state = np.load("../initial_state.npy")<br>

              state = np.hstack([state.real,

              state.imag]).astype(np.float64)<br>

              <br>

              l = mat_to_petsc_aij(l)<br>

              pump = mat_to_petsc_aij(pump)<br>

              <br>

              <br>

              jac = l.duplicate()<br>

              for i in range(8192):<br>

                   jac.setValue(i, i, 0)<br>

              jac.assemble()<br>

              jac += l<br>

              <br>

              vec = l.createVecRight()<br>

              vec.setValues(np.arange(state.shape[0], dtype=np.int32),

              state)<br>

              vec.assemble()<br>

              <br>

              <br>

              dt = 0.1<br>

              <br>

              ts = PETSc.TS().create(comm=comm)<br>

              ts.setFromOptions()<br>

              ts.setProblemType(ts.ProblemType.LINEAR)<br>

              ts.setEquationType(ts.EquationType.ODE_EXPLICIT)<br>

              ts.setType(ts.Type.RK)<br>

              ts.setRKType(ts.RKType.RK3BS)<br>

              ts.setTime(0)<br>

              print("KSP:", ts.getKSP().getType())<br>

              print("KSP PC:",ts.getKSP().getPC().getType())<br>

              print("SNES :", ts.getSNES().getType())<br>

              <br>

              def jacobian(ts, t, u, Amat, Pmat):<br>

                   Amat.zeroEntries()<br>

                   Amat.aypx(1, l,

              structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)<br>

                   Amat.axpy(0.5 * (5 < t < 10), pump, <br>

              structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)<br>

              <br>

              ts.setRHSFunction(PETSc.TS.computeRHSFunctionLinear)<br>

              #ts.setRHSJacobian(PETSc.TS.computeRHSJacobianConstant, l,

              l) # <br>

              Uncomment for f(t) = 0<br>

              ts.setRHSJacobian(jacobian, jac)<br>

              <br>

              NUM_STEPS = 200<br>

              res = np.empty((NUM_STEPS, 8192), dtype=np.float64)<br>

              times = []<br>

              rstart, rend = vec.getOwnershipRange()<br>

              for i in tqdm(range(NUM_STEPS)):<br>

                   time = ts.getTime()<br>

                   ts.setMaxTime(time + dt)<br>

                   ts.solve(vec)<br>

                   res[i, rstart:rend] = vec.getArray()[:]<br>

                   times.append(time)<br>

              <br>

              I decomposed the complex ODE into a larger real ODE, so

              that I can <br>

              easily switch maybe to GPU computation later on. Now, the

              solutions of <br>

              both scripts are very much identical, but PETSc runs about

              3 times <br>

              slower at 120it/s on my machine. I don't use MPI for PETSc

              yet.<br>

              <br>

              I strongly suppose that the problem lies within the

              jacobian definition, <br>

              as PETSc is about 3 times *faster* than scipy with f(t) =

              0 and <br>

              therefore a constant jacobian.<br>

              <br>

              Thank you in advance.<br>

              <br>

              All the best,<br>

              Niclas<br>

              <br>

              <br>

            </blockquote>

          </div>

        </div>

      </div>

    </blockquote>

  </div>

</blockquote></div>