<div dir="auto">If you do the mult of "pump" inside an if it should be faster </div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Aug 10, 2023, 12:12 Niclas Götting <<a href="mailto:ngoetting@itp.uni-bremen.de">ngoetting@itp.uni-bremen.de</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><u></u>

  
    
  
  <div>
    <p>If I understood you right, this should be the resulting RHS:</p>
    <p><font face="monospace">def rhsfunc5(ts, t, u, F):<br>
            l.mult(u, F)<br>
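            # tmp_vec: a reusable work vector created once outside the RHS, e.g. tmp_vec = l.createVecRight()<br>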
            pump.mult(u, tmp_vec)<br>
            scale = 0.5 * (5 < t < 10)<br>
            F.axpy(scale, tmp_vec)</font></p>
    <p>It is a little bit slower than option 3, but at about 2100it/s it is
      consistently ~10% faster than option 4.</p>
    <p>Thank you very much for the suggestion!<br>
    </p>
    <div>On 10.08.23 11:47, Stefano Zampini
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="auto">I would use option 3. Keep a work vector and do a
        vector summation instead of the multiple multiplication by scale
        and 1/scale. 
        <div dir="auto"><br>
        </div>
        <div dir="auto">I agree with you the docs are a little
          misleading here. </div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Thu, Aug 10, 2023, 11:40
          Niclas Götting <<a href="mailto:ngoetting@itp.uni-bremen.de" target="_blank" rel="noreferrer">ngoetting@itp.uni-bremen.de</a>>
          wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
          <div>
            <p>Thank you both for the very quick answer!</p>
            <p>So far, I compiled PETSc with debugging turned on, but I
              think it should still be faster than standard scipy in
              both cases. Actually, Stefano's answer has got me very far
              already; now I only define the RHS of the ODE and no
              Jacobian (I wonder why the documentation suggests
              otherwise, though). I made the following four attempts at
              implementing the RHS:</p>
            <ol>
              <li><font face="monospace">def rhsfunc1(ts, t, u, F):<br>
                      scale = 0.5 * (5 < t < 10)<br>
                      (l + scale * pump).mult(u, F)</font></li>
              <li><font face="monospace">def rhsfunc2(ts, t, u, F):<br>
                      l.mult(u, F)<br>
                      scale = 0.5 * (5 < t < 10)<br>
                      (scale * pump).multAdd(u, F, F)</font></li>
              <li><font face="monospace">def rhsfunc3(ts, t, u, F):<br>
                      l.mult(u, F)<br>
                      scale = 0.5 * (5 < t < 10)<br>
                      if scale != 0:<br>
                          pump.scale(scale)<br>
                          pump.multAdd(u, F, F)<br>
                          pump.scale(1/scale)</font></li>
              <li><font face="monospace">def rhsfunc4(ts, t, u, F):<br>
                      tmp_pump.zeroEntries() # tmp_pump is
                  pump.duplicate()<br>
                      l.mult(u, F)<br>
                      scale = 0.5 * (5 < t < 10)<br>
                      tmp_pump.axpy(scale, pump,
                  structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN)<br>
                      tmp_pump.multAdd(u, F, F)<br>
                </font></li>
            </ol>
            <p>They all yield the same results, but with 50it/s, 800it/s,
              2300it/s and 1900it/s, respectively, which is a huge
              performance boost (almost 7 times as fast as scipy, with
              PETSc debugging still turned on). As the scale function
              will most likely be a Gaussian in the future, I think that
              option 3 will become numerically unstable (because of the
              scaling by a very small value and then by its reciprocal)
              and I'll have to go with option 4, which is already faster
              than I expected. If you think it is possible to speed up
              the RHS calculation even more, I'd be happy to hear your
              suggestions; the -log_view is attached to this message.</p>
            <p>One last point: If I didn't misunderstand the
              documentation at <a href="https://petsc.org/release/manual/ts/#special-cases" rel="noreferrer noreferrer" target="_blank">https://petsc.org/release/manual/ts/#special-cases</a>,
              should this maybe be changed?</p>
            <p>Best regards<br>
              Niclas<br>
            </p>
            <div>On 09.08.23 17:51, Stefano Zampini wrote:<br>
            </div>
            <blockquote type="cite">
              <div dir="auto">
                <div>TSRK is an explicit solver. Unless you are changing
                  the ts type from the command line, the explicit Jacobian
                  should not be needed. On top of Barry's suggestion, I
                  would suggest writing the explicit RHS instead of
                  assembling a throwaway matrix every time that function
                  needs to be sampled.<br>
                  <br>
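                  <div dir="auto">Concretely, that would look roughly like this (a sketch only; it is essentially what the rhsfunc variants further up this thread implement, with tmp_vec a reusable work vector created once, e.g. via l.createVecRight()):</div>
                  <div dir="auto"><font face="monospace">def rhsfunc(ts, t, u, F):<br>
                        # F = (l + 0.5 * (5 < t < 10) * pump) @ u, without assembling a new matrix<br>
                        l.mult(u, F)<br>
                        pump.mult(u, tmp_vec)<br>
                        F.axpy(0.5 * (5 < t < 10), tmp_vec)<br>
                    <br>
                    ts.setRHSFunction(rhsfunc)  # instead of PETSc.TS.computeRHSFunctionLinear<br>
                    # ts.setRHSJacobian(...) is then not needed for the explicit RK solver</font></div>
                  <br>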
                  <div class="gmail_quote">
                    <div dir="ltr" class="gmail_attr">On Wed, Aug 9,
                      2023, 17:09 Niclas Götting <<a href="mailto:ngoetting@itp.uni-bremen.de" rel="noreferrer noreferrer" target="_blank">ngoetting@itp.uni-bremen.de</a>>
                      wrote:<br>
                    </div>
                    <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi
                      all,<br>
                      <br>
                      I'm currently trying to convert a quantum
                      simulation from scipy to <br>
                      PETSc. The problem itself is extremely simple and
                      of the form \dot{u}(t) <br>
                      = (A_const + f(t)*B_const)*u(t), where f(t) in
                      this simple test case is <br>
                      a square function. The matrices A_const and
                      B_const are extremely sparse <br>
                      and therefore I thought the problem would be well
                      suited for PETSc. <br>
                      Currently, I solve the ODE with the following
                      procedure in scipy (I can <br>
                      provide the necessary data files, if needed, but
                      they are just some <br>
                      trace-preserving, very sparse matrices):<br>
                      <br>
                      import numpy as np<br>
                      import scipy.sparse<br>
                      import scipy.integrate<br>
                      <br>
                      from tqdm import tqdm<br>
                      <br>
                      <br>
                      l = np.load("../liouvillian.npy")<br>
                      pump = np.load("../pump_operator.npy")<br>
                      state = np.load("../initial_state.npy")<br>
                      <br>
                      l = scipy.sparse.csr_array(l)<br>
                      pump = scipy.sparse.csr_array(pump)<br>
                      <br>
                      def f(t, y, *args):<br>
                           return (l + 0.5 * (5 < t < 10) * pump)
                      @ y<br>
                           #return l @ y # Uncomment for f(t) = 0<br>
                      <br>
                      dt = 0.1<br>
                      NUM_STEPS = 200<br>
                      res = np.empty((NUM_STEPS, 4096),
                      dtype=np.complex128)<br>
                      solver = <br>
scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state)<br>
                      times = []<br>
                      for i in tqdm(range(NUM_STEPS)):<br>
                           res[i, :] = solver.integrate(solver.t + dt)<br>
                           times.append(solver.t)<br>
                      <br>
                      Here, A_const = l, B_const = pump and f(t) = 5
                      < t < 10. tqdm reports <br>
                      about 330it/s on my machine. When converting the
                      code to PETSc, I came <br>
                      to the following result (according to the chapter
                      <br>
                      <a href="https://petsc.org/main/manual/ts/#special-cases" rel="noreferrer noreferrer noreferrer noreferrer" target="_blank">https://petsc.org/main/manual/ts/#special-cases</a>)<br>
                      <br>
                      import sys<br>
                      import petsc4py<br>
                      petsc4py.init(args=sys.argv)<br>
                      import numpy as np<br>
                      import scipy.sparse<br>
                      <br>
                      from tqdm import tqdm<br>
                      from petsc4py import PETSc<br>
                      <br>
                      comm = PETSc.COMM_WORLD<br>
                      <br>
                      <br>
                      def mat_to_real(arr):<br>
                           return np.block([[arr.real, -arr.imag],
                      [arr.imag, <br>
                      arr.real]]).astype(np.float64)<br>
                      <br>
                      def mat_to_petsc_aij(arr):<br>
                           arr_sc_sp = scipy.sparse.csr_array(arr)<br>
                           mat = PETSc.Mat().createAIJ(arr.shape[0],
                      comm=comm)<br>
                           rstart, rend = mat.getOwnershipRange()<br>
                           print(rstart, rend)<br>
                           print(arr.shape[0])<br>
                           print(mat.sizes)<br>
                           I = arr_sc_sp.indptr[rstart : rend + 1] -
                      arr_sc_sp.indptr[rstart]<br>
                           J =
                      arr_sc_sp.indices[arr_sc_sp.indptr[rstart] : <br>
                      arr_sc_sp.indptr[rend]]<br>
                           V = arr_sc_sp.data[arr_sc_sp.indptr[rstart] :
                      arr_sc_sp.indptr[rend]]<br>
                      <br>
                           print(I.shape, J.shape, V.shape)<br>
                           mat.setValuesCSR(I, J, V)<br>
                           mat.assemble()<br>
                           return mat<br>
                      <br>
                      <br>
                      l = np.load("../liouvillian.npy")<br>
                      l = mat_to_real(l)<br>
                      pump = np.load("../pump_operator.npy")<br>
                      pump = mat_to_real(pump)<br>
                      state = np.load("../initial_state.npy")<br>
                      state = np.hstack([state.real,
                      state.imag]).astype(np.float64)<br>
                      <br>
                      l = mat_to_petsc_aij(l)<br>
                      pump = mat_to_petsc_aij(pump)<br>
                      <br>
                      <br>
                      jac = l.duplicate()<br>
                      for i in range(8192):<br>
                           jac.setValue(i, i, 0)<br>
                      jac.assemble()<br>
                      jac += l<br>
                      <br>
                      vec = l.createVecRight()<br>
                      vec.setValues(np.arange(state.shape[0],
                      dtype=np.int32), state)<br>
                      vec.assemble()<br>
                      <br>
                      <br>
                      dt = 0.1<br>
                      <br>
                      ts = PETSc.TS().create(comm=comm)<br>
                      ts.setFromOptions()<br>
                      ts.setProblemType(ts.ProblemType.LINEAR)<br>
                      ts.setEquationType(ts.EquationType.ODE_EXPLICIT)<br>
                      ts.setType(ts.Type.RK)<br>
                      ts.setRKType(ts.RKType.RK3BS)<br>
                      ts.setTime(0)<br>
                      print("KSP:", ts.getKSP().getType())<br>
                      print("KSP PC:",ts.getKSP().getPC().getType())<br>
                      print("SNES :", ts.getSNES().getType())<br>
                      <br>
                      def jacobian(ts, t, u, Amat, Pmat):<br>
                           Amat.zeroEntries()<br>
                           Amat.aypx(1, l,
                      structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)<br>
                           Amat.axpy(0.5 * (5 < t < 10), pump, <br>
structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)<br>
                      <br>
ts.setRHSFunction(PETSc.TS.computeRHSFunctionLinear)<br>
#ts.setRHSJacobian(PETSc.TS.computeRHSJacobianConstant, l, l) # <br>
                      Uncomment for f(t) = 0<br>
                      ts.setRHSJacobian(jacobian, jac)<br>
                      <br>
                      NUM_STEPS = 200<br>
                      res = np.empty((NUM_STEPS, 8192),
                      dtype=np.float64)<br>
                      times = []<br>
                      rstart, rend = vec.getOwnershipRange()<br>
                      for i in tqdm(range(NUM_STEPS)):<br>
                           time = ts.getTime()<br>
                           ts.setMaxTime(time + dt)<br>
                           ts.solve(vec)<br>
                           res[i, rstart:rend] = vec.getArray()[:]<br>
                           times.append(time)<br>
                      <br>
                      I decomposed the complex ODE into a larger real
                      ODE, so that I can <br>
                      maybe switch to GPU computation more easily later on.
                      Now, the solutions of <br>
                      both scripts are practically identical, but PETSc
                      runs about 3 times <br>
                      slower at 120it/s on my machine. I don't use MPI
                      for PETSc yet.<br>
                      <br>
                      I strongly suspect that the problem lies within
                      the Jacobian definition, <br>
                      as PETSc is about 3 times *faster* than scipy with
                      f(t) = 0 and <br>
                      therefore a constant Jacobian.<br>
                      <br>
                      Thank you in advance.<br>
                      <br>
                      All the best,<br>
                      Niclas<br>
                      <br>
                      <br>
                    </blockquote>
                  </div>
                </div>
              </div>
            </blockquote>
          </div>
        </blockquote>
      </div>
    </blockquote>
  </div>

</blockquote></div>