<!DOCTYPE html>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>Alright. Again, thank you very much for taking the time to answer
      my beginner questions! Still a lot to learn...</p>
    <p>Have a good day!<br>
    </p>
    <div class="moz-cite-prefix">On 10.08.23 12:27, Stefano Zampini
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAGPUisj0LYfALBULQziWsdAHj6ciD7YtODC-V5O+xtfB1pi7Lg@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="auto">Then just do the multiplications you need. My
        proposal was for the example function you were showing.</div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Thu, Aug 10, 2023, 12:25
          Niclas Götting <<a
            href="mailto:ngoetting@itp.uni-bremen.de"
            moz-do-not-send="true" class="moz-txt-link-freetext">ngoetting@itp.uni-bremen.de</a>>
          wrote:<br>
        </div>
        <blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
          <div>
            <p>You are absolutely right for this specific case (I get
              about 2400it/s instead of 2100it/s). However, the single
              square function will be replaced by a series of Gaussian
              pulses in the future, which will never be exactly zero.
              Maybe one could do an approximation and skip the second
              mult if the Gaussians are close to zero.</p>
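            <p>Such an approximation might look roughly like this (a
              sketch only; gaussian_envelope and tol are hypothetical
              stand-ins for the future pulse shape and a cutoff below
              which the pulse is treated as zero):</p>
            <p><font face="monospace">def rhsfunc_approx(ts, t, u, F):<br>
                  l.mult(u, F)<br>
                  scale = gaussian_envelope(t)  # hypothetical sum of Gaussian pulses<br>
                  if abs(scale) &gt; tol:  # skip the second mult when the pulse is negligible<br>
                      pump.mult(u, tmp_vec)<br>
                      F.axpy(scale, tmp_vec)</font></p>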
            <div>On 10.08.23 12:16, Stefano Zampini wrote:<br>
            </div>
            <blockquote type="cite">
              <div dir="auto">If you do the mult of "pump" inside an if
                it should be faster </div>
              <br>
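              <div dir="auto">Presumably something along these lines (a
                sketch only; rhsfunc6 is a placeholder name, and tmp_vec
                is the work vector from rhsfunc5 above):</div>
              <p><font face="monospace">def rhsfunc6(ts, t, u, F):<br>
                    l.mult(u, F)<br>
                    if 5 &lt; t &lt; 10:  # pump term only while the square pulse is on<br>
                        pump.mult(u, tmp_vec)<br>
                        F.axpy(0.5, tmp_vec)</font></p>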
              <div class="gmail_quote">
                <div dir="ltr" class="gmail_attr">On Thu, Aug 10, 2023,
                  12:12 Niclas Götting <<a
                    href="mailto:ngoetting@itp.uni-bremen.de"
                    target="_blank" rel="noreferrer"
                    moz-do-not-send="true" class="moz-txt-link-freetext">ngoetting@itp.uni-bremen.de</a>>
                  wrote:<br>
                </div>
                <blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  <div>
                    <p>If I understood you right, this should be the
                      resulting RHS:</p>
                    <p><font face="monospace">def rhsfunc5(ts, t, u, F):<br>
                            l.mult(u, F)<br>
                            pump.mult(u, tmp_vec)<br>
                            scale = 0.5 * (5 < t < 10)<br>
                            F.axpy(scale, tmp_vec)</font></p>
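                    <p>(Here <font face="monospace">tmp_vec</font> is a
                      work vector allocated once outside the callback,
                      e.g. <font face="monospace">tmp_vec =
                        l.createVecRight()</font>.)</p>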
                    <p>It is a little bit slower than option 3, but at
                      about 2100it/s it is consistently ~10% faster
                      than option 4.</p>
                    <p>Thank you very much for the suggestion!<br>
                    </p>
                    <div>On 10.08.23 11:47, Stefano Zampini wrote:<br>
                    </div>
                    <blockquote type="cite">
                      <div dir="auto">I would use option 3. Keep a work
                        vector and do a vector summation instead of the
                        multiple multiplication by scale and 1/scale. 
                        <div dir="auto"><br>
                        </div>
                        <div dir="auto">I agree with you the docs are a
                          little misleading here. </div>
                      </div>
                      <br>
                      <div class="gmail_quote">
                        <div dir="ltr" class="gmail_attr">On Thu, Aug
                          10, 2023, 11:40 Niclas Götting <<a
                            href="mailto:ngoetting@itp.uni-bremen.de"
                            rel="noreferrer noreferrer" target="_blank"
                            moz-do-not-send="true"
                            class="moz-txt-link-freetext">ngoetting@itp.uni-bremen.de</a>>
                          wrote:<br>
                        </div>
                        <blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                          <div>
                            <p>Thank you both for the very quick answer!</p>
                            <p>So far, I have compiled PETSc with
                              debugging turned on, but I think it should
                              still be faster than standard scipy in
                              both cases. Actually, Stefano's answer has
                              already got me very far; now I only define
                              the RHS of the ODE and no Jacobian (I
                              wonder why the documentation suggests
                              otherwise, though). I made the following
                              four attempts at implementing the RHS:</p>
                            <ol>
                              <li><font face="monospace">def rhsfunc1(ts, t, u, F):<br>
                                      scale = 0.5 * (5 &lt; t &lt; 10)<br>
                                      (l + scale * pump).mult(u, F)</font></li>
                              <li><font face="monospace">def rhsfunc2(ts, t, u, F):<br>
                                      l.mult(u, F)<br>
                                      scale = 0.5 * (5 &lt; t &lt; 10)<br>
                                      (scale * pump).multAdd(u, F, F)</font></li>
                              <li><font face="monospace">def rhsfunc3(ts, t, u, F):<br>
                                      l.mult(u, F)<br>
                                      scale = 0.5 * (5 &lt; t &lt; 10)<br>
                                      if scale != 0:<br>
                                          pump.scale(scale)<br>
                                          pump.multAdd(u, F, F)<br>
                                          pump.scale(1/scale)</font></li>
                              <li><font face="monospace">def rhsfunc4(ts, t, u, F):<br>
                                      tmp_pump.zeroEntries() # tmp_pump is pump.duplicate()<br>
                                      l.mult(u, F)<br>
                                      scale = 0.5 * (5 &lt; t &lt; 10)<br>
                                      tmp_pump.axpy(scale, pump, structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN)<br>
                                      tmp_pump.multAdd(u, F, F)</font></li>
                            </ol>
                            <p>They all yield the same results, but with
                              50it/s, 800it/s, 2300it/s and 1900it/s,
                              respectively, which is a huge performance
                              boost (almost 7 times as fast as scipy,
                              with PETSc debugging still turned on). As
                              the scale function will most likely be a
                              Gaussian in the future, I think that
                              option 3 will become numerically unstable
                              and I'll have to go with option 4, which
                              is already faster than I expected. If you
                              think it is possible to speed up the RHS
                              calculation even more, I'd be happy to
                              hear your suggestions; the -log_view
                              output is attached to this message.</p>
                            <p>One last point: if I didn't misunderstand
                              the documentation at <a
href="https://petsc.org/release/manual/ts/#special-cases"
                                rel="noreferrer noreferrer noreferrer"
                                target="_blank" moz-do-not-send="true"
                                class="moz-txt-link-freetext">https://petsc.org/release/manual/ts/#special-cases</a>,
                              should it maybe be changed?</p>
                            <p>Best regards<br>
                              Niclas<br>
                            </p>
                            <div>On 09.08.23 17:51, Stefano Zampini
                              wrote:<br>
                            </div>
                            <blockquote type="cite">
                              <div dir="auto">
                                <div>TSRK is an explicit solver. Unless
                                  you are changing the ts type from the
                                  command line, the explicit Jacobian
                                  should not be needed. On top of
                                  Barry's suggestion, I would suggest
                                  you write the explicit RHS instead of
                                  assembling a throwaway matrix every
                                  time that function needs to be
                                  sampled.<br>
                                  <br>
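                                  <div dir="auto">A minimal sketch of
                                    that suggestion (essentially what
                                    the rest of the thread converges on;
                                    rhs is a placeholder name, and the
                                    work vector tmp is created once,
                                    outside the callback):</div>
                                  <p><font face="monospace">tmp = l.createVecRight()  # work vector, allocated once<br>
                                    <br>
                                    def rhs(ts, t, u, F):  # F = (A_const + f(t) * B_const) @ u<br>
                                        l.mult(u, F)<br>
                                        pump.mult(u, tmp)<br>
                                        F.axpy(0.5 * (5 &lt; t &lt; 10), tmp)<br>
                                    <br>
                                    ts.setRHSFunction(rhs)</font></p>
                                  <br>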
                                  <div class="gmail_quote">
                                    <div dir="ltr" class="gmail_attr">On
                                      Wed, Aug 9, 2023, 17:09 Niclas
                                      Götting <<a
href="mailto:ngoetting@itp.uni-bremen.de"
rel="noreferrer noreferrer noreferrer" target="_blank"
                                        moz-do-not-send="true"
                                        class="moz-txt-link-freetext">ngoetting@itp.uni-bremen.de</a>>
                                      wrote:<br>
                                    </div>
                                    <blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi
                                      all,<br>
                                      <br>
                                      I'm currently trying to convert a
                                      quantum simulation from scipy to <br>
                                      PETSc. The problem itself is
                                      extremely simple and of the form
                                      \dot{u}(t) <br>
                                      = (A_const + f(t)*B_const)*u(t),
                                      where f(t) in this simple test
                                      case is <br>
                                      a square function. The matrices
                                      A_const and B_const are extremely
                                      sparse <br>
                                      and therefore I thought, the
                                      problem will be well suited for
                                      PETSc. <br>
                                      Currently, I solve the ODE with
                                      the following procedure in scipy
                                      (I can <br>
                                      provide the necessary data files,
                                      if needed, but they are just some
                                      <br>
                                      trace-preserving, very sparse
                                      matrices):<br>
                                      <br>
                                      import numpy as np<br>
                                      import scipy.sparse<br>
                                      import scipy.integrate<br>
                                      <br>
                                      from tqdm import tqdm<br>
                                      <br>
                                      <br>
                                      l = np.load("../liouvillian.npy")<br>
                                      pump = np.load("../pump_operator.npy")<br>
                                      state = np.load("../initial_state.npy")<br>
                                      <br>
                                      l = scipy.sparse.csr_array(l)<br>
                                      pump = scipy.sparse.csr_array(pump)<br>
                                      <br>
                                      def f(t, y, *args):<br>
                                          return (l + 0.5 * (5 &lt; t &lt; 10) * pump) @ y<br>
                                          #return l @ y # Uncomment for f(t) = 0<br>
                                      <br>
                                      dt = 0.1<br>
                                      NUM_STEPS = 200<br>
                                      res = np.empty((NUM_STEPS, 4096), dtype=np.complex128)<br>
                                      solver = scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state)<br>
                                      times = []<br>
                                      for i in tqdm(range(NUM_STEPS)):<br>
                                          res[i, :] = solver.integrate(solver.t + dt)<br>
                                          times.append(solver.t)<br>
                                      <br>
                                      Here, A_const = l, B_const = pump and f(t) = 0.5 * (5 &lt; t &lt; 10). tqdm reports about 330it/s on my machine. When converting the code to PETSc, I came to the following result (according to the chapter <a
href="https://petsc.org/main/manual/ts/#special-cases"
rel="noreferrer noreferrer noreferrer noreferrer noreferrer"
                                        target="_blank"
                                        moz-do-not-send="true"
                                        class="moz-txt-link-freetext">https://petsc.org/main/manual/ts/#special-cases</a>):<br>
                                      <br>
                                      import sys<br>
                                      import petsc4py<br>
                                      petsc4py.init(args=sys.argv)<br>
                                      import numpy as np<br>
                                      import scipy.sparse<br>
                                      <br>
                                      from tqdm import tqdm<br>
                                      from petsc4py import PETSc<br>
                                      <br>
                                      comm = PETSc.COMM_WORLD<br>
                                      <br>
                                      <br>
                                      def mat_to_real(arr):<br>
                                          return np.block([[arr.real, -arr.imag], [arr.imag, arr.real]]).astype(np.float64)<br>
                                      <br>
                                      def mat_to_petsc_aij(arr):<br>
                                          arr_sc_sp = scipy.sparse.csr_array(arr)<br>
                                          mat = PETSc.Mat().createAIJ(arr.shape[0], comm=comm)<br>
                                          rstart, rend = mat.getOwnershipRange()<br>
                                          print(rstart, rend)<br>
                                          print(arr.shape[0])<br>
                                          print(mat.sizes)<br>
                                          I = arr_sc_sp.indptr[rstart : rend + 1] - arr_sc_sp.indptr[rstart]<br>
                                          J = arr_sc_sp.indices[arr_sc_sp.indptr[rstart] : arr_sc_sp.indptr[rend]]<br>
                                          V = arr_sc_sp.data[arr_sc_sp.indptr[rstart] : arr_sc_sp.indptr[rend]]<br>
                                      <br>
                                          print(I.shape, J.shape, V.shape)<br>
                                          mat.setValuesCSR(I, J, V)<br>
                                          mat.assemble()<br>
                                          return mat<br>
                                      <br>
                                      <br>
                                      l = np.load("../liouvillian.npy")<br>
                                      l = mat_to_real(l)<br>
                                      pump = np.load("../pump_operator.npy")<br>
                                      pump = mat_to_real(pump)<br>
                                      state = np.load("../initial_state.npy")<br>
                                      state = np.hstack([state.real, state.imag]).astype(np.float64)<br>
                                      <br>
                                      l = mat_to_petsc_aij(l)<br>
                                      pump = mat_to_petsc_aij(pump)<br>
                                      <br>
                                      <br>
                                      jac = l.duplicate()<br>
                                      for i in range(8192):<br>
                                          jac.setValue(i, i, 0)<br>
                                      jac.assemble()<br>
                                      jac += l<br>
                                      <br>
                                      vec = l.createVecRight()<br>
                                      vec.setValues(np.arange(state.shape[0], dtype=np.int32), state)<br>
                                      vec.assemble()<br>
                                      <br>
                                      <br>
                                      dt = 0.1<br>
                                      <br>
                                      ts = PETSc.TS().create(comm=comm)<br>
                                      ts.setFromOptions()<br>
                                      ts.setProblemType(ts.ProblemType.LINEAR)<br>
                                      ts.setEquationType(ts.EquationType.ODE_EXPLICIT)<br>
                                      ts.setType(ts.Type.RK)<br>
                                      ts.setRKType(ts.RKType.RK3BS)<br>
                                      ts.setTime(0)<br>
                                      print("KSP:", ts.getKSP().getType())<br>
                                      print("KSP PC:", ts.getKSP().getPC().getType())<br>
                                      print("SNES :", ts.getSNES().getType())<br>
                                      <br>
                                      def jacobian(ts, t, u, Amat, Pmat):<br>
                                          Amat.zeroEntries()<br>
                                          Amat.aypx(1, l, structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)<br>
                                          Amat.axpy(0.5 * (5 &lt; t &lt; 10), pump, structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)<br>
                                      <br>
                                      ts.setRHSFunction(PETSc.TS.computeRHSFunctionLinear)<br>
                                      #ts.setRHSJacobian(PETSc.TS.computeRHSJacobianConstant, l, l) # Uncomment for f(t) = 0<br>
                                      ts.setRHSJacobian(jacobian, jac)<br>
                                      <br>
                                      NUM_STEPS = 200<br>
                                      res = np.empty((NUM_STEPS, 8192), dtype=np.float64)<br>
                                      times = []<br>
                                      rstart, rend = vec.getOwnershipRange()<br>
                                      for i in tqdm(range(NUM_STEPS)):<br>
                                          time = ts.getTime()<br>
                                          ts.setMaxTime(time + dt)<br>
                                          ts.solve(vec)<br>
                                          res[i, rstart:rend] = vec.getArray()[:]<br>
                                          times.append(time)<br>
                                      <br>
                                      I decomposed the complex ODE into a larger real ODE, so that I can perhaps switch to GPU computation later on. Now, the solutions of both scripts are essentially identical, but PETSc runs about 3 times slower at 120it/s on my machine. I don't use MPI for PETSc yet.<br>
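                                      <br>
                                      (For reference, the real form built by mat_to_real: writing u = x + i*y and A = A_r + i*A_i, the complex system \dot{u} = A*u is equivalent to the real system \dot{(x, y)} = [[A_r, -A_i], [A_i, A_r]] (x, y), which doubles the dimension from 4096 to 8192.)<br>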
                                      <br>
                                      I strongly suspect that the problem lies within the Jacobian definition, as PETSc is about 3 times *faster* than scipy with f(t) = 0 and therefore a constant Jacobian.<br>
                                      <br>
                                      Thank you in advance.<br>
                                      <br>
                                      All the best,<br>
                                      Niclas<br>
                                      <br>
                                      <br>
                                    </blockquote>
                                  </div>
                                </div>
                              </div>
                            </blockquote>
                          </div>
                        </blockquote>
                      </div>
                    </blockquote>
                  </div>
                </blockquote>
              </div>
            </blockquote>
          </div>
        </blockquote>
      </div>
    </blockquote>
  </body>
</html>