<!DOCTYPE html>

<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>Alright. Again, thank you very much for taking the time to answer

      my beginner questions! Still a lot to learn..</p>

    <p>Have a good day!<br>

    </p>

    <div class="moz-cite-prefix">On 10.08.23 12:27, Stefano Zampini

      wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAGPUisj0LYfALBULQziWsdAHj6ciD7YtODC-V5O+xtfB1pi7Lg@mail.gmail.com">

      <meta http-equiv="content-type" content="text/html; charset=UTF-8">

      <div dir="auto">Then just do the multiplications you need. My

        proposal was for the example function you were showing.</div>

      <br>

      <div class="gmail_quote">

        <div dir="ltr" class="gmail_attr">On Thu, Aug 10, 2023, 12:25

          Niclas Götting <<a

            href="mailto:ngoetting@itp.uni-bremen.de"

            moz-do-not-send="true" class="moz-txt-link-freetext">ngoetting@itp.uni-bremen.de</a>>

          wrote:<br>

        </div>

        <blockquote class="gmail_quote"

style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

          <div>

            <p>You are absolutely right for this specific case (I get

              about 2400it/s instead of 2100it/s). However, the single

              square function will be replaced by a series of gaussian

              pulses in the future, which will never be zero. Maybe one

              could do an approximation and skip the second mult, if the

              gaussians are close to zero.</p>

            <div>On 10.08.23 12:16, Stefano Zampini wrote:<br>

            </div>

            <blockquote type="cite">

              <div dir="auto">If you do the mult of "pump" inside an if

                it should be faster </div>

              <br>

              <div class="gmail_quote">

                <div dir="ltr" class="gmail_attr">On Thu, Aug 10, 2023,

                  12:12 Niclas Götting <<a

                    href="mailto:ngoetting@itp.uni-bremen.de"

                    target="_blank" rel="noreferrer"

                    moz-do-not-send="true" class="moz-txt-link-freetext">ngoetting@itp.uni-bremen.de</a>>

                  wrote:<br>

                </div>

                <blockquote class="gmail_quote"

style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

                  <div>

                    <p>If I understood you right, this should be the

                      resulting RHS:</p>

                    <p><font face="monospace">def rhsfunc5(ts, t, u, F):<br>

                            l.mult(u, F)<br>

                            pump.mult(u, tmp_vec)<br>

                            scale = 0.5 * (5 < t < 10)<br>

                            F.axpy(scale, tmp_vec)</font></p>

                    <p>It is a little bit slower than option 3, but with

                      about 2100it/s consistently ~10% faster than

                      option 4.</p>

                    <p>Thank you very much for the suggestion!<br>

                    </p>

                    <div>On 10.08.23 11:47, Stefano Zampini wrote:<br>

                    </div>

                    <blockquote type="cite">

                      <div dir="auto">I would use option 3. Keep a work

                        vector and do a vector summation instead of the

                        multiple multiplication by scale and 1/scale. 

                        <div dir="auto"><br>

                        </div>

                        <div dir="auto">I agree with you the docs are a

                          little misleading here. </div>

                      </div>

                      <br>

                      <div class="gmail_quote">

                        <div dir="ltr" class="gmail_attr">On Thu, Aug

                          10, 2023, 11:40 Niclas Götting <<a

                            href="mailto:ngoetting@itp.uni-bremen.de"

                            rel="noreferrer noreferrer" target="_blank"

                            moz-do-not-send="true"

                            class="moz-txt-link-freetext">ngoetting@itp.uni-bremen.de</a>>

                          wrote:<br>

                        </div>

                        <blockquote class="gmail_quote"

style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

                          <div>

                            <p>Thank you both for the very quick answer!</p>

                            <p>So far, I compiled PETSc with debugging

                              turned on, but I think it should still be

                              faster than standard scipy in both cases.

                              Actually, Stefano's answer has got me very

                              far already; now I only define the RHS of

                              the ODE and no Jacobian (I wonder, why the

                              documentation suggests otherwise, though).

                              I had the following four tries at

                              implementing the RHS:</p>

                            <ol>

                              <li><font face="monospace">def

                                  rhsfunc1(ts, t, u, F):<br>

                                      scale = 0.5 * (5 < t < 10)<br>

                                      (l + scale * pump).mult(u, F)</font></li>

                              <li><font face="monospace">def

                                  rhsfunc2(ts, t, u, F):<br>

                                      l.mult(u, F)<br>

                                      scale = 0.5 * (5 < t < 10)<br>

                                      (scale * pump).multAdd(u, F, F)</font></li>

                              <li><font face="monospace">def

                                  rhsfunc3(ts, t, u, F):<br>

                                      l.mult(u, F)<br>

                                      scale = 0.5 * (5 < t < 10)<br>

                                      if scale != 0:<br>

                                          pump.scale(scale)<br>

                                          pump.multAdd(u, F, F)<br>

                                          pump.scale(1/scale)</font></li>

                              <li><font face="monospace">def

                                  rhsfunc4(ts, t, u, F):<br>

                                      tmp_pump.zeroEntries() # tmp_pump

                                  is pump.duplicate()<br>

                                      l.mult(u, F)<br>

                                      scale = 0.5 * (5 < t < 10)<br>

                                      tmp_pump.axpy(scale, pump,

                                  structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN)<br>

                                      tmp_pump.multAdd(u, F, F)<br>

                                </font></li>

                            </ol>

                            <p>They all yield the same results, but with

                              50it/s, 800it/, 2300it/s and 1900it/s,

                              respectively, which is a huge performance

                              boost (almost 7 times as fast as scipy,

                              with PETSc debugging still turned on). As

                              the scale function will most likely be a

                              gaussian in the future, I think that

                              option 3 will be become numerically

                              unstable and I'll have to go with option

                              4, which is already faster than I

                              expected. If you think it is possible to

                              speed up the RHS calculation even more,

                              I'd be happy to hear your suggestions; the

                              -log_view is attached to this message.</p>

                            <p>One last point: If I didn't misunderstand

                              the documentation at <a

href="https://petsc.org/release/manual/ts/#special-cases"

                                rel="noreferrer noreferrer noreferrer"

                                target="_blank" moz-do-not-send="true"

                                class="moz-txt-link-freetext">https://petsc.org/release/manual/ts/#special-cases</a>,

                              should this maybe be changed?</p>

                            <p>Best regards<br>

                              Niclas<br>

                            </p>

                            <div>On 09.08.23 17:51, Stefano Zampini

                              wrote:<br>

                            </div>

                            <blockquote type="cite">

                              <div dir="auto">

                                <div>TSRK is an explicit solver. Unless

                                  you are changing the ts type from

                                  command line,  the explicit  jacobian

                                  should not be needed. On top of

                                  Barry's suggestion, I would suggest

                                  you to write the explicit RHS instead

                                  of assembly a throw away matrix every

                                  time that function needs to be

                                  sampled.<br>

                                  <br>

                                  <div class="gmail_quote">

                                    <div dir="ltr" class="gmail_attr">On

                                      Wed, Aug 9, 2023, 17:09 Niclas

                                      Götting <<a

href="mailto:ngoetting@itp.uni-bremen.de"

rel="noreferrer noreferrer noreferrer" target="_blank"

                                        moz-do-not-send="true"

                                        class="moz-txt-link-freetext">ngoetting@itp.uni-bremen.de</a>>

                                      wrote:<br>

                                    </div>

                                    <blockquote class="gmail_quote"

style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi

                                      all,<br>

                                      <br>

                                      I'm currently trying to convert a

                                      quantum simulation from scipy to <br>

                                      PETSc. The problem itself is

                                      extremely simple and of the form

                                      \dot{u}(t) <br>

                                      = (A_const + f(t)*B_const)*u(t),

                                      where f(t) in this simple test

                                      case is <br>

                                      a square function. The matrices

                                      A_const and B_const are extremely

                                      sparse <br>

                                      and therefore I thought, the

                                      problem will be well suited for

                                      PETSc. <br>

                                      Currently, I solve the ODE with

                                      the following procedure in scipy

                                      (I can <br>

                                      provide the necessary data files,

                                      if needed, but they are just some

                                      <br>

                                      trace-preserving, very sparse

                                      matrices):<br>

                                      <br>

                                      import numpy as np<br>

                                      import scipy.sparse<br>

                                      import scipy.integrate<br>

                                      <br>

                                      from tqdm import tqdm<br>

                                      <br>

                                      <br>

                                      l = np.load("../liouvillian.npy")<br>

                                      pump =

                                      np.load("../pump_operator.npy")<br>

                                      state =

                                      np.load("../initial_state.npy")<br>

                                      <br>

                                      l = scipy.sparse.csr_array(l)<br>

                                      pump =

                                      scipy.sparse.csr_array(pump)<br>

                                      <br>

                                      def f(t, y, *args):<br>

                                           return (l + 0.5 * (5 < t

                                      < 10) * pump) @ y<br>

                                           #return l @ y # Uncomment for

                                      f(t) = 0<br>

                                      <br>

                                      dt = 0.1<br>

                                      NUM_STEPS = 200<br>

                                      res = np.empty((NUM_STEPS, 4096),

                                      dtype=np.complex128)<br>

                                      solver = <br>

scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state)<br>

                                      times = []<br>

                                      for i in tqdm(range(NUM_STEPS)):<br>

                                           res[i, :] =

                                      solver.integrate(solver.t + dt)<br>

                                           times.append(solver.t)<br>

                                      <br>

                                      Here, A_const = l, B_const = pump

                                      and f(t) = 5 < t < 10. tqdm

                                      reports <br>

                                      about 330it/s on my machine. When

                                      converting the code to PETSc, I

                                      came <br>

                                      to the following result (according

                                      to the chapter <br>

                                      <a

href="https://petsc.org/main/manual/ts/#special-cases"

rel="noreferrer noreferrer noreferrer noreferrer noreferrer"

                                        target="_blank"

                                        moz-do-not-send="true"

                                        class="moz-txt-link-freetext">https://petsc.org/main/manual/ts/#special-cases</a>)<br>

                                      <br>

                                      import sys<br>

                                      import petsc4py<br>

                                      petsc4py.init(args=sys.argv)<br>

                                      import numpy as np<br>

                                      import scipy.sparse<br>

                                      <br>

                                      from tqdm import tqdm<br>

                                      from petsc4py import PETSc<br>

                                      <br>

                                      comm = PETSc.COMM_WORLD<br>

                                      <br>

                                      <br>

                                      def mat_to_real(arr):<br>

                                           return np.block([[arr.real,

                                      -arr.imag], [arr.imag, <br>

                                      arr.real]]).astype(np.float64)<br>

                                      <br>

                                      def mat_to_petsc_aij(arr):<br>

                                           arr_sc_sp =

                                      scipy.sparse.csr_array(arr)<br>

                                           mat =

                                      PETSc.Mat().createAIJ(arr.shape[0],

                                      comm=comm)<br>

                                           rstart, rend =

                                      mat.getOwnershipRange()<br>

                                           print(rstart, rend)<br>

                                           print(arr.shape[0])<br>

                                           print(mat.sizes)<br>

                                           I = arr_sc_sp.indptr[rstart :

                                      rend + 1] -

                                      arr_sc_sp.indptr[rstart]<br>

                                           J =

                                      arr_sc_sp.indices[arr_sc_sp.indptr[rstart]

                                      : <br>

                                      arr_sc_sp.indptr[rend]]<br>

                                           V =

                                      arr_sc_sp.data[arr_sc_sp.indptr[rstart]

                                      : arr_sc_sp.indptr[rend]]<br>

                                      <br>

                                           print(I.shape, J.shape,

                                      V.shape)<br>

                                           mat.setValuesCSR(I, J, V)<br>

                                           mat.assemble()<br>

                                           return mat<br>

                                      <br>

                                      <br>

                                      l = np.load("../liouvillian.npy")<br>

                                      l = mat_to_real(l)<br>

                                      pump =

                                      np.load("../pump_operator.npy")<br>

                                      pump = mat_to_real(pump)<br>

                                      state =

                                      np.load("../initial_state.npy")<br>

                                      state = np.hstack([state.real,

                                      state.imag]).astype(np.float64)<br>

                                      <br>

                                      l = mat_to_petsc_aij(l)<br>

                                      pump = mat_to_petsc_aij(pump)<br>

                                      <br>

                                      <br>

                                      jac = l.duplicate()<br>

                                      for i in range(8192):<br>

                                           jac.setValue(i, i, 0)<br>

                                      jac.assemble()<br>

                                      jac += l<br>

                                      <br>

                                      vec = l.createVecRight()<br>

vec.setValues(np.arange(state.shape[0], dtype=np.int32), state)<br>

                                      vec.assemble()<br>

                                      <br>

                                      <br>

                                      dt = 0.1<br>

                                      <br>

                                      ts = PETSc.TS().create(comm=comm)<br>

                                      ts.setFromOptions()<br>

ts.setProblemType(ts.ProblemType.LINEAR)<br>

ts.setEquationType(ts.EquationType.ODE_EXPLICIT)<br>

                                      ts.setType(ts.Type.RK)<br>

                                      ts.setRKType(ts.RKType.RK3BS)<br>

                                      ts.setTime(0)<br>

                                      print("KSP:",

                                      ts.getKSP().getType())<br>

                                      print("KSP

                                      PC:",ts.getKSP().getPC().getType())<br>

                                      print("SNES :",

                                      ts.getSNES().getType())<br>

                                      <br>

                                      def jacobian(ts, t, u, Amat,

                                      Pmat):<br>

                                           Amat.zeroEntries()<br>

                                           Amat.aypx(1, l,

                                      structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)<br>

                                           Amat.axpy(0.5 * (5 < t

                                      < 10), pump, <br>

structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)<br>

                                      <br>

ts.setRHSFunction(PETSc.TS.computeRHSFunctionLinear)<br>

#ts.setRHSJacobian(PETSc.TS.computeRHSJacobianConstant, l, l) # <br>

                                      Uncomment for f(t) = 0<br>

                                      ts.setRHSJacobian(jacobian, jac)<br>

                                      <br>

                                      NUM_STEPS = 200<br>

                                      res = np.empty((NUM_STEPS, 8192),

                                      dtype=np.float64)<br>

                                      times = []<br>

                                      rstart, rend =

                                      vec.getOwnershipRange()<br>

                                      for i in tqdm(range(NUM_STEPS)):<br>

                                           time = ts.getTime()<br>

                                           ts.setMaxTime(time + dt)<br>

                                           ts.solve(vec)<br>

                                           res[i, rstart:rend] =

                                      vec.getArray()[:]<br>

                                           times.append(time)<br>

                                      <br>

                                      I decomposed the complex ODE into

                                      a larger real ODE, so that I can <br>

                                      easily switch maybe to GPU

                                      computation later on. Now, the

                                      solutions of <br>

                                      both scripts are very much

                                      identical, but PETSc runs about 3

                                      times <br>

                                      slower at 120it/s on my machine. I

                                      don't use MPI for PETSc yet.<br>

                                      <br>

                                      I strongly suppose that the

                                      problem lies within the jacobian

                                      definition, <br>

                                      as PETSc is about 3 times *faster*

                                      than scipy with f(t) = 0 and <br>

                                      therefore a constant jacobian.<br>

                                      <br>

                                      Thank you in advance.<br>

                                      <br>

                                      All the best,<br>

                                      Niclas<br>

                                      <br>

                                      <br>

                                    </blockquote>

                                  </div>

                                </div>

                              </div>

                            </blockquote>

                          </div>

                        </blockquote>

                      </div>

                    </blockquote>

                  </div>

                </blockquote>

              </div>

            </blockquote>

          </div>

        </blockquote>

      </div>

    </blockquote>

  </body>

</html>