<!DOCTYPE html>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>You are absolutely right for this specific case (I get about
      2400it/s instead of 2100it/s). However, the single square pulse
      will be replaced by a series of Gaussian pulses in the future,
      which are never exactly zero. Maybe one could still use an
      approximation and skip the second mult whenever the Gaussians are
      close enough to zero.</p>
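    <p>A minimal sketch of that idea (hedged: the pulse shape
      <font face="monospace">gaussian_envelope</font> and the cutoff
      <font face="monospace">EPS</font> are hypothetical placeholders,
      and numpy/<font face="monospace">tmp_vec</font> are assumed to be
      set up as in the scripts quoted below):</p>
    <p><font face="monospace">EPS = 1e-12  # cutoff below which the pump term is treated as zero<br>
        <br>
        def gaussian_envelope(t):  # hypothetical; e.g. one pulse at t0 = 7.5, sigma = 1<br>
            return 0.5 * np.exp(-0.5 * ((t - 7.5) / 1.0) ** 2)<br>
        <br>
        def rhsfunc6(ts, t, u, F):<br>
            l.mult(u, F)  # F = l @ u<br>
            scale = gaussian_envelope(t)<br>
            if abs(scale) > EPS:  # skip the second mult while the pulses are ~0<br>
                pump.mult(u, tmp_vec)<br>
                F.axpy(scale, tmp_vec)  # F += scale * (pump @ u)</font></p>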
    <div class="moz-cite-prefix">On 10.08.23 12:16, Stefano Zampini
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAGPUisgCtEVZ_h1JPUMDMcMpJ8z3U65Veo_PYGkWFDbMZAQ_0A@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="auto">If you do the mult of "pump" inside an if it
        should be faster </div>
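      <p>A minimal sketch of that suggestion (assuming, as in the
        snippets quoted below, that <font face="monospace">tmp_vec</font>
        is a work vector created once outside the callback):</p>
      <p><font face="monospace">def rhsfunc5b(ts, t, u, F):<br>
            l.mult(u, F)  # F = l @ u<br>
            scale = 0.5 * (5 < t < 10)<br>
            if scale != 0:  # do the pump mult only while the pulse is on<br>
                pump.mult(u, tmp_vec)<br>
                F.axpy(scale, tmp_vec)  # F += scale * (pump @ u)</font></p>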
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Thu, Aug 10, 2023, 12:12
          Niclas Götting <<a
            href="mailto:ngoetting@itp.uni-bremen.de"
            moz-do-not-send="true" class="moz-txt-link-freetext">ngoetting@itp.uni-bremen.de</a>>
          wrote:<br>
        </div>
        <blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
          <div>
            <p>If I understood you correctly, this should be the
              resulting RHS:</p>
            <p><font face="monospace">def rhsfunc5(ts, t, u, F):<br>
                  l.mult(u, F)  # F = l @ u<br>
                  pump.mult(u, tmp_vec)  # tmp_vec is a reusable work vector<br>
                  scale = 0.5 * (5 < t < 10)<br>
                  F.axpy(scale, tmp_vec)  # F += scale * tmp_vec</font></p>
            <p>It is a little slower than option 3, but at about
              2100it/s it is consistently ~10% faster than option 4.</p>
            <p>Thank you very much for the suggestion!<br>
            </p>
            <div>On 10.08.23 11:47, Stefano Zampini wrote:<br>
            </div>
            <blockquote type="cite">
              <div dir="auto">I would use option 3. Keep a work vector
                and do a vector summation instead of the multiple
                multiplication by scale and 1/scale. 
                <div dir="auto"><br>
                </div>
                <div dir="auto">I agree with you the docs are a little
                  misleading here. </div>
              </div>
              <br>
              <div class="gmail_quote">
                <div dir="ltr" class="gmail_attr">On Thu, Aug 10, 2023,
                  11:40 Niclas Götting <<a
                    href="mailto:ngoetting@itp.uni-bremen.de"
                    target="_blank" rel="noreferrer"
                    moz-do-not-send="true" class="moz-txt-link-freetext">ngoetting@itp.uni-bremen.de</a>>
                  wrote:<br>
                </div>
                <blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  <div>
                    <p>Thank you both for the very quick answer!</p>
                    <p>So far I have compiled PETSc with debugging
                      turned on, but I think it should still be faster
                      than standard scipy in both cases. Actually,
                      Stefano's answer has got me very far already; now
                      I only define the RHS of the ODE and no Jacobian
                      (I wonder why the documentation suggests
                      otherwise, though). I made the following four
                      attempts at implementing the RHS:</p>
                    <ol>
                      <li><font face="monospace">def rhsfunc1(ts, t, u,
                          F):<br>
                              scale = 0.5 * (5 < t < 10)<br>
                              (l + scale * pump).mult(u, F)</font></li>
                      <li><font face="monospace">def rhsfunc2(ts, t, u,
                          F):<br>
                              l.mult(u, F)<br>
                              scale = 0.5 * (5 < t < 10)<br>
                              (scale * pump).multAdd(u, F, F)</font></li>
                      <li><font face="monospace">def rhsfunc3(ts, t, u,
                          F):<br>
                              l.mult(u, F)<br>
                              scale = 0.5 * (5 < t < 10)<br>
                              if scale != 0:<br>
                                  pump.scale(scale)<br>
                                  pump.multAdd(u, F, F)<br>
                                  pump.scale(1/scale)</font></li>
                      <li><font face="monospace">def rhsfunc4(ts, t, u,
                          F):<br>
                              tmp_pump.zeroEntries() # tmp_pump is
                          pump.duplicate()<br>
                              l.mult(u, F)<br>
                              scale = 0.5 * (5 < t < 10)<br>
                              tmp_pump.axpy(scale, pump,
                          structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN)<br>
                              tmp_pump.multAdd(u, F, F)<br>
                        </font></li>
                    </ol>
                    <p>They all yield the same results, but with 50it/s,
                      800it/s, 2300it/s, and 1900it/s, respectively,
                      which is a huge performance boost (almost 7 times
                      as fast as scipy, with PETSc debugging still
                      turned on). As the scale function will most likely
                      be a Gaussian in the future, I think that option 3
                      will become numerically unstable, because scaling
                      by a near-zero value and back by its inverse
                      amplifies round-off, so I'll have to go with
                      option 4, which is already faster than I expected.
                      If you think it is possible to speed up the RHS
                      calculation even more, I'd be happy to hear your
                      suggestions; the -log_view output is attached to
                      this message.</p>
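                    <p>A toy illustration of that instability (plain
                      numpy, hedged; not part of the benchmark):</p>
                    <p><font face="monospace">import numpy as np<br>
                        a = np.float64(3.14)<br>
                        scale = np.float64(1e-310)  # subnormal range, like the far tail of a Gaussian<br>
                        print(a * scale / scale)  # typically no longer exactly 3.14: round-off is amplified<br>
                        print(np.float64(1e-330))  # underflows to 0.0, so 1/scale would be inf</font></p>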
                    <p>One last point: if I didn't misunderstand the
                      documentation at <a
href="https://petsc.org/release/manual/ts/#special-cases"
                        rel="noreferrer noreferrer" target="_blank"
                        moz-do-not-send="true"
                        class="moz-txt-link-freetext">https://petsc.org/release/manual/ts/#special-cases</a>,
                      should it maybe be updated?</p>
                    <p>Best regards<br>
                      Niclas<br>
                    </p>
                    <div>On 09.08.23 17:51, Stefano Zampini wrote:<br>
                    </div>
                    <blockquote type="cite">
                      <div dir="auto">
                        <div>TSRK is an explicit solver. Unless you are
                          changing the ts type from the command line,
                          the explicit Jacobian should not be needed. On
                          top of Barry's suggestion, I would suggest
                          writing the explicit RHS instead of assembling
                          a throwaway matrix every time that function
                          needs to be sampled.<br>
                          <br>
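                          <p>A minimal sketch of that switch (hedged;
                            <font face="monospace">rhsfunc</font> stands
                            for any explicit RHS callback like the ones
                            tried above):</p>
                          <p><font face="monospace"># instead of setRHSFunction(computeRHSFunctionLinear) + setRHSJacobian(...):<br>
                              ts.setRHSFunction(rhsfunc)  # explicit RHS; TSRK then needs no Jacobian</font></p>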
                          <div class="gmail_quote">
                            <div dir="ltr" class="gmail_attr">On Wed,
                              Aug 9, 2023, 17:09 Niclas Götting <<a
href="mailto:ngoetting@itp.uni-bremen.de" rel="noreferrer noreferrer"
                                target="_blank" moz-do-not-send="true"
                                class="moz-txt-link-freetext">ngoetting@itp.uni-bremen.de</a>>
                              wrote:<br>
                            </div>
                            <blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi
                              all,<br>
                              <br>
                              I'm currently trying to convert a quantum
                              simulation from scipy to PETSc. The
                              problem itself is extremely simple and of
                              the form \dot{u}(t) = (A_const +
                              f(t)*B_const)*u(t), where f(t) in this
                              simple test case is a square function. The
                              matrices A_const and B_const are extremely
                              sparse, and therefore I thought the
                              problem would be well suited for PETSc.
                              Currently, I solve the ODE with the
                              following procedure in scipy (I can
                              provide the necessary data files if
                              needed, but they are just some
                              trace-preserving, very sparse matrices):<br>
                              <br>
                              import numpy as np<br>
                              import scipy.sparse<br>
                              import scipy.integrate<br>
                              <br>
                              from tqdm import tqdm<br>
                              <br>
                              <br>
                              l = np.load("../liouvillian.npy")<br>
                              pump = np.load("../pump_operator.npy")<br>
                              state = np.load("../initial_state.npy")<br>
                              <br>
                              l = scipy.sparse.csr_array(l)<br>
                              pump = scipy.sparse.csr_array(pump)<br>
                              <br>
                              def f(t, y, *args):<br>
                                  return (l + 0.5 * (5 < t < 10) * pump) @ y<br>
                                  #return l @ y  # Uncomment for f(t) = 0<br>
                              <br>
                              dt = 0.1<br>
                              NUM_STEPS = 200<br>
                              res = np.empty((NUM_STEPS, 4096), dtype=np.complex128)<br>
                              solver = scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state)<br>
                              times = []<br>
                              for i in tqdm(range(NUM_STEPS)):<br>
                                  res[i, :] = solver.integrate(solver.t + dt)<br>
                                  times.append(solver.t)<br>
                              <br>
                              Here, A_const = l, B_const = pump, and
                              f(t) = 0.5 * (5 < t < 10). tqdm
                              reports about 330it/s on my machine. When
                              converting the code to PETSc, I came to
                              the following result (following the
                              chapter
                              <a
href="https://petsc.org/main/manual/ts/#special-cases"
rel="noreferrer noreferrer noreferrer noreferrer" target="_blank"
                                moz-do-not-send="true"
                                class="moz-txt-link-freetext">https://petsc.org/main/manual/ts/#special-cases</a>)<br>
                              <br>
                              import sys<br>
                              import petsc4py<br>
                              petsc4py.init(args=sys.argv)<br>
                              import numpy as np<br>
                              import scipy.sparse<br>
                              <br>
                              from tqdm import tqdm<br>
                              from petsc4py import PETSc<br>
                              <br>
                              comm = PETSc.COMM_WORLD<br>
                              <br>
                              <br>
                              def mat_to_real(arr):<br>
                                  return np.block([[arr.real, -arr.imag], [arr.imag, arr.real]]).astype(np.float64)<br>
                              <br>
                              def mat_to_petsc_aij(arr):<br>
                                  arr_sc_sp = scipy.sparse.csr_array(arr)<br>
                                  mat = PETSc.Mat().createAIJ(arr.shape[0], comm=comm)<br>
                                  rstart, rend = mat.getOwnershipRange()<br>
                                  print(rstart, rend)<br>
                                  print(arr.shape[0])<br>
                                  print(mat.sizes)<br>
                                  I = arr_sc_sp.indptr[rstart : rend + 1] - arr_sc_sp.indptr[rstart]<br>
                                  J = arr_sc_sp.indices[arr_sc_sp.indptr[rstart] : arr_sc_sp.indptr[rend]]<br>
                                  V = arr_sc_sp.data[arr_sc_sp.indptr[rstart] : arr_sc_sp.indptr[rend]]<br>
                              <br>
                                  print(I.shape, J.shape, V.shape)<br>
                                  mat.setValuesCSR(I, J, V)<br>
                                  mat.assemble()<br>
                                  return mat<br>
                              <br>
                              <br>
                              l = np.load("../liouvillian.npy")<br>
                              l = mat_to_real(l)<br>
                              pump = np.load("../pump_operator.npy")<br>
                              pump = mat_to_real(pump)<br>
                              state = np.load("../initial_state.npy")<br>
                              state = np.hstack([state.real, state.imag]).astype(np.float64)<br>
                              <br>
                              l = mat_to_petsc_aij(l)<br>
                              pump = mat_to_petsc_aij(pump)<br>
                              <br>
                              <br>
                              jac = l.duplicate()<br>
                              for i in range(8192):<br>
                                  jac.setValue(i, i, 0)<br>
                              jac.assemble()<br>
                              jac += l<br>
                              <br>
                              vec = l.createVecRight()<br>
                              vec.setValues(np.arange(state.shape[0], dtype=np.int32), state)<br>
                              vec.assemble()<br>
                              <br>
                              <br>
                              dt = 0.1<br>
                              <br>
                              ts = PETSc.TS().create(comm=comm)<br>
                              ts.setFromOptions()<br>
                              ts.setProblemType(ts.ProblemType.LINEAR)<br>
                              ts.setEquationType(ts.EquationType.ODE_EXPLICIT)<br>
                              ts.setType(ts.Type.RK)<br>
                              ts.setRKType(ts.RKType.RK3BS)<br>
                              ts.setTime(0)<br>
                              print("KSP:", ts.getKSP().getType())<br>
                              print("KSP PC:", ts.getKSP().getPC().getType())<br>
                              print("SNES:", ts.getSNES().getType())<br>
                              <br>
                              def jacobian(ts, t, u, Amat, Pmat):<br>
                                  Amat.zeroEntries()<br>
                                  Amat.aypx(1, l, structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)<br>
                                  Amat.axpy(0.5 * (5 < t < 10), pump, structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)<br>
                              <br>
                              ts.setRHSFunction(PETSc.TS.computeRHSFunctionLinear)<br>
                              #ts.setRHSJacobian(PETSc.TS.computeRHSJacobianConstant, l, l)  # Uncomment for f(t) = 0<br>
                              ts.setRHSJacobian(jacobian, jac)<br>
                              <br>
                              NUM_STEPS = 200<br>
                              res = np.empty((NUM_STEPS, 8192), dtype=np.float64)<br>
                              times = []<br>
                              rstart, rend = vec.getOwnershipRange()<br>
                              for i in tqdm(range(NUM_STEPS)):<br>
                                  time = ts.getTime()<br>
                                  ts.setMaxTime(time + dt)<br>
                                  ts.solve(vec)<br>
                                  res[i, rstart:rend] = vec.getArray()[:]<br>
                                  times.append(time)<br>
                              <br>
                              I decomposed the complex ODE into a larger
                              real ODE so that I can easily switch to
                              GPU computation later on. The solutions of
                              both scripts are practically identical,
                              but PETSc runs about 3 times slower at
                              120it/s on my machine. I don't use MPI for
                              PETSc yet.<br>
                              <br>
                              I strongly suspect that the problem lies
                              within the Jacobian definition, as PETSc
                              is about 3 times *faster* than scipy with
                              f(t) = 0 and therefore a constant
                              Jacobian.<br>
                              <br>
                              Thank you in advance.<br>
                              <br>
                              All the best,<br>
                              Niclas<br>
                              <br>
                              <br>
                            </blockquote>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                  </div>
                </blockquote>
              </div>
            </blockquote>
          </div>
        </blockquote>
      </div>
    </blockquote>
  </body>
</html>