<div dir="auto">Then just do the multiplications you need. My proposal was for the example function you were showing.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Aug 10, 2023, 12:25 Niclas Götting <<a href="mailto:ngoetting@itp.uni-bremen.de">ngoetting@itp.uni-bremen.de</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><u></u>
<div>
<p>You are absolutely right for this specific case (I get about
2400it/s instead of 2100it/s). However, the single square function
will be replaced by a series of gaussian pulses in the future,
which will never be zero. Maybe one could do an approximation and
skip the second mult, if the gaussians are close to zero.</p>
<div>On 10.08.23 12:16, Stefano Zampini
wrote:<br>
</div>
<blockquote type="cite">
<div dir="auto">If you do the mult of "pump" inside an if it
should be faster </div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, Aug 10, 2023, 12:12
Niclas Götting <<a href="mailto:ngoetting@itp.uni-bremen.de" target="_blank" rel="noreferrer">ngoetting@itp.uni-bremen.de</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<p>If I understood you right, this should be the resulting
RHS:</p>
<p><font face="monospace">def rhsfunc5(ts, t, u, F):<br>
l.mult(u, F)<br>
pump.mult(u, tmp_vec)<br>
scale = 0.5 * (5 < t < 10)<br>
F.axpy(scale, tmp_vec)</font></p>
<p>It is a little bit slower than option 3, but with about
2100it/s consistently ~10% faster than option 4.</p>
<p>Thank you very much for the suggestion!<br>
</p>
<div>On 10.08.23 11:47, Stefano Zampini wrote:<br>
</div>
<blockquote type="cite">
<div dir="auto">I would use option 3. Keep a work vector
and do a vector summation instead of the multiple
multiplication by scale and 1/scale.
<div dir="auto"><br>
</div>
<div dir="auto">I agree with you the docs are a little
misleading here. </div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, Aug 10, 2023,
11:40 Niclas Götting <<a href="mailto:ngoetting@itp.uni-bremen.de" rel="noreferrer noreferrer" target="_blank">ngoetting@itp.uni-bremen.de</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<p>Thank you both for the very quick answer!</p>
<p>So far, I compiled PETSc with debugging turned
on, but I think it should still be faster than
standard scipy in both cases. Actually, Stefano's
answer has got me very far already; now I only
define the RHS of the ODE and no Jacobian (I
wonder, why the documentation suggests otherwise,
though). I had the following four tries at
implementing the RHS:</p>
<ol>
<li><font face="monospace">def rhsfunc1(ts, t, u,
F):<br>
scale = 0.5 * (5 < t < 10)<br>
(l + scale * pump).mult(u, F)</font></li>
<li><font face="monospace">def rhsfunc2(ts, t, u,
F):<br>
l.mult(u, F)<br>
scale = 0.5 * (5 < t < 10)<br>
(scale * pump).multAdd(u, F, F)</font></li>
<li><font face="monospace">def rhsfunc3(ts, t, u,
F):<br>
l.mult(u, F)<br>
scale = 0.5 * (5 < t < 10)<br>
if scale != 0:<br>
pump.scale(scale)<br>
pump.multAdd(u, F, F)<br>
pump.scale(1/scale)</font></li>
<li><font face="monospace">def rhsfunc4(ts, t, u,
F):<br>
tmp_pump.zeroEntries() # tmp_pump is
pump.duplicate()<br>
l.mult(u, F)<br>
scale = 0.5 * (5 < t < 10)<br>
tmp_pump.axpy(scale, pump,
structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN)<br>
tmp_pump.multAdd(u, F, F)<br>
</font></li>
</ol>
<p>They all yield the same results, but with 50it/s,
800it/, 2300it/s and 1900it/s, respectively, which
is a huge performance boost (almost 7 times as
fast as scipy, with PETSc debugging still turned
on). As the scale function will most likely be a
gaussian in the future, I think that option 3 will
be become numerically unstable and I'll have to go
with option 4, which is already faster than I
expected. If you think it is possible to speed up
the RHS calculation even more, I'd be happy to
hear your suggestions; the -log_view is attached
to this message.</p>
<p>One last point: If I didn't misunderstand the
documentation at <a href="https://petsc.org/release/manual/ts/#special-cases" rel="noreferrer noreferrer noreferrer" target="_blank">https://petsc.org/release/manual/ts/#special-cases</a>,
should this maybe be changed?</p>
<p>Best regards<br>
Niclas<br>
</p>
<div>On 09.08.23 17:51, Stefano Zampini wrote:<br>
</div>
<blockquote type="cite">
<div dir="auto">
<div>TSRK is an explicit solver. Unless you are
changing the ts type from command line, the
explicit jacobian should not be needed. On
top of Barry's suggestion, I would suggest you
to write the explicit RHS instead of assembly
a throw away matrix every time that function
needs to be sampled.<br>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed,
Aug 9, 2023, 17:09 Niclas Götting <<a href="mailto:ngoetting@itp.uni-bremen.de" rel="noreferrer noreferrer noreferrer" target="_blank">ngoetting@itp.uni-bremen.de</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi
all,<br>
<br>
I'm currently trying to convert a quantum
simulation from scipy to <br>
PETSc. The problem itself is extremely
simple and of the form \dot{u}(t) <br>
= (A_const + f(t)*B_const)*u(t), where
f(t) in this simple test case is <br>
a square function. The matrices A_const
and B_const are extremely sparse <br>
and therefore I thought, the problem will
be well suited for PETSc. <br>
Currently, I solve the ODE with the
following procedure in scipy (I can <br>
provide the necessary data files, if
needed, but they are just some <br>
trace-preserving, very sparse matrices):<br>
<br>
import numpy as np<br>
import scipy.sparse<br>
import scipy.integrate<br>
<br>
from tqdm import tqdm<br>
<br>
<br>
l = np.load("../liouvillian.npy")<br>
pump = np.load("../pump_operator.npy")<br>
state = np.load("../initial_state.npy")<br>
<br>
l = scipy.sparse.csr_array(l)<br>
pump = scipy.sparse.csr_array(pump)<br>
<br>
def f(t, y, *args):<br>
return (l + 0.5 * (5 < t < 10)
* pump) @ y<br>
#return l @ y # Uncomment for f(t) =
0<br>
<br>
dt = 0.1<br>
NUM_STEPS = 200<br>
res = np.empty((NUM_STEPS, 4096),
dtype=np.complex128)<br>
solver = <br>
scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state)<br>
times = []<br>
for i in tqdm(range(NUM_STEPS)):<br>
res[i, :] = solver.integrate(solver.t
+ dt)<br>
times.append(solver.t)<br>
<br>
Here, A_const = l, B_const = pump and f(t)
= 5 < t < 10. tqdm reports <br>
about 330it/s on my machine. When
converting the code to PETSc, I came <br>
to the following result (according to the
chapter <br>
<a href="https://petsc.org/main/manual/ts/#special-cases" rel="noreferrer noreferrer noreferrer noreferrer noreferrer" target="_blank">https://petsc.org/main/manual/ts/#special-cases</a>)<br>
<br>
import sys<br>
import petsc4py<br>
petsc4py.init(args=sys.argv)<br>
import numpy as np<br>
import scipy.sparse<br>
<br>
from tqdm import tqdm<br>
from petsc4py import PETSc<br>
<br>
comm = PETSc.COMM_WORLD<br>
<br>
<br>
def mat_to_real(arr):<br>
return np.block([[arr.real,
-arr.imag], [arr.imag, <br>
arr.real]]).astype(np.float64)<br>
<br>
def mat_to_petsc_aij(arr):<br>
arr_sc_sp =
scipy.sparse.csr_array(arr)<br>
mat =
PETSc.Mat().createAIJ(arr.shape[0],
comm=comm)<br>
rstart, rend =
mat.getOwnershipRange()<br>
print(rstart, rend)<br>
print(arr.shape[0])<br>
print(mat.sizes)<br>
I = arr_sc_sp.indptr[rstart : rend +
1] - arr_sc_sp.indptr[rstart]<br>
J =
arr_sc_sp.indices[arr_sc_sp.indptr[rstart]
: <br>
arr_sc_sp.indptr[rend]]<br>
V =
arr_sc_sp.data[arr_sc_sp.indptr[rstart] :
arr_sc_sp.indptr[rend]]<br>
<br>
print(I.shape, J.shape, V.shape)<br>
mat.setValuesCSR(I, J, V)<br>
mat.assemble()<br>
return mat<br>
<br>
<br>
l = np.load("../liouvillian.npy")<br>
l = mat_to_real(l)<br>
pump = np.load("../pump_operator.npy")<br>
pump = mat_to_real(pump)<br>
state = np.load("../initial_state.npy")<br>
state = np.hstack([state.real,
state.imag]).astype(np.float64)<br>
<br>
l = mat_to_petsc_aij(l)<br>
pump = mat_to_petsc_aij(pump)<br>
<br>
<br>
jac = l.duplicate()<br>
for i in range(8192):<br>
jac.setValue(i, i, 0)<br>
jac.assemble()<br>
jac += l<br>
<br>
vec = l.createVecRight()<br>
vec.setValues(np.arange(state.shape[0],
dtype=np.int32), state)<br>
vec.assemble()<br>
<br>
<br>
dt = 0.1<br>
<br>
ts = PETSc.TS().create(comm=comm)<br>
ts.setFromOptions()<br>
ts.setProblemType(ts.ProblemType.LINEAR)<br>
ts.setEquationType(ts.EquationType.ODE_EXPLICIT)<br>
ts.setType(ts.Type.RK)<br>
ts.setRKType(ts.RKType.RK3BS)<br>
ts.setTime(0)<br>
print("KSP:", ts.getKSP().getType())<br>
print("KSP
PC:",ts.getKSP().getPC().getType())<br>
print("SNES :", ts.getSNES().getType())<br>
<br>
def jacobian(ts, t, u, Amat, Pmat):<br>
Amat.zeroEntries()<br>
Amat.aypx(1, l,
structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)<br>
Amat.axpy(0.5 * (5 < t < 10),
pump, <br>
structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)<br>
<br>
ts.setRHSFunction(PETSc.TS.computeRHSFunctionLinear)<br>
#ts.setRHSJacobian(PETSc.TS.computeRHSJacobianConstant, l, l) # <br>
Uncomment for f(t) = 0<br>
ts.setRHSJacobian(jacobian, jac)<br>
<br>
NUM_STEPS = 200<br>
res = np.empty((NUM_STEPS, 8192),
dtype=np.float64)<br>
times = []<br>
rstart, rend = vec.getOwnershipRange()<br>
for i in tqdm(range(NUM_STEPS)):<br>
time = ts.getTime()<br>
ts.setMaxTime(time + dt)<br>
ts.solve(vec)<br>
res[i, rstart:rend] =
vec.getArray()[:]<br>
times.append(time)<br>
<br>
I decomposed the complex ODE into a larger
real ODE, so that I can <br>
easily switch maybe to GPU computation
later on. Now, the solutions of <br>
both scripts are very much identical, but
PETSc runs about 3 times <br>
slower at 120it/s on my machine. I don't
use MPI for PETSc yet.<br>
<br>
I strongly suppose that the problem lies
within the jacobian definition, <br>
as PETSc is about 3 times *faster* than
scipy with f(t) = 0 and <br>
therefore a constant jacobian.<br>
<br>
Thank you in advance.<br>
<br>
All the best,<br>
Niclas<br>
<br>
<br>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote></div>