[petsc-users] Petsc4py Simulation: Mat.axpy() slow

Barry Smith bsmith at petsc.dev
Sat Nov 11 12:22:12 CST 2023



 DIFFERENT_NONZERO_PATTERN will always be slow because it needs to determine the nonzero pattern of the result for each computation.

 SAME_NONZERO_PATTERN can obviously run fast but may not be practical for your problem. 

 SUBSET_NONZERO_PATTERN (which means in A += b*B we know that B has NO nonzero locations that A does not have) does not need to reallocate anything in A so it can be reasonably fast (we can do more optimizations on the current code to improve it for you).

  So, can you construct the S nonzero structure initially to have the union of the nonzero structures of all the matrices that accumulate into it? This is the same
nonzero structure that it "ends up with" in the current code. With this nonzero structure, SUBSET_NONZERO_PATTERN can be used. Depending on how sparse all the matrices that get accumulated into S are, you could perhaps go further and make sure they all have the same nonzero structure (sure, it uses more memory and will do extra operations, but the actual computation will be so fast it may be worth the extra computations.

  Barry



> On Nov 10, 2023, at 9:53 PM, Donald Rex Planalp <Donald.Planalp at colorado.edu> wrote:
> 
> Hello,
> 
> I am trying to use petsc4py to conduct a quantum mechanics simulation. I've been able to construct all of the relevant matrices, however I am reaching a gigantic bottleneck. 
> 
> For the simplest problem I am running I have a few matrices each about 5000x5000. In order to begin time propagation I need to add these matrices together. However, on 6 cores of my local machine it is taking approximately 1-2 seconds per addition. Since I need to do this for each time step in my simulation it is prohibitively slow since there could be upwards of 10K time steps. 
> 
> Below is the relevant code:
> 
> structure = structure=PETSc.Mat.Structure.DIFFERENT_NONZERO_PATTERN
>     if test2:
>         def makeLeft(S,MIX,ANG,ATOM,i):
>             
> 
>             S.axpy(-Field.pulse[i],MIX,structure)
>             S.axpy(-Field.pulse[i],ANG,structure)
>             S.axpy(-1,ATOM,structure)
>             return S
>         def makeRight(S,MIX,ANG,ATOM,i):
>             
>             
>             
>             S.axpy(Field.pulse[i],MIX,structure)
>             S.axpy(Field.pulse[i],ANG,structure)
>             S.axpy(1,ATOM,structure)
>             
>             return S
> 
>         H_mix = Int.H_mix
>         H_mix.scale(1j * dt /2)
>   
>         H_ang = Int.H_ang
>         H_ang.scale(1j * dt /2)
> 
>         H_atom = Int.H_atom
>         H_atom.scale(1j * dt /2)
> 
>         S = Int.S_total
> 
>         psi_initial = psi.psi_initial.copy()
>         ksp = PETSc.KSP().create(PETSc.COMM_WORLD)
> 
>    
>         for i,t in enumerate(box.t):
>             print(i,L)
> 
>             
> 
>             O_L = makeLeft(S,H_mix,H_ang,H_atom,i)
>             O_R = makeRight(S,H_mix,H_ang,H_atom,i)
> 
>             
> 
>             if i == 0:
>                 known = O_R.getVecRight()
>                 sol = O_L.getVecRight()
>         
>             O_R.mult(psi_initial,known)
> 
>             ksp.setOperators(O_L)
>         
>             
>             ksp.solve(known,sol)
> 
>             
>             
>     
>             psi_initial.copy(sol)
> 
> 
> I need to clean it up a bit, but the main point is that I need to add the matrices many times for a single time step. I can't preallocate memory very well since some of the matrices aren't the most sparse either. It seems if I cannot speed up the addition it will be difficult to continue so I was wondering if you had any insights.
> 
> Thank you for your time

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20231111/7a4c23eb/attachment.html>


More information about the petsc-users mailing list