[petsc-users] about repeat of expensive functions using VecScatterCreateToAll

Venugopal, Vysakh (venugovh) venugovh at mail.uc.edu
Tue Jan 17 15:38:48 CST 2023


Thank you! I am applying a structural optimization filter that inherently cannot be parallelized.

Vysakh

From: Barry Smith <bsmith at petsc.dev>
Sent: Tuesday, January 17, 2023 3:28 PM
To: Venugopal, Vysakh (venugovh) <venugovh at mail.uc.edu>
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] about repeat of expensive functions using VecScatterCreateToAll


On Jan 17, 2023, at 3:12 PM, Venugopal, Vysakh (venugovh) via petsc-users <petsc-users at mcs.anl.gov> wrote:

Hi,

I am doing the following:

Step 1. Create a DM object and get a global vector ‘V’ using DMGetGlobalVector.
Step 2. Perform some parallel operations on V.
Step 3. Call VecScatterCreateToAll on V to create a sequential vector ‘V_SEQ’, and fill it using VecScatterBegin/End with SCATTER_FORWARD.
Step 4. Perform an expensive operation on V_SEQ and output the updated V_SEQ.
Step 5. Use VecScatterBegin/End with SCATTER_REVERSE (with the global and sequential vectors swapped) to update V with the new values from V_SEQ.
Step 6. Continue using this new V in the rest of the parallelized program.

Question: Suppose I have n MPI processes. Is the expensive operation in Step 4 repeated n times? If so, is there a workaround such that the operation in Step 4 is performed only once? I would like to keep the same structure as Steps 1 to 6, with Step 4 performed only once.

  Yes. Each MPI rank performs the same operation on its own copy of the sequential vector, so with n ranks step 4 runs n times. Since the ranks run in parallel, it probably does not matter much that each one repeats the same computation. Step 5 does not require any MPI communication, since each rank needs only its own part of the sequential vector (which every rank already has) to update the parallel vector.
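  As a rough illustration of the point above, here is a plain-Python model of steps 3 to 5 with the scatter-to-all pattern. It is only a sketch: it uses no PETSc or MPI, the ranks are simulated by a loop, and the names expensive_filter and run_to_all are hypothetical stand-ins, with a simple doubling in place of the real step 4 operation.

```python
# Plain-Python model of steps 3-5 with a scatter-to-all pattern.
# No PETSc or MPI is involved; ranks are simulated by a loop and
# every name below is a hypothetical stand-in.

def expensive_filter(values, counter):
    """Stand-in for the step 4 operation; counts how often it runs."""
    counter["calls"] += 1
    return [2.0 * v for v in values]

def run_to_all(local_parts):
    """local_parts[r] is the slice of V owned by simulated rank r."""
    counter = {"calls": 0}
    # Step 3: forward scatter, every rank receives the full vector.
    full = [v for part in local_parts for v in part]
    new_parts = []
    offset = 0
    for part in local_parts:
        v_seq = list(full)                        # this rank's own copy of V_SEQ
        v_seq = expensive_filter(v_seq, counter)  # Step 4: repeated on every rank
        # Step 5: reverse scatter needs no communication in this model;
        # each rank copies its own range out of its local V_SEQ.
        new_parts.append(v_seq[offset:offset + len(part)])
        offset += len(part)
    return new_parts, counter["calls"]

parts = [[1.0, 2.0], [3.0, 4.0], [5.0]]  # three simulated ranks
new_parts, calls = run_to_all(parts)
print(new_parts, calls)
```

  In this model the call count equals the number of simulated ranks, while the work in step 5 is purely local to each rank.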

  You could use VecScatterCreateToZero() instead. Step 3 would then require less communication, but step 5 would require communication to get parts of the solution from rank 0 to the other ranks. The time for step 4 would be roughly the same.
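  The scatter-to-zero trade-off can be sketched in the same spirit with a plain-Python model. This is not PETSc code: the ranks are simulated, the names expensive_filter and run_to_zero are hypothetical, and counting transferred values is a crude assumption about communication cost rather than a description of what PETSc actually sends.

```python
# Plain-Python model of steps 3-5 with a scatter-to-zero pattern.
# No PETSc or MPI is involved; all names are hypothetical stand-ins.

def expensive_filter(values):
    """Stand-in for the step 4 operation."""
    return [2.0 * v for v in values]

def run_to_zero(local_parts):
    """local_parts[r] is the slice of V owned by simulated rank r."""
    # Step 3: forward scatter, only rank 0 receives the full vector.
    v_seq = [v for part in local_parts for v in part]
    sent_step3 = sum(len(p) for p in local_parts[1:])
    # Step 4: runs once, on rank 0; the other ranks wait for the result,
    # so the elapsed time for this step is not reduced.
    v_seq = expensive_filter(v_seq)
    calls = 1
    # Step 5: reverse scatter, rank 0 sends every other rank its slice.
    new_parts, offset, sent_step5 = [], 0, 0
    for rank, part in enumerate(local_parts):
        new_parts.append(v_seq[offset:offset + len(part)])
        if rank != 0:
            sent_step5 += len(part)  # values that must leave rank 0
        offset += len(part)
    return new_parts, calls, sent_step3, sent_step5

parts = [[1.0, 2.0], [3.0, 4.0], [5.0]]  # three simulated ranks
new_parts, calls, sent3, sent5 = run_to_zero(parts)
print(new_parts, calls, sent3, sent5)
```

  Here step 4 runs once, but step 5 now moves data off rank 0, and the other ranks are idle while rank 0 computes.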

  You will likely see a worthwhile improvement in performance only if you can parallelize the computation in step 4. What are you doing that is computationally intensive and requires all the data on one rank?

Barry



Thanks,

Vysakh Venugopal
---
Vysakh Venugopal
Ph.D. Candidate
Department of Mechanical Engineering
University of Cincinnati, Cincinnati, OH 45221-0072
