[petsc-users] PETSc parallel scalability

Justin Chang jychang48 at gmail.com
Wed Sep 7 21:09:16 CDT 2016


Lots of topics to discuss here...

- 315,342 unknowns is a very small problem. The PETSc gurus recommend at
least 10,000 unknowns per process
<http://www.mcs.anl.gov/petsc/documentation/faq.html#slowerparallel> for
the computation time to outweigh communication time (and 20,000 unknowns
or more is preferable). With 32 or more MPI processes you are already down
to ~10k unknowns per process or fewer, so that alone explains part of the
lost speedup.
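
  As a rough check with your 315,342-unknown system:

    315,342 / 16  ~ 19,700 unknowns per process   (still reasonable)
    315,342 / 32  ~  9,850 unknowns per process   (at the lower limit)
    315,342 / 128 ~  2,460 unknowns per process   (communication-dominated)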

- Another reason you get poor parallel scalability is that sparse solvers
like PETSc's are limited by memory bandwidth, not flop rate, so you have to
use the optimal number of cores per compute node for whatever machine you
are running on. The PETSc gurus discuss this issue in depth
<http://www.mcs.anl.gov/petsc/documentation/faq.html#computers>. So not
only do you need proper MPI process bindings, but you will likely not want
to saturate all available cores on a single node (the STREAMS benchmark
tells you how many cores per node are actually useful). In other words, 16
cores spread across 2 nodes will generally outperform 16 cores on 1 node
(rough sketch below).
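
  For example (a rough sketch; the exact make target and the mpiexec flags
  depend on your PETSc version and MPI implementation, and ./your_app is a
  placeholder for your executable):

    # measure achievable memory bandwidth vs. number of MPI processes per node
    cd $PETSC_DIR
    make streams NPMAX=16

    # example binding flags (Open MPI syntax): spread ranks across sockets,
    # pin each rank to a core
    mpiexec -n 16 --map-by socket --bind-to core ./your_app -log_view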

- If operations like MatMult are not scaling, this is likely due to the
memory bandwidth limitations. If operations like VecDot or VecNorm are not
scaling, this is likely due to the network latency between compute nodes,
since each of those requires a global reduction across all processes.

- What kind of problem are you solving? CG/BJacobi is a mediocre
solver/preconditioner combination, and iteration counts will grow with the
number of MPI processes (block Jacobi gets weaker as the blocks get
smaller), especially if your tolerances are lax. If you have something nice
like the Poisson equation, try CG with one of the multigrid
preconditioners such as GAMG (example options below).
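
  A minimal sketch of the corresponding run-time options (./your_app, the
  process count, and the tolerance are placeholders; GAMG mainly pays off
  for elliptic/SPD problems):

    mpiexec -n 32 ./your_app \
      -ksp_type cg \
      -pc_type gamg \
      -ksp_rtol 1e-8 \
      -ksp_converged_reason \
      -log_view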

- Counterintuitively, the "best" way to improve parallel speedup is to make
your code really slow and inefficient: the more time each process spends on
computation, the less the communication cost shows up in the speedup curve.
That is why speedup alone is a misleading metric; what matters is time to
solution.

- And most importantly, always send the output of -log_view if you want
people to help identify performance-related issues with your application :)
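
  For example, capture the log at two process counts so the scaling of each
  event can be compared (./your_app and the file names are placeholders):

    mpiexec -n 16  ./your_app -log_view > log_n16.txt  2>&1
    mpiexec -n 128 ./your_app -log_view > log_n128.txt 2>&1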

Justin

On Wed, Sep 7, 2016 at 8:37 PM, Jinlei Shen <jshen25 at jhu.edu> wrote:

> Hi,
>
> I am trying to test the parallel scalability of an iterative solver (CG
> with a BJacobi preconditioner) in PETSc.
>
> Since the iteration count increases with more processors, I calculated
> the time per iteration by dividing the total KSPSolve time by the number
> of iterations in this test.
>
> The linear system I'm solving has 315342 unknowns. Only KSPSolve cost is
> analyzed.
>
> The results show that the parallelism works well with a small number of
> processes (fewer than 32 in my case), and the scaling is almost perfect
> within the first 10 processes.
>
> However, the parallel efficiency degrades as I use more processors. The
> weird thing is that with more than 100 processors, the cost per iteration
> actually increases slightly.
>
> To investigate this issue, I then looked into the composition of the
> KSPSolve time. It seems KSPSolve consists of MatMult, VecTDot (min),
> VecNorm (min), VecAXPY (max), VecAYPX (max), and PCApply.
> Please correct me if I'm wrong.
>
> And I found that for a small number of processors, all these components
> scale well.
> However, with more processors (roughly more than 40), MatMult, VecTDot
> (min), and VecNorm (min) behave worse, and even increase after 100
> processors, while the other three parts scale well even at 1000
> processors.
> Since MatMult makes up the major cost of a single iteration, the total
> cost per iteration increases as well (see the figure below).
>
> My question:
> 1. Is such a situation reasonable? Could anyone explain why MatMult
> scales poorly after a certain number of processors? I have heard a bit
> about different algorithms for matrix multiplication. Is that the
> bottleneck?
>
> 2. Is the parallel scalability dependent on the matrix size? If I use a
> larger system, e.g. a million unknowns, will the solver scale better?
>
> 3. Do you have any idea to improve the parallel performance?
>
> Thank you very much.
>
> Jinlei
>
> [image: Inline image 1]
>
>
>

