[petsc-users] PETSc parallel scalability

Jinlei Shen jshen25 at jhu.edu
Thu Sep 8 19:45:36 CDT 2016


Hi,

Thanks a lot for the replies. They are really helpful.

I just used the KSP example ex2.c to test parallel scaling on my
cluster, since that example allows a flexible number of unknowns.
I do observe that for larger problem sizes, the speed-up improves
with more processors.
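
For anyone repeating this kind of test, here is a minimal sketch of how
speed-up and parallel efficiency could be tabulated from measured times per
iteration. The function name and the timings are hypothetical placeholders,
not measurements from this thread.

```python
def strong_scaling_table(timings):
    """Return (procs, speedup, efficiency) rows for a fixed problem size.

    timings maps a process count to the measured time per iteration
    (for example KSPSolve time divided by the iteration count).
    Values are relative to the smallest process count in the dict.
    """
    procs = sorted(timings)
    p0 = procs[0]
    t0 = timings[p0]
    rows = []
    for p in procs:
        speedup = t0 / timings[p]
        # Efficiency of 1.0 means ideal scaling relative to the baseline run.
        efficiency = speedup * p0 / p
        rows.append((p, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Hypothetical seconds per iteration, for illustration only.
    example = {1: 8.0, 2: 4.0, 4: 2.5, 8: 2.0}
    for p, s, e in strong_scaling_table(example):
        print(f"{p:4d} procs  speedup {s:5.2f}  efficiency {e:5.2f}")
```

Efficiency that stays near 1 indicates good strong scaling; a steady drop as
processes are added suggests that communication is starting to dominate.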

Now I'm moving on to incorporating PETSc into our real CPFEM code and
investigating suitable solvers and preconditioners for that specific
system.

Thanks again.

Best,
Jinlei





On Wed, Sep 7, 2016 at 10:26 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:

>
> > On Sep 7, 2016, at 8:37 PM, Jinlei Shen <jshen25 at jhu.edu> wrote:
> >
> > Hi,
> >
> > I am trying to test the parallel scalability of an iterative solver (CG
> with a BJacobi preconditioner) in PETSc.
> >
> > Since the iteration count increases with more processors, I calculated
> the time per iteration by dividing the total KSPSolve time by the number
> of iterations in this test.
> >
> > The linear system I'm solving has 315342 unknowns. Only the KSPSolve
> cost is analyzed.
> >
> > The results show that parallelism works well with a small number of
> processes (fewer than 32 in my case), and scaling is almost perfect
> within the first 10 processors.
> >
> > However, the benefit of parallelization degrades if I use more
> processors. The weird thing is that with more than 100 processors, the
> cost per iteration increases slightly.
> >
> > To investigate this issue, I then looked into the composition of the
> KSPSolve time.
> > It seems KSPSolve consists of MatMult, VecTDot(min), VecNorm(min),
> VecAXPY(max), VecAYPX(max), and PCApply. Please correct me if I'm wrong.
> >
> > And I found that for a small number of processors, all these components
> scale well.
> > However, with more processors (roughly more than 40), MatMult,
> VecTDot(min), and VecNorm(min) scale worse, and their cost even increases
> beyond 100 processors, while the other three parts scale well even at
> 1000 processors.
> > Since MatMult makes up the major cost of a single iteration, the total
> cost per iteration increases as well (see the figure below).
> >
> > My question:
> > 1. Is such a situation reasonable?
>
>    Yes
>
> > Could anyone explain why MatMult scales poorly beyond a certain number
> of processors? I have heard a little about different algorithms for
> matrix multiplication. Is that the bottleneck?
>
>    The MatMult inherently requires communication. As the number of
> processors increases, the amount of communication increases while the
> total work needed remains the same. This is true regardless of the
> particular details of the algorithms used.
>
>    Similar computations like norms and inner products require
> communication among all processes. As you increase the number of
> processes, the communication time starts to dominate for norms and inner
> products, hence they begin to take a great deal of time for large numbers
> of processes.
> >
> > 2. Is the parallelism dependent on matrix size? If I use a larger
> system size, e.g. a million unknowns, can the solver scale better?
>
>    Absolutely. You should look up the concepts of strong scaling and weak
> scaling. These are important concepts for understanding parallel
> computing.
>
> >
> > 3. Do you have any ideas for improving the parallel performance?
>
>   Worrying about parallel performance should never be a priority. The
> priority should be "can I solve the size problems I need to solve in a
> reasonable amount of time to accomplish whatever science or engineering I
> am working on?" Being able to do this depends first on using good
> discretization methods for your model (i.e. not just first or second order
> whenever possible) and on using efficient algebraic solvers when you need
> to solve algebraic systems. (As Justin noted, bjacobi GMRES is not a
> particularly efficient algebraic solver.) Only after you have done all that
> does improving parallel performance come into play: you know you have an
> efficient discretization and solver, and now you want to make it run faster
> in parallel. Since each type of simulation is unique, you need to work
> through the process for the problem YOU need to solve; you can't just run
> it for a model problem and then try to reuse the same discretizations and
> solvers for your "real" problem.
>
>    Barry
>
> >
> > Thank you very much.
> >
> > Jinlei
> >
> > <image.png>
> >
> >
>
>
>
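
The point made in the quoted reply, that the work per process shrinks while
the cost of communication does not, can be illustrated with a toy cost
model. This is only a sketch under stated assumptions: the time per
iteration is modeled as (work / p) plus a reduction cost that grows like
log2(p), and every constant below is a hypothetical placeholder rather than
a measurement from any real machine. It ignores the neighbor communication
in MatMult, memory bandwidth, and load imbalance.

```python
import math


def iteration_time(n_unknowns, p,
                   seconds_per_unknown=1e-6,    # hypothetical compute cost
                   seconds_per_reduction=1e-4,  # hypothetical latency per log2 step
                   reductions_per_iteration=3): # assumed: two dot products, one norm
    """Toy model of the time for one Krylov iteration on p processes."""
    compute = seconds_per_unknown * n_unknowns / p
    # A global reduction is modeled as a tree with log2(p) latency-bound steps.
    reduce_cost = reductions_per_iteration * seconds_per_reduction * math.log2(p)
    return compute + reduce_cost


def best_process_count(n_unknowns, candidates):
    """Process count among candidates with the smallest modeled iteration time."""
    return min(candidates, key=lambda p: iteration_time(n_unknowns, p))


if __name__ == "__main__":
    candidates = [2 ** k for k in range(15)]  # 1, 2, 4, ..., 16384
    for n in (315342, 3153420):
        p_best = best_process_count(n, candidates)
        print(f"n = {n}: modeled time is smallest at p = {p_best}")
```

In this model the time per iteration stops decreasing beyond some process
count and then rises again, and that turning point moves to larger process
counts as the number of unknowns grows. That is the strong-scaling behavior
described above; weak scaling would instead grow the problem size along with
the process count.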