[petsc-users] Poor speed up for KSP example 45

Wed Mar 25 12:56:55 CDT 2020

On Wed, Mar 25, 2020 at 1:01 PM Amin Sadeghi <aminthefresh at gmail.com> wrote:

> Hi,
>
> I ran KSP example 45 on a single node with 32 cores and 125GB memory using
> 1, 16 and 32 MPI processes. Here's a comparison of the time spent during
> KSP.solve:
>
> - 1 MPI process: ~98 sec, speedup: 1X
> - 16 MPI processes: ~12 sec, speedup: ~8X
> - 32 MPI processes: ~11 sec, speedup: ~9X
>
> Since the problem size is large enough (8M unknowns), I expected a speedup
> much closer to 32X, rather than 9X. Is this expected? If yes, how can it be
> improved?
>
> I've attached three log files for more details.
>

We have answered this here:
https://www.mcs.anl.gov/petsc/documentation/faq.html#computers

However, I can briefly summarize it. The bottleneck here is not computing
power but memory bandwidth. The node you are running on has enough
bandwidth for about 8 processes, not 32. It probably takes 12-16 processes
to saturate the memory bandwidth, but not 32. That is why you see no
speedup after 16. There is no way to improve this by optimizing the code.
The only thing to do is change the algorithm you are using. This behavior
has been extensively documented and discussed for two decades. See, for
example, the Roofline Performance Model.
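
As a rough illustration of the bandwidth argument, here is a minimal Python
sketch of a roofline-style bound and of the speedup you would expect from a
purely bandwidth-bound kernel. The timings are the ones reported above. The
bandwidth and peak-flop figures are hypothetical placeholders, not
measurements of this node, and the arithmetic intensity assumes a CSR
sparse matrix-vector product with 8-byte values and 4-byte column indices,
ignoring vector traffic.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound: performance is capped by either the compute peak
    or by memory bandwidth times arithmetic intensity (flops per byte)."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def bandwidth_limited_speedup(nprocs, per_proc_gbs, node_gbs):
    """Speedup over one process for a kernel limited only by bandwidth.

    Each process can draw per_proc_gbs until the sum hits node_gbs,
    after which extra processes add nothing."""
    return min(nprocs * per_proc_gbs, node_gbs) / per_proc_gbs


# Timings reported in this thread (seconds in KSP solve).
times = {1: 98.0, 16: 12.0, 32: 11.0}

# ASSUMPTIONS (hypothetical, for illustration only):
PER_PROC_GBS = 10.0   # bandwidth one process can draw by itself
NODE_GBS = 90.0       # total sustained bandwidth of the node
PEAK_GFLOPS = 1000.0  # compute peak of the node

# CSR SpMV: 2 flops per nonzero, 8-byte value + 4-byte index loaded.
SPMV_INTENSITY = 2.0 / 12.0

for p, t in sorted(times.items()):
    measured = times[1] / t
    efficiency = measured / p
    modeled = bandwidth_limited_speedup(p, PER_PROC_GBS, NODE_GBS)
    print(f"{p:2d} procs: measured {measured:5.2f}X "
          f"(efficiency {efficiency:4.0%}), modeled {modeled:5.2f}X")

bound = attainable_gflops(PEAK_GFLOPS, NODE_GBS, SPMV_INTENSITY)
print(f"roofline bound for SpMV: {bound:.1f} GFlop/s "
      f"of {PEAK_GFLOPS:.0f} GFlop/s peak")
```

With placeholder numbers like these, the modeled speedup flattens once the
node bandwidth is used up, regardless of how many cores are left idle. To
get real figures for your machine, measure the sustained bandwidth with a
STREAM-type benchmark at several process counts.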

  Thanks,

    Matt

> Sincerely,
> Amin
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/