[petsc-users] Strange strong scaling result
Matthew Knepley
knepley at gmail.com
Tue Jul 12 06:05:52 CDT 2022
On Tue, Jul 12, 2022 at 1:50 AM Ce Qin <qince168 at gmail.com> wrote:
> Thanks for your quick response.
>
> The linear system is complex-valued. We rewrite it in its real form
> and solve it using FGMRES with an optimal block-diagonal preconditioner.
> We use CG with the AMS preconditioner implemented in HYPRE to solve the
> smaller real linear systems arising from applying the block preconditioner.
> The iteration counts of FGMRES and CG remain almost constant across all the runs.
>
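For context, the standard real-equivalent formulation of a complex linear system, with one block-diagonal preconditioner known from the literature for complex symmetric problems, is sketched below (the symbols K, M, u, v, f, g are illustrative; the thread does not say which splitting or preconditioner blocks the code actually uses). Writing the system as (K + iM)(u + iv) = f + ig, the real form and preconditioner read

    \begin{pmatrix} K & -M \\ M & K \end{pmatrix}
    \begin{pmatrix} u \\ v \end{pmatrix}
    =
    \begin{pmatrix} f \\ g \end{pmatrix},
    \qquad
    P = \begin{pmatrix} K + M & 0 \\ 0 & K + M \end{pmatrix}.

Each application of P then reduces to two real solves with K + M, matching the description above of CG/AMS applied to a smaller real system inside FGMRES.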
So those blocks decrease in size as you add more processes?
> Each node is equipped with a 64-core CPU and 128 GB of memory.
> The matrix-vector product is memory-bandwidth limited. Is this strange
> behavior related to memory bandwidth?
>
I don't see how.
Thanks,
Matt
> Best,
> Ce
>
> Mark Adams <mfadams at lbl.gov> 于2022年7月12日周二 04:04写道:
>
>> Also, cache effects. 20M DoFs on one core/thread is huge.
>> 37x on assembly is probably cache effects.
>>
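A back-of-envelope estimate supports the cache argument (double precision is assumed, and the 50 nonzeros per row is a guess, since the thread gives no sparsity data):

    20 \times 10^{6}\,\text{DoF} \times 8\,\text{B} \approx 160\,\text{MB per vector},
    \qquad
    \frac{160\,\text{MB}}{32\ \text{processes}} \approx 5\,\text{MB per process}.

On one core every vector access streams from main memory; at 32 processes each local vector fits in a typical L2/L3 cache slice. The matrix itself (about 20e6 rows x 50 nnz x 12 B ≈ 12 GB in AIJ format under the assumed sparsity) never fits in cache at any process count, so the superlinear gain would come mainly from vector and workspace reuse.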
>> On Mon, Jul 11, 2022 at 1:09 PM Matthew Knepley <knepley at gmail.com>
>> wrote:
>>
>>> On Mon, Jul 11, 2022 at 10:34 AM Ce Qin <qince168 at gmail.com> wrote:
>>>
>>>> Dear all,
>>>>
>>>> I want to analyze the strong scaling of our in-house FEM code.
>>>> The test problem has about 20M DoFs. I ran the problem using
>>>> various settings. The speedups for the assembly and solving
>>>> procedures are as follows:
>>>> NProcessors  NNodes  CoresPerNode   Assembly    Solving
>>>>           1       1             1   1.0         1.0
>>>>           2       1             2   1.995246    1.898756
>>>>           2       2             1   2.121401    2.436149
>>>>           4       1             4   4.658187    6.004539
>>>>           4       2             2   4.666667    5.942085
>>>>           4       4             1   4.65272     6.101214
>>>>           8       2             4   9.380985    16.581135
>>>>           8       4             2   9.308575    17.258891
>>>>           8       8             1   9.314449    17.380612
>>>>          16       2             8   18.575953   34.483058
>>>>          16       4             4   18.745129   34.854409
>>>>          16       8             2   18.828393   36.45509
>>>>          32       4             8   37.140626   70.175879
>>>>          32       8             4   37.166421   71.533865
>>>>
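Reading the table as the usual strong-scaling speedup (an assumption; the post does not define it),

    S(P) = \frac{T(1)}{T(P)},
    \qquad
    E(P) = \frac{S(P)}{P},
    \qquad
    E(32) \approx \frac{71.5}{32} \approx 2.2,

the solve phase is over 200% parallel-efficient at 32 processes, which is what makes the result look strange.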
>>>> I don't quite understand this result. Why can we achieve a speedup of
>>>> 70+ using only 32 processors? Could you please help me explain this?
>>>>
>>>
>>> We need more data. I would start with the number of iterations the
>>> solver executes. I suspect this is changing. However, it can be more
>>> complicated. For example, a Block-Jacobi preconditioner gets cheaper
>>> as the number of subdomains increases. Thus we need to know exactly
>>> what the solver is doing.
>>>
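A minimal way to test the iteration-count hypothesis is to query the KSP after each solve. The sketch below uses standard PETSc calls on a toy 1-D Laplacian (the matrix is a stand-in, not the poster's FEM operator):

/* Minimal sketch: solve a 1-D Laplacian and report the KSP iteration
 * count, to check whether it stays constant as the process count grows. */
#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat      A;
  Vec      x, b;
  KSP      ksp;
  PetscInt i, n = 100, Istart, Iend, its;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

  /* Assemble the standard tridiagonal 1-D Laplacian */
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
  PetscCall(MatSetFromOptions(A));
  PetscCall(MatSetUp(A));
  PetscCall(MatGetOwnershipRange(A, &Istart, &Iend));
  for (i = Istart; i < Iend; i++) {
    if (i > 0)     PetscCall(MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES));
    if (i < n - 1) PetscCall(MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES));
    PetscCall(MatSetValue(A, i, i, 2.0, INSERT_VALUES));
  }
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

  PetscCall(MatCreateVecs(A, &x, &b));
  PetscCall(VecSet(b, 1.0));

  /* Solver configurable from the command line (-ksp_type, -pc_type, ...) */
  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetFromOptions(ksp));
  PetscCall(KSPSolve(ksp, b, x));

  /* The number to compare across runs with different process counts */
  PetscCall(KSPGetIterationNumber(ksp, &its));
  PetscCall(PetscPrintf(PETSC_COMM_WORLD, "KSP iterations: %" PetscInt_FMT "\n", its));

  PetscCall(KSPDestroy(&ksp));
  PetscCall(MatDestroy(&A));
  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&b));
  PetscCall(PetscFinalize());
  return 0;
}

Running this at several process counts with -ksp_monitor or -log_view shows directly whether it is the iteration count or the per-iteration cost that changes.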
>>> Thanks,
>>>
>>> Matt
>>>
>>>
>>>> Thank you in advance.
>>>>
>>>> Best,
>>>> Ce
>>>>
>>>>
>>>>
>>>
>>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/