[petsc-users] Scaling problem when cores > 600

Smith, Barry F. bsmith at mcs.anl.gov
Sun Mar 4 21:43:49 CST 2018

```360 process

KSPSolve              99 1.0 2.6403e+02 1.0 6.67e+10 1.1 2.7e+05 9.9e+05 5.1e+02 15100 17 42 19  15100 17 42 19 87401

1920 processes

KSPSolve              99 1.0 2.3184e+01 1.0 1.32e+10 1.2 1.5e+06 4.3e+05 5.1e+02  4100 17 42 19   4100 17 42 19 967717

Ratio of number of processes 5.33 ratio of time for KSPSolve  11.388 so the time for the solve is scaling very well (extremely well actually). The problem is
due to "other time" that is not in KSP solve. Note that the percentage of the total time in KSPSolve went from 15 percent of the runtime to 4 percent. This means something outside of KSPSolve is scaling very poorly. You will need to profile the rest of the code to determine where the time is being spent. PetscLogEventRegister()  and PetscLogEventBegin/End() will be needed in your code. Already with 360 processes the linear solver is only taking 15 percent of the time.

Barry

> On Mar 4, 2018, at 9:23 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>
>
> On 1/3/2018 12:14 PM, Smith, Barry F. wrote:
>>
>>> On Feb 28, 2018, at 8:01 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>
>>>
>>> On 1/3/2018 12:10 AM, Matthew Knepley wrote:
>>>> On Wed, Feb 28, 2018 at 10:45 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>> Hi,
>>>>
>>>> I have a CFD code which uses PETSc and HYPRE. I found that for a certain case with grid size of 192,570,048, I encounter scaling problem when my cores > 600. At 600 cores, the code took 10min for 100 time steps. At 960, 1440 and 2880 cores, it still takes around 10min. At 360 cores, it took 15min.
>>>>
>>>> So how can I find the bottleneck? Any recommended steps?
>>>>
>>>> For any performance question, we need to see the output of -log_view for all test cases.
>>> Hi,
>>>
>>> To be more specific, I use PETSc KSPBCGS and HYPRE geometric multigrid (entirely based on HYPRE, no PETSc) for the momentum and Poisson eqns in my code.
>>>
>>> So can log_view be used in this case to give a meaningful? Since part of the code uses HYPRE?
>>   Yes, just send the logs.
>>
> Hi,
>
> I have attached the logs, with the number indicating the no. of cores used. Some of the new results are different from the previous runs, although I'm using the same cluster.
>
> Thanks for the help.
>>> I also program another subroutine in the past which uses PETSc to solve the Poisson eqn. It uses either HYPRE's boomeramg, KSPBCGS or KSPGMRES.
>>>
>>> If I use boomeramg, can log_view be used in this case?
>>>
>>> Or do I have to use KSPBCGS or KSPGMRES, which is directly from PETSc? However, I ran KSPGMRES yesterday with the Poisson eqn and my ans didn't converge.
>>>
>>> Thanks.
>>>>  I must also mention that I partition my grid only in the x and y direction. There is no partitioning in the z direction due to limited code development. I wonder if there is a strong effect in this case.
>>>>
>>>> Maybe. Usually what happens is you fill up memory with a z-column and cannot scale further.
>>>>
>>>>   Thanks,
>>>>
>>>>      Matt
>>>>
>>>> --
>>>> Thank you very much
>>>>
>>>> Yours sincerely,
>>>>
>>>> ================================================
>>>> TAY Wee-Beng 郑伟明 (Zheng Weiming)
>>>> Personal research webpage: http://tayweebeng.wixsite.com/website
>>>> ================================================
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>> https://www.cse.buffalo.edu/~knepley/
>
> <log960.txt><log600.txt><log360.txt><log1920.txt>

```