[petsc-users] Scaling problem when cores > 600

Smith, Barry F. bsmith at mcs.anl.gov
Sun Apr 22 11:53:22 CDT 2018


   KSPSolve would include all the KSP solves.

   You can use PetscLogStageRegister(), PetscLogStagePush(), and PetscLogStagePop() to have
the information about each solve displayed separately.
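   For example, a minimal sketch in Fortran (assuming the usual PETSc Fortran includes, e.g.
petsc/finclude/petscksp.h with "use petscksp"; ksp_semi_xyz and ksp are the KSP handles from
your snippets below, while b_mom, x_mom, b_p and x_p are placeholder vector names):

      PetscLogStage stage_mom, stage_poisson

      ! once, during setup
      call PetscLogStageRegister("momentum solve", stage_mom, ierr)
      call PetscLogStageRegister("poisson solve", stage_poisson, ierr)

      ! inside the time-step loop
      call PetscLogStagePush(stage_mom, ierr)
      call KSPSolve(ksp_semi_xyz, b_mom, x_mom, ierr)
      call PetscLogStagePop(ierr)

      call PetscLogStagePush(stage_poisson, ierr)
      call KSPSolve(ksp, b_p, x_p, ierr)
      call PetscLogStagePop(ierr)

   -log_view then prints a separate table for each registered stage, so the momentum and
Poisson solves are reported with their own times, flops, and message counts.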

   Barry


> On Apr 22, 2018, at 11:42 AM, TAY wee-beng <zonexo at gmail.com> wrote:
> 
> 
> On 22/4/2018 5:22 AM, Smith, Barry F. wrote:
>>    Comparing the time in KSPSolve()
>> 
>>>>> 1.8116e+02/3.5276e+01
>> 5.135502891484295
>> 
>> to the increase in number of processes one sees
>> 
>>>>> 1440/288.
>> 5.0
>> 
>> so very very good speedup. So the linear solver is working very well.
>> 
>>  The problem is the rest of the code. On 288 processes the code is spending
>> 72 percent of the runtime in KSPSolve, while on 1440 processes it is spending only 33 percent. This means something OUTSIDE of the linear solver is not scaling well. There is no way for me to know where the rest of the time goes; it could be in your code reading in meshes, in your partitioning, or somewhere else. You might use PetscLogEventRegister() and PetscLogEventBegin/End() to profile your own code and see where it is spending so much time.
>> 
>>    Barry
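   For the PetscLogEventRegister()/PetscLogEventBegin/End() profiling suggested above, a
minimal Fortran sketch (the class name, event name, and the timed segment are placeholders):

      PetscLogEvent  mesh_event
      PetscClassId   classid

      ! once, during setup
      call PetscClassIdRegister("MyApp", classid, ierr)
      call PetscLogEventRegister("Mesh read", classid, mesh_event, ierr)

      call PetscLogEventBegin(mesh_event, ierr)
      ! ... the section of your own code to be timed, e.g. mesh reading or partitioning ...
      call PetscLogEventEnd(mesh_event, ierr)

   Each registered event then shows up as its own line in the -log_view summary, so you can
see where the non-KSPSolve time is going.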
> Hi,
> 
> Just to confirm, does the KSPSolve entry represent both the momentum and the Poisson eqns?
> 
> I have 2 KSPSolve calls, with the momentum eqn having:
> 
> call KSPSetOptionsPrefix(ksp_semi_xyz,"momentum_",ierr)
> 
> Poisson eqn having:
> 
> call KSPSetOptionsPrefix(ksp,"poisson_",ierr)
> 
> Is there any way to separate their efficiency?
> 
> Thanks.
> 
> 
>> 
>> 
>>> On Apr 21, 2018, at 10:34 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> I have found some time to work on this scaling problem again. I am now using:
>>> 
>>> mpirun ./a.out -log_view -poisson_pc_type gamg -poisson_pc_gamg_agg_nsmooths 1
>>> 
>>> I have attached the log_view output for 288, 600, 960, 1440 procs for comparison.
>>> 
>>> Please give some comments.
>>> 
>>> 
>>> Thank you very much
>>> 
>>> Yours sincerely,
>>> 
>>> ================================================
>>> TAY Wee-Beng 郑伟明 (Zheng Weiming)
>>> Personal research webpage: http://tayweebeng.wixsite.com/website
>>> Youtube research showcase: https://www.youtube.com/channel/UC72ZHtvQNMpNs2uRTSToiLA
>>> linkedin: www.linkedin.com/in/tay-weebeng
>>> ================================================
>>> 
>>> On 7/3/2018 11:58 PM, Smith, Barry F. wrote:
>>>>    What are you using for the "Poisson log" case?
>>>> 
>>>>    If it is a Poisson problem then almost for sure you should be using Hypre BoomerAMG.
>>>> 
>>>>    It sounds like your matrix does not change. You will need to discuss the scaling with the hypre people.
>>>> 
>>>>    Barry
>>>> 
>>>> 
>>>>> On Mar 7, 2018, at 5:38 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>>> 
>>>>> 
>>>>> On 7/3/2018 6:22 AM, Smith, Barry F. wrote:
>>>>>>    The speed up for "Poisson log" is 1.6425364214878704 = 5.0848e+02/3.0957e+02
>>>>>> 
>>>>>>     This is lower than I would expect for Hypre BoomerAMG.
>>>>>> 
>>>>>>     Are you doing multiple solves with the same matrix with hypre or is each solve a new matrix? If each solve is a new matrix then you may be getting expected behavior since the multigrid AMG construction process does not scale as well as the application of AMG once it is constructed.
>>>>>> 
>>>>>>     I am forwarding to the hypre team since this is their expertise not ours
>>>>>> 
>>>>>>    Barry
>>>>>> 
>>>>> Hi,
>>>>> 
>>>>> The LHS of my eqn does not change; only the RHS changes at each time step. So is this behavior expected?
>>>>> 
>>>>> So maybe I should change to BoomerAMG and compare?
>>>>> 
>>>>> Will PETSc GAMG give better performance?
>>>>> 
>>>>> Also, I must add that I only partition in the x and y directions. Will this be a factor?
>>>>> 
>>>>> Thanks.
>>>>> 
>>>>>>> On Mar 5, 2018, at 11:19 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> On 5/3/2018 11:43 AM, Smith, Barry F. wrote:
>>>>>>>> 360 process
>>>>>>>> 
>>>>>>>> KSPSolve              99 1.0 2.6403e+02 1.0 6.67e+10 1.1 2.7e+05 9.9e+05 5.1e+02 15100 17 42 19  15100 17 42 19 87401
>>>>>>>> 
>>>>>>>> 1920 processes
>>>>>>>> 
>>>>>>>> KSPSolve              99 1.0 2.3184e+01 1.0 1.32e+10 1.2 1.5e+06 4.3e+05 5.1e+02  4100 17 42 19   4100 17 42 19 967717
>>>>>>>> 
>>>>>>>> 
>>>>>>>> The ratio of the number of processes is 5.33 and the ratio of the time for KSPSolve is 11.388, so the time for the solve is scaling very well (extremely well actually). The problem is
>>>>>>>> due to "other time" that is not in KSPSolve. Note that the percentage of the total time in KSPSolve went from 15 percent of the runtime to 4 percent. This means something outside of KSPSolve is scaling very poorly. You will need to profile the rest of the code to determine where the time is being spent. PetscLogEventRegister() and PetscLogEventBegin/End() will be needed in your code. Already with 360 processes the linear solver is only taking 15 percent of the time.
>>>>>>>> 
>>>>>>>>   Barry
>>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I have attached the new logging results with the HYPRE Poisson eqn solver. However, due to some problems, I am now using Intel 2018. It should be quite similar to 2016 in terms of runtime. Running with 360 processes does not work this time, and I'm not sure why.
>>>>>>>>> On Mar 4, 2018, at 9:23 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 1/3/2018 12:14 PM, Smith, Barry F. wrote:
>>>>>>>>>>> On Feb 28, 2018, at 8:01 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On 1/3/2018 12:10 AM, Matthew Knepley wrote:
>>>>>>>>>>>> On Wed, Feb 28, 2018 at 10:45 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> 
>>>>>>>>>>>> I have a CFD code which uses PETSc and HYPRE. I found that for a certain case with a grid size of 192,570,048, I encounter a scaling problem when the number of cores is > 600. At 600 cores, the code took 10 min for 100 time steps. At 960, 1440 and 2880 cores, it still takes around 10 min. At 360 cores, it took 15 min.
>>>>>>>>>>>> 
>>>>>>>>>>>> So how can I find the bottleneck? Any recommended steps?
>>>>>>>>>>>> 
>>>>>>>>>>>> For any performance question, we need to see the output of -log_view for all test cases.
>>>>>>>>>>> Hi,
>>>>>>>>>>> 
>>>>>>>>>>> To be more specific, I use PETSc KSPBCGS and HYPRE geometric multigrid (entirely based on HYPRE, no PETSc) for the momentum and Poisson eqns in my code.
>>>>>>>>>>> 
>>>>>>>>>>> So can log_view be used in this case to give meaningful results, given that part of the code uses HYPRE?
>>>>>>>>>>   Yes, just send the logs.
>>>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> I have attached the logs, with the number indicating the no. of cores used. Some of the new results are different from the previous runs, although I'm using the same cluster.
>>>>>>>>> 
>>>>>>>>> Thanks for the help.
>>>>>>>>>>> I also programmed another subroutine in the past which uses PETSc to solve the Poisson eqn. It uses either HYPRE's BoomerAMG, KSPBCGS or KSPGMRES.
>>>>>>>>>>> 
>>>>>>>>>>> If I use boomeramg, can log_view be used in this case?
>>>>>>>>>>> 
>>>>>>>>>>> Or do I have to use KSPBCGS or KSPGMRES, which are directly from PETSc? However, I ran KSPGMRES yesterday with the Poisson eqn and my answer didn't converge.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks.
>>>>>>>>>>>>  I must also mention that I partition my grid only in the x and y directions. There is no partitioning in the z direction due to limited code development. I wonder if there is a strong effect in this case.
>>>>>>>>>>>> 
>>>>>>>>>>>> Maybe. Usually what happens is you fill up memory with a z-column and cannot scale further.
>>>>>>>>>>>> 
>>>>>>>>>>>>   Thanks,
>>>>>>>>>>>> 
>>>>>>>>>>>>      Matt
>>>>>>>>>>>>  --
>>>>>>>>>>>> Thank you very much
>>>>>>>>>>>> 
>>>>>>>>>>>> Yours sincerely,
>>>>>>>>>>>> 
>>>>>>>>>>>> ================================================
>>>>>>>>>>>> TAY Wee-Beng 郑伟明 (Zheng Weiming)
>>>>>>>>>>>> Personal research webpage: http://tayweebeng.wixsite.com/website
>>>>>>>>>>>> Youtube research showcase: https://www.youtube.com/channel/UC72ZHtvQNMpNs2uRTSToiLA
>>>>>>>>>>>> linkedin: www.linkedin.com/in/tay-weebeng
>>>>>>>>>>>> ================================================
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> -- 
>>>>>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>>>>>>>>>> -- Norbert Wiener
>>>>>>>>>>>> 
>>>>>>>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>>>>> <log960.txt><log600.txt><log360.txt><log1920.txt>
>>>>>>> <log1920_2.txt><log600_2.txt><log960_2.txt><log1440_2.txt>
>>> <log600.txt><log288.txt><log1440.txt><log960.txt>


