[petsc-users] Bad memory scaling with PETSc 3.10

Jed Brown jed at jedbrown.org
Tue Mar 5 22:35:36 CST 2019


Myriam, in your first message, there was a significant (about 50%)
increase in memory consumption already on 4 cores.  Before attacking
scaling, it may be useful to trace memory usage for that base case.
Even better if you can reduce to one process.  Anyway, I would start by
running both cases with -log_view and looking at the memory summary.  I
would then use Massif (the memory profiler/tracer component in Valgrind)
to obtain stack traces for the large allocations.  Comparing those
traces should help narrow down which part of the code has significantly
different memory allocation behavior.  It might also point to the
source of the unacceptable memory consumption under weak scaling; in
any case, the growth in the base case is something we should try to fix.
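
As a concrete starting point, the two steps might look something like
this (the executable name and process counts are placeholders for your
actual runs, and the pid suffix on the massif output file will differ):

    # PETSc's own memory summary, printed at the end of the run
    mpiexec -n 4 ./ex42 -log_view

    # stack traces for the largest allocations, ideally on one process
    mpiexec -n 1 valgrind --tool=massif ./ex42
    ms_print massif.out.12345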

If I had to guess, it may be in intermediate data structures for the
different PtAP algorithms in GAMG.  The option "-matptap_via scalable"
may be helpful.
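
For example, added to the run above (again, the executable name is a
placeholder):

    mpiexec -n 4 ./ex42 -matptap_via scalable -log_view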

"Smith, Barry F. via petsc-users" <petsc-users at mcs.anl.gov> writes:

>    Myriam,
>
>     Sorry we have not been able to resolve this problem with memory scaling yet.
>
>     The best tool for determining the change in a code base that results in a large difference in a program's behavior is git bisect. Basically, you tell git bisect 
> the git commit of the code that is "good" and the git commit of the code that is "bad". It then gives you additional git commits to check your code on, and each time you tell git whether that commit is "good" or "bad"; eventually git bisect tells you exactly the git commit that "broke" the code. No guesswork, no endless speculation. 
>
>     The drawback is that you have to ./configure && make PETSc for each "test" commit, and then compile and run your code for that commit. I understand that if you have to run your code on 10,000 processes to check whether a commit is "good" or "bad", that can be very daunting. But all I can suggest is to find a problem size that is manageable and do the git bisect process (yeah, it may take several hours, but that beats days of head banging).
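>
>     A minimal sketch of that workflow, assuming a PETSc clone with its v3.6 and v3.10 release tags (adjust the endpoints to whatever "good" and "bad" states you actually tested):
>
>       git bisect start
>       git bisect bad v3.10     # known-bad state
>       git bisect good v3.6     # known-good state
>       # git now checks out a commit in between; for each one:
>       ./configure && make
>       # ... then rebuild and run your code, and report the outcome:
>       git bisect good          # or: git bisect bad
>       # repeat until git names the first bad commit, then clean up:
>       git bisect reset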
>
>    Good luck,
>
>    Barry
>
>
>> On Mar 5, 2019, at 12:42 PM, Matthew Knepley via petsc-users <petsc-users at mcs.anl.gov> wrote:
>> 
>> On Tue, Mar 5, 2019 at 11:53 AM Myriam Peyrounette <myriam.peyrounette at idris.fr> wrote:
>> I used PCView to display the size of the linear system at each level of the MG. You'll find the outputs attached to this mail (zip file) for both the default threshold value and a value of 0.1, and for both PETSc versions 3.6 and 3.10. 
>> 
>> For convenience, I summarized the information in a graph, also attached (png file).
>> 
>> 
>> Great! Can you draw lines for the different runs you did? My interpretation was that memory was increasing
>> as you did larger runs, and that you thought that was coming from GAMG. That means the curves should
>> be pushed up for larger runs. Do you see that?
>> 
>>   Thanks,
>> 
>>     Matt 
>> As you can see, there are slight differences between the two versions but none is critical, in my opinion. Do you see anything suspicious in the outputs?
>> 
>> Also, I can't find the default threshold value. Do you know where I can find it?
>> 
>> Thanks for the follow-up
>> 
>> Myriam
>> 
>> 
>> On 03/05/19 at 14:06, Matthew Knepley wrote:
>>> On Tue, Mar 5, 2019 at 7:14 AM Myriam Peyrounette <myriam.peyrounette at idris.fr> wrote:
>>> Hi Matt,
>>> 
>>> I plotted the memory scalings using different threshold values. The two scalings are slightly shifted (by -22 to -88 MB), but this gain is negligible. The 3.6 scaling remains robust while the 3.10 scaling deteriorates.
>>> 
>>> Do you have any other suggestion?
>>> 
>>> Mark, what is the option she can give to output all the GAMG data?
>>> 
>>> Also, run using -ksp_view. GAMG will report all the sizes of its grids, so it should be easy to see
>>> if the coarse grid sizes are increasing, and also what the effect of the threshold value is.
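>>> 
>>> For instance, something like the following (the threshold value 0.1 is just the one you already tried, and the executable name is a placeholder):
>>> 
>>>   mpiexec -n 4 ./ex42 -ksp_view -pc_gamg_threshold 0.1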
>>> 
>>>   Thanks,
>>> 
>>>      Matt 
>>> Thanks
>>> 
>>> Myriam 
>>> 
>>> On 03/02/19 at 02:27, Matthew Knepley wrote:
>>>> On Fri, Mar 1, 2019 at 10:53 AM Myriam Peyrounette via petsc-users <petsc-users at mcs.anl.gov> wrote:
>>>> Hi,
>>>> 
>>>> I used to run my code with PETSc 3.6. Since I upgraded PETSc
>>>> to 3.10, the code has had bad memory scaling.
>>>> 
>>>> To report this issue, I took the PETSc example ex42.c and slightly
>>>> modified it so that the KSP and PC configurations are the same as in my
>>>> code. In particular, I use a "customized" multigrid method. The
>>>> modifications are indicated by the keyword "TopBridge" in the attached
>>>> scripts.
>>>> 
>>>> To plot the (weak) memory scaling, I ran four calculations for each
>>>> script with increasing problem sizes and core counts:
>>>> 
>>>> 1. 100,000 elements on 4 cores
>>>> 2. 1 million elements on 40 cores
>>>> 3. 10 million elements on 400 cores
>>>> 4. 100 million elements on 4,000 cores
>>>> 
>>>> The resulting graph is also attached. The scaling using PETSc 3.10
>>>> clearly deteriorates for large cases, while the one using PETSc 3.6 is
>>>> robust.
>>>> 
>>>> After a few tests, I found that the scaling is mostly sensitive to the
>>>> use of the AMG method for the coarse grid (line 1780 in
>>>> main_ex42_petsc36.cc). In particular, the performance deteriorates
>>>> strongly when lines 1777 to 1790 (in main_ex42_petsc36.cc) are commented out.
>>>> 
>>>> Do you have any idea what changed between versions 3.6 and
>>>> 3.10 that might cause such a degradation?
>>>> 
>>>> I believe the default values for PCGAMG changed between versions. It sounds like the coarsening rate
>>>> is not high enough, so these grids end up too large. The coarsening threshold can be set using:
>>>> 
>>>>   https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCGAMGSetThreshold.html
>>>> 
>>>> There is some explanation of this effect on that page. Let us know if setting this does not correct the situation.
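>>>> 
>>>> A minimal sketch in C of setting it programmatically, assuming your setup code already has the KSP (and a PetscErrorCode ierr) in hand; the value 0.1 is only illustrative, and in 3.10 the function takes an array with one entry per level:
>>>> 
>>>>   PC        pc;
>>>>   PetscReal th[1] = {0.1};              /* coarsening threshold */
>>>>   ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
>>>>   ierr = PCGAMGSetThreshold(pc, th, 1);CHKERRQ(ierr);
>>>> 
>>>> or equivalently on the command line: -pc_gamg_threshold 0.1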
>>>> 
>>>>   Thanks,
>>>> 
>>>>      Matt
>>>>  
>>>> Let me know if you need further information.
>>>> 
>>>> Best,
>>>> 
>>>> Myriam Peyrounette
>>>> 
>>>> 
>> 
>> -- 
>> Myriam Peyrounette
>> CNRS/IDRIS - HLST
>> --
>> 
>> 
>> 
>> -- 
>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>> -- Norbert Wiener
>> 
>> https://www.cse.buffalo.edu/~knepley/

