<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <p>Hi all,</p>

    <p> </p>

    <p>Thanks a lot for all your suggestions. I followed Jed's advice
      and focused on the smallest case (problem size = 1e5) to find the
      origin of the memory gap, and I ran my jobs on a single core. <br>

    </p>

    <p> </p>

    <p>I first used a homemade script to plot memory and time

      consumption: see mem_consumption.png and time_consumption.png

      attached. The steps displayed correspond to checkpoints I defined

      in the main file (search for keyword "STEP" in the attached main

      file if you need to locate them).</p>

    <p> </p>

    <p>We can see that both time and memory consumption increase with
      PETSc 3.10. The memory gap (~135,000,000B) is not yet critical
      for such a problem size, but it will be for larger problems.
      According to these graphs, something clearly happens during the
      call to KSPSolve. The code also spends more time building the
      matrix. <br>

    </p>

    <p> </p>

    <p>To dig deeper, I printed the LogView outputs
      (logView_petsc3XX.log). In particular, we see that the memory
      distribution among the PETSc object types differs considerably
      between the two PETSc versions. I highlighted this in
      log_view_mem_distr.pdf by sorting the PETSc object types by
      memory use and computing the difference between PETSc 3.10 and
      PETSc 3.6 (column on the right). But I don't know how to
      interpret it.<br>

    </p>
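    <p>For anyone who wants to redo this comparison, here is a minimal
      Python sketch of the sort-and-subtract step. It is not the script
      used for the PDF; the function name memory_diff is hypothetical
      and the input values are placeholders, not numbers taken from the
      attached logs.</p>

    <pre wrap="">
```python
def memory_diff(old, new):
    """Return rows (object_type, old_bytes, new_bytes, diff), largest diff first."""
    names = set(old) | set(new)
    rows = []
    for name in names:
        old_bytes = old.get(name, 0)  # object type absent in one version counts as 0
        new_bytes = new.get(name, 0)
        rows.append((name, old_bytes, new_bytes, new_bytes - old_bytes))
    # Sort by decreasing difference, then by name for a stable order.
    return sorted(rows, key=lambda r: (-r[3], r[0]))


# Placeholder values only, NOT taken from logView_petsc3XX.log.
old = {"Matrix": 100, "Vector": 50, "Index Set": 10}
new = {"Matrix": 250, "Vector": 40, "Star Forest Graph": 5}

rows = memory_diff(old, new)
for row in rows:
    print(row)
```
    </pre>

    <p>The per-type memory figures would have to be copied by hand (or
      parsed) from the memory summary of each -log_view output.</p>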

    <p> </p>

    <p>Finally, I used Massif (massif_petscXX.out). The total memory gap
      of ~135,000,000B is confirmed. The outputs further indicate that
      most of the memory is allocated by DMCreateMatrix
      (1,174,436,836B). This value is almost the same in both versions,
      so I don't think this is the problem.   <br>

    </p>

    <p>You mentioned PtAP: MatPtAPSymbolic_SeqAIJ_SeqMAIJ needs
      56,277,776B with PETSc 3.6, and 112,562,000B with PETSc 3.10
      (twice as much). So it is a good start, but it does not account
      for the total memory gap. I will try the option "-matptap_via
      scalable" to see if there is an improvement.<br>

    </p>

    <p>I also find a gap at KSPSolve, which needs up to 173,253,112B
      with PETSc 3.10 (line 461 in massif_petsc310.out), and no more
      than 146,825,952B (line 1534 in massif_petsc36.out) with PETSc
      3.6. But again, the sum of these gaps does not account for the
      total gap.

    </p>
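    <p>As a rough cross-check of these numbers, here is a minimal Python
      sketch of the arithmetic. It assumes that the PtAP and KSPSolve
      contributions do not overlap, which I have not verified in the
      Massif traces, and it uses the approximate total gap quoted
      above.</p>

    <pre wrap="">
```python
# Figures quoted above, in bytes (from the Massif outputs).
ptap_36, ptap_310 = 56_277_776, 112_562_000
ksp_36, ksp_310 = 146_825_952, 173_253_112
total_gap = 135_000_000  # approximate total gap between 3.6 and 3.10

ptap_gap = ptap_310 - ptap_36
ksp_gap = ksp_310 - ksp_36

# Assumption: the two gaps are disjoint, so they can simply be added.
explained = ptap_gap + ksp_gap
unexplained = total_gap - explained

print(f"PtAP gap:     {ptap_gap:,} B")
print(f"KSPSolve gap: {ksp_gap:,} B")
print(f"Explained:    {explained:,} B")
print(f"Unexplained:  {unexplained:,} B")
```
    </pre>

    <p>Under that assumption, the two gaps together account for about
      83 MB of the ~135 MB, which leaves roughly 52 MB unexplained.</p>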

    <p>I am not very familiar with massif, so maybe you'll see

      additional relevant information?</p>

    <p>Best,</p>

    <p>Myriam</p>


    <p><br>

    </p>

    <div class="moz-cite-prefix">On 03/06/19 at 05:35, Jed Brown
      wrote:<br>

    </div>

    <blockquote type="cite" cite="mid:87sgw0pqlj.fsf@jedbrown.org">

      <pre wrap="">Myriam, in your first message, there was a significant (about 50%)

increase in memory consumption already on 4 cores.  Before attacking

scaling, it may be useful to trace memory usage for that base case.

Even better if you can reduce to one process.  Anyway, I would start by

running both cases with -log_view and looking at the memory summary.  I

would then use Massif (the memory profiler/tracer component in Valgrind)

to obtain stack traces for the large allocations.  Comparing those

traces should help narrow down which part of the code has significantly

different memory allocation behavior.  It might also point to the

unacceptable memory consumption under weak scaling, but it's something

we should try to fix.

If I had to guess, it may be in intermediate data structures for the

different PtAP algorithms in GAMG.  The option "-matptap_via scalable"

may be helpful.

"Smith, Barry F. via petsc-users" <a class="moz-txt-link-rfc2396E" href="mailto:petsc-users@mcs.anl.gov"><petsc-users@mcs.anl.gov></a> writes:

</pre>

      <blockquote type="cite">

        <pre wrap="">   Myriam,

    Sorry we have not been able to resolve this problem with memory scaling yet.

    The best tool to determine the change in a code that results in large differences in a program's run is git bisect. Basically you tell git bisect the git commit of the code that is "good" and the git commit of the code that is "bad", and it gives you additional git commits to check your code on, each time telling git whether it is "good" or "bad"; eventually git bisect tells you exactly the git commit that "broke" the code. No guesswork, no endless speculation. 

    The drawback is that you have to ./configure && make PETSc for each "test" commit and then compile and run your code for that commit. I can understand that if you have to run your code on 10,000 processes to check whether it is "good" or "bad", that can be very daunting. But all I can suggest is to find a problem size that is manageable and do the git bisect process (yeah, it may take several hours, but that beats days of head banging).

   Good luck,

   Barry

</pre>

        <blockquote type="cite">

          <pre wrap="">On Mar 5, 2019, at 12:42 PM, Matthew Knepley via petsc-users <a class="moz-txt-link-rfc2396E" href="mailto:petsc-users@mcs.anl.gov"><petsc-users@mcs.anl.gov></a> wrote:

On Tue, Mar 5, 2019 at 11:53 AM Myriam Peyrounette <a class="moz-txt-link-rfc2396E" href="mailto:myriam.peyrounette@idris.fr"><myriam.peyrounette@idris.fr></a> wrote:

I used PCView to display the size of the linear system in each level of the MG. You'll find the outputs attached to this mail (zip file) for both the default threshold value and a value of 0.1, and for both 3.6 and 3.10 PETSc versions. 

For convenience, I summarized the information in a graph, also attached (png file).

Great! Can you draw lines for the different runs you did? My interpretation was that memory was increasing

as you did larger runs, and that you thought that was coming from GAMG. That means the curves should

be pushed up for larger runs. Do you see that?

  Thanks,

    Matt 

As you can see, there are slight differences between the two versions but none is critical, in my opinion. Do you see anything suspicious in the outputs?

+ I can't find the default threshold value. Do you know where I can find it?

Thanks for the follow-up

Myriam

On 03/05/19 at 14:06, Matthew Knepley wrote:

</pre>

          <blockquote type="cite">

            <pre wrap="">On Tue, Mar 5, 2019 at 7:14 AM Myriam Peyrounette <a class="moz-txt-link-rfc2396E" href="mailto:myriam.peyrounette@idris.fr"><myriam.peyrounette@idris.fr></a> wrote:

Hi Matt,

I plotted the memory scalings using different threshold values. The two scalings are slightly shifted (by -22 to -88 MB) but this gain is negligible. The 3.6 scaling remains robust while the 3.10 scaling deteriorates.

Do you have any other suggestion?

Mark, what is the option she can give to output all the GAMG data?

Also, run using -ksp_view. GAMG will report all the sizes of its grids, so it should be easy to see

if the coarse grid sizes are increasing, and also what the effect of the threshold value is.

  Thanks,

     Matt 

Thanks

Myriam 

On 03/02/19 at 02:27, Matthew Knepley wrote:

</pre>

            <blockquote type="cite">

              <pre wrap="">On Fri, Mar 1, 2019 at 10:53 AM Myriam Peyrounette via petsc-users <a class="moz-txt-link-rfc2396E" href="mailto:petsc-users@mcs.anl.gov"><petsc-users@mcs.anl.gov></a> wrote:

Hi,

I used to run my code with PETSc 3.6. Since I upgraded the PETSc version

to 3.10, this code has a bad memory scaling.

To report this issue, I took the PETSc script ex42.c and slightly

modified it so that the KSP and PC configurations are the same as in my

code. In particular, I use a "personalised" multi-grid method. The

modifications are indicated by the keyword "TopBridge" in the attached

scripts.

To plot the memory (weak) scaling, I ran four calculations for each

script with increasing problem sizes and numbers of compute cores:

1. 100,000 elts on 4 cores

2. 1 million elts on 40 cores

3. 10 million elts on 400 cores

4. 100 million elts on 4,000 cores

The resulting graph is also attached. The scaling using PETSc 3.10

clearly deteriorates for large cases, while the one using PETSc 3.6 is

robust.

After a few tests, I found that the scaling is mostly sensitive to the

use of the AMG method for the coarse grid (line 1780 in

main_ex42_petsc36.cc). In particular, the performance strongly

deteriorates when commenting out lines 1777 to 1790 (in main_ex42_petsc36.cc).

Do you have any idea of what changed between version 3.6 and version

3.10 that may imply such degradation?

I believe the default values for PCGAMG changed between versions. It sounds like the coarsening rate

is not great enough, so that these grids are too large. This can be set using:

  <a class="moz-txt-link-freetext" href="https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCGAMGSetThreshold.html">https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCGAMGSetThreshold.html</a>

There is some explanation of this effect on that page. Let us know if setting this does not correct the situation.

  Thanks,

     Matt

Let me know if you need further information.

Best,

Myriam Peyrounette

-- 

Myriam Peyrounette

CNRS/IDRIS - HLST

--

-- 

What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.

-- Norbert Wiener

<a class="moz-txt-link-freetext" href="https://www.cse.buffalo.edu/%7Eknepley/">https://www.cse.buffalo.edu/~knepley/</a>

</pre>

            </blockquote>


          </blockquote>


        </blockquote>

      </blockquote>

    </blockquote>

    <br>

    <pre class="moz-signature" cols="72">-- 

Myriam Peyrounette

CNRS/IDRIS - HLST

--

</pre>

  </body>

</html>