[petsc-users] performance regression with GAMG

Mark Adams mfadams at lbl.gov
Tue Aug 15 12:45:32 CDT 2023


Hi Stephan,

I have a branch that you can try: adams/gamg-add-old-coarsening
<https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening>

Things to test:
* First, verify that nothing unintended changed by reproducing your bad
results with this branch (the defaults are the same).
* Try turning off the minimum degree ordering that I mentioned, with:
-pc_gamg_use_minimum_degree_ordering false
  -- I am eager to see if that is the main problem.
* Go back to what I think is the old method:
-pc_gamg_use_minimum_degree_ordering false -pc_gamg_use_aggressive_square_graph true

When we get back to where you were, I would like to try to get modern stuff
working.
I did add a -pc_gamg_aggressive_mis_k <2> option.
You could do another step of MIS coarsening with -pc_gamg_aggressive_mis_k 3.
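To be concrete, the runs to compare would look something like this (prepend
whatever options prefix your solver uses; one time step is enough, as noted
in the earlier mail below):

  1) branch defaults (should reproduce the bad coarsening)
  2) -pc_gamg_use_minimum_degree_ordering false
  3) -pc_gamg_use_minimum_degree_ordering false -pc_gamg_use_aggressive_square_graph true

and, separately, -pc_gamg_aggressive_mis_k 3 if you want to try the extra MIS step.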

Anyway, lots to look at but, alas, AMG does have a lot of parameters.

Thanks,
Mark

On Mon, Aug 14, 2023 at 4:26 PM Mark Adams <mfadams at lbl.gov> wrote:

>
>
> On Mon, Aug 14, 2023 at 11:03 AM Stephan Kramer <s.kramer at imperial.ac.uk>
> wrote:
>
>> Many thanks for looking into this, Mark
>> > My 3D tests were not that different, and I see you lowered the threshold.
>> > Note, you can set the threshold to zero, but your test is running so much
>> > differently from mine that there is something else going on.
>> > Note, the new, bad coarsening rate of 30:1 is what we tend to shoot for
>> > in 3D.
>> >
>> > So it is not clear what the problem is.  Some questions:
>> >
>> > * do you have a picture of this mesh to show me?
>>
>> It's just a standard hexahedral cubed-sphere mesh, with the refinement
>> level giving the number of times each of the six sides has been
>> subdivided: so Level_5 means 2^5 x 2^5 squares per side, which are
>> extruded to 16 layers. So the total number of elements at Level_5 is
>> 6 x 32 x 32 x 16 = 98304 hexes, and everything doubles in all 3
>> dimensions (so a factor of 2^3) going to the next Level.
>>
>
> I see, and I assume these are pretty stretched elements.
>
>
>>
>> > * what do you mean by Q1-Q2 elements?
>>
>> Q2-Q1, basically Taylor-Hood on hexes, so (tri)quadratic for velocity
>> and (tri)linear for pressure.
>>
>> I guess you could argue we could/should just do good old geometric
>> multigrid instead. More generally, we use this solver configuration a
>> lot for tetrahedral Taylor-Hood (P2-P1), in particular also for our
>> adaptive mesh runs - would it be worth seeing if we have the same
>> performance issues with tetrahedral P2-P1?
>>
>
> No, you have a clear reproducer, if not minimal.
> The first coarsening is very different.
>
> I am working on this, and I see that I added a heuristic for thin bodies
> where the vertices in the greedy algorithms are ordered with minimum degree first.
> This will tend to pick corners first, then edges, then faces, etc.
> That may be the problem. I would like to understand it better (see below).
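> To illustrate (a toy sketch only, not the actual GAMG code): sort the vertices
> by degree before the greedy MIS pass, so low-degree vertices (corners, then
> edges) get selected as aggregate roots first:
>
>     /* Toy greedy MIS that visits vertices in order of increasing degree. */
>     #include <stdlib.h>
>
>     typedef struct { int vertex, degree; } VDeg;
>
>     static int CompareDegree(const void *a, const void *b)
>     {
>       return ((const VDeg *)a)->degree - ((const VDeg *)b)->degree;
>     }
>
>     /* adj_off/adj: CSR adjacency (no self edges);
>        state: 0 = undecided, 1 = selected root, -1 = removed by a neighbor */
>     void GreedyMISMinDegreeFirst(int n, const int *adj_off, const int *adj,
>                                  VDeg *order, int *state)
>     {
>       for (int i = 0; i < n; i++) {
>         order[i].vertex = i;
>         order[i].degree = adj_off[i + 1] - adj_off[i];
>         state[i]        = 0;
>       }
>       qsort(order, n, sizeof(VDeg), CompareDegree); /* minimum degree first */
>       for (int k = 0; k < n; k++) {
>         int v = order[k].vertex;
>         if (state[v] != 0) continue;  /* already removed by a selected neighbor */
>         state[v] = 1;                 /* v becomes an aggregate root */
>         for (int j = adj_off[v]; j < adj_off[v + 1]; j++)
>           if (state[adj[j]] == 0) state[adj[j]] = -1;
>       }
>     }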
>
>
>
>> >
>> > It would be nice to see if the new and old codes are similar without
>> > aggressive coarsening.
>> > The aggressive coarsening was the intended change in the major update in
>> > this time frame, as you noticed.
>> > If these jobs are easy to run, could you check that the old and new
>> > versions are similar with "-pc_gamg_square_graph 0" (and you only need
>> > one time step)?
>> > All you need to do is check that the first coarse grid has about the
>> > same number of equations (large).
>> Unfortunately we're seeing some memory errors when we use this option,
>> and I'm not entirely clear whether we're just running out of memory and
>> need to put it on a special queue.
>>
>> The run with square_graph 0 using new PETSc managed to get through one
>> solve at level 5, and is giving the following mg levels:
>>
>>          rows=174, cols=174, bs=6
>>            total: nonzeros=30276, allocated nonzeros=30276
>> --
>>            rows=2106, cols=2106, bs=6
>>            total: nonzeros=4238532, allocated nonzeros=4238532
>> --
>>            rows=21828, cols=21828, bs=6
>>            total: nonzeros=62588232, allocated nonzeros=62588232
>> --
>>            rows=589824, cols=589824, bs=6
>>            total: nonzeros=1082528928, allocated nonzeros=1082528928
>> --
>>            rows=2433222, cols=2433222, bs=3
>>            total: nonzeros=456526098, allocated nonzeros=456526098
>>
>> comparing with square_graph 100 with new PETSc
>>
>>            rows=96, cols=96, bs=6
>>            total: nonzeros=9216, allocated nonzeros=9216
>> --
>>            rows=1440, cols=1440, bs=6
>>            total: nonzeros=647856, allocated nonzeros=647856
>> --
>>            rows=97242, cols=97242, bs=6
>>            total: nonzeros=65656836, allocated nonzeros=65656836
>> --
>>            rows=2433222, cols=2433222, bs=3
>>            total: nonzeros=456526098, allocated nonzeros=456526098
>>
>> and old PETSc with square_graph 100
>>
>>            rows=90, cols=90, bs=6
>>            total: nonzeros=8100, allocated nonzeros=8100
>> --
>>            rows=1872, cols=1872, bs=6
>>            total: nonzeros=1234080, allocated nonzeros=1234080
>> --
>>            rows=47652, cols=47652, bs=6
>>            total: nonzeros=23343264, allocated nonzeros=23343264
>> --
>>            rows=2433222, cols=2433222, bs=3
>>            total: nonzeros=456526098, allocated nonzeros=456526098
>> --
>>
>> Unfortunately old PETSc with square_graph 0 did not complete a single
>> solve before giving the memory error.
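>>
>> (At the first coarsening step that is roughly a 2433222/589824 ≈ 4:1
>> reduction in rows with square_graph 0 in new PETSc, 2433222/97242 ≈ 25:1
>> with square_graph 100 in new PETSc, and 2433222/47652 ≈ 51:1 with
>> square_graph 100 in old PETSc.)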
>>
>
> OK, thanks for trying.
>
> I am working on this and I will give you a branch to test, but if you can
> rebuild PETSc here is a quick test that might fix your problem.
> In src/ksp/pc/impls/gamg/agg.c you will see:
>
>     PetscCall(PetscSortIntWithArray(nloc, degree, permute));
>
> If you can comment this line out in the new code and compare with the old, that
> might fix the problem (and would confirm that this ordering heuristic is the culprit).
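> In context that is just (the commented-out call is the only change; the rest
> of agg.c stays as is, which should leave the vertices in their original order):
>
>     /* PetscCall(PetscSortIntWithArray(nloc, degree, permute)); */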
>
> Thanks,
> Mark
>
>
>>
>> >
>> > BTW, I am starting to think I should add the old method back as an
>> > option.
>> > I did not think this change would cause large differences.
>>
>> Yes, I think that would be much appreciated. Let us know if we can do
>> any testing.
>>
>> Best wishes
>> Stephan
>>
>>
>> >
>> > Thanks,
>> > Mark
>> >
>> >
>> >
>> >
>> >> Note that we are providing the rigid body near nullspace,
>> >> hence the bs=3 to bs=6.
>> >> We have tried different values for the gamg_threshold but it doesn't
>> >> really seem to significantly alter the coarsening amount in that first
>> >> step.
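>> >>
>> >> (For reference, attaching the rigid body near nullspace at the PETSc
>> >> level looks roughly like the sketch below; the Vec of nodal coordinates,
>> >> here called coords, and the operator A are placeholders for our actual
>> >> setup.)
>> >>
>> >>     MatNullSpace nearnull;
>> >>     /* coords holds the (x,y,z) nodal coordinates of the velocity space */
>> >>     PetscCall(MatNullSpaceCreateRigidBody(coords, &nearnull));
>> >>     /* 6 rigid body modes in 3D, hence bs=6 on the coarse GAMG levels */
>> >>     PetscCall(MatSetNearNullSpace(A, nearnull));
>> >>     PetscCall(MatNullSpaceDestroy(&nearnull));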
>> >>
>> >> Do you have any suggestions for further things we should try/look at?
>> >> Any feedback would be much appreciated.
>> >>
>> >> Best wishes
>> >> Stephan Kramer
>> >>
>> >> Full logs including log_view timings available from
>> >> https://github.com/stephankramer/petsc-scaling/
>> >>
>> >> In particular:
>> >>
>> >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat
>> >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat
>> >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat
>> >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat
>> >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat
>> >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat
>>

