[petsc-users] performance regression with GAMG

Mark Adams mfadams at lbl.gov
Fri Sep 1 10:58:39 CDT 2023


Fantastic!

I fixed a memory free problem. You should be OK now.
I am pretty sure you are good, but I would like to wait for feedback
from you.
We should have a release at the end of the month and it would be nice to
get this fix into it.

Thanks,
Mark


On Fri, Sep 1, 2023 at 7:07 AM Stephan Kramer <s.kramer at imperial.ac.uk>
wrote:

> Hi Mark
>
> Sorry it took a while to report back. We have tried your branch but hit a
> few issues, some of which we're not entirely sure are related.
>
> First, switching off minimum degree ordering and then switching to the
> old version of aggressive coarsening, as you suggested, got us back to
> the coarsening behaviour that we had previously. But we also observed an
> even further worsening of the iteration count: it had already gone up by
> 50% (with the newer main petsc), but now was more than double that of
> "old" petsc. It took us a while to realize this was due to the default
> smoother changing from Cheby+SOR to Cheby+Jacobi. Switching this back to
> the old default as well, we get back to very similar coarsening levels
> (see below for more details, if of interest) and iteration counts.
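
For reference, the smoother switch described above can be reverted from the command line; a sketch only (option names assumed from the standard PETSc `mg_levels_` prefix used by GAMG; verify against your PETSc version):

```
# restore the old default level smoother: Chebyshev + SOR
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type sor
```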
>
> So that's all very good news. However, we also started seeing memory
> errors (double free or corruption) when we switched off the minimum
> degree ordering. Because this was with an earlier version of your
> branch, we then rebuilt, hoping this was just an earlier bug that had
> since been fixed, but then we hit MPI lockup issues. We have now figured
> out that the MPI issues are completely unrelated - some combination of a
> newer MPI build and Firedrake on our cluster, which also occurs when
> using main branches of everything. So, switching back to an older MPI
> build, we hope to now test your most recent version of
> adams/gamg-add-old-coarsening with these options and see whether the
> memory errors are still there. Will let you know.
>
> Best wishes
> Stephan Kramer
>
> Coarsening details with various options for Level 6 of the test case:
>
> In our original setup (using "old" petsc), we had:
>
>            rows=516, cols=516, bs=6
>            rows=12660, cols=12660, bs=6
>            rows=346974, cols=346974, bs=6
>            rows=19169670, cols=19169670, bs=3
>
> Then with the newer main petsc we had:
>
>            rows=666, cols=666, bs=6
>            rows=7740, cols=7740, bs=6
>            rows=34902, cols=34902, bs=6
>            rows=736578, cols=736578, bs=6
>            rows=19169670, cols=19169670, bs=3
>
> Then on your branch with minimum_degree_ordering False:
>
>            rows=504, cols=504, bs=6
>            rows=2274, cols=2274, bs=6
>            rows=11010, cols=11010, bs=6
>            rows=35790, cols=35790, bs=6
>            rows=430686, cols=430686, bs=6
>            rows=19169670, cols=19169670, bs=3
>
> And with minimum_degree_ordering False and use_aggressive_square_graph
> True:
>
>            rows=498, cols=498, bs=6
>            rows=12672, cols=12672, bs=6
>            rows=346974, cols=346974, bs=6
>            rows=19169670, cols=19169670, bs=3
>
> So that is indeed pretty much back to what it was before.
>
> On 31/08/2023 23:40, Mark Adams wrote:
> > Hi Stephan,
> >
> > This branch is settling down.  adams/gamg-add-old-coarsening
> > <https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening>
> > I made the old, not minimum degree, ordering the default but kept the new
> > "aggressive" coarsening as the default, so I am hoping that just adding
> > "-pc_gamg_use_aggressive_square_graph true" to your regression tests will
> > get you back to where you were before.
> > Fingers crossed ... let me know if you have any success or not.
> >
> > Thanks,
> > Mark
> >
> >
> > On Tue, Aug 15, 2023 at 1:45 PM Mark Adams <mfadams at lbl.gov> wrote:
> >
> >> Hi Stephan,
> >>
> >> I have a branch that you can try: adams/gamg-add-old-coarsening
> >> <https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening>
> >>
> >> Things to test:
> >> * First, verify that nothing unintended changed by reproducing your bad
> >> results with this branch (the defaults are the same)
> >> * Try not using the minimum degree ordering that I suggested
> >> with: -pc_gamg_use_minimum_degree_ordering false
> >>    -- I am eager to see if that is the main problem.
> >> * Go back to what I think is the old method:
> >> -pc_gamg_use_minimum_degree_ordering false
> >> -pc_gamg_use_aggressive_square_graph true
> >>
> >> When we get back to where you were, I would like to try to get modern
> >> stuff working.
> >> I did add a -pc_gamg_aggressive_mis_k <2>.
> >> You could do another step of MIS coarsening with
> >> -pc_gamg_aggressive_mis_k 3.
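
Collected into a PETSc options-file fragment (one option per line; these are the flags as given in this thread, a sketch to be checked against the branch):

```
# restore the old, non-minimum-degree vertex ordering
-pc_gamg_use_minimum_degree_ordering false
# restore the old aggressive (squared-graph) coarsening
-pc_gamg_use_aggressive_square_graph true
# optional: stronger aggressive coarsening via an extra MIS step
-pc_gamg_aggressive_mis_k 3
```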
> >>
> >> Anyway, lots to look at but, alas, AMG does have a lot of parameters.
> >>
> >> Thanks,
> >> Mark
> >>
> >> On Mon, Aug 14, 2023 at 4:26 PM Mark Adams <mfadams at lbl.gov> wrote:
> >>
> >>>
> >>> On Mon, Aug 14, 2023 at 11:03 AM Stephan Kramer
> >>> <s.kramer at imperial.ac.uk> wrote:
> >>>
> >>>> Many thanks for looking into this, Mark
> >>>>> My 3D tests were not that different and I see you lowered the
> >>>>> threshold.
> >>>>> Note, you can set the threshold to zero, but your test is running so
> >>>>> much differently than mine that there is something else going on.
> >>>>> Note, the new, bad, coarsening rate of 30:1 is what we tend to shoot
> >>>>> for in 3D.
> >>>>>
> >>>>> So it is not clear what the problem is.  Some questions:
> >>>>>
> >>>>> * do you have a picture of this mesh to show me?
> >>>> It's just a standard hexahedral cubed-sphere mesh, with the refinement
> >>>> level giving the number of times each of the six sides has been
> >>>> subdivided: so Level_5 means 2^5 x 2^5 squares, which is extruded to
> >>>> 16 layers. So the total number of elements at Level_5 is
> >>>> 6 x 32 x 32 x 16 = 98304 hexes. And everything doubles in all 3
> >>>> dimensions (so 2^3) going to the next level.
> >>>>
> >>> I see, and I assume these are pretty stretched elements.
> >>>
> >>>
> >>>>> * what do you mean by Q1-Q2 elements?
> >>>> Q2-Q1, basically Taylor-Hood on hexes, so (tri)quadratic for velocity
> >>>> and (tri)linear for pressure.
> >>>>
> >>>> I guess you could argue we could/should just do good old geometric
> >>>> multigrid instead. More generally, we do use this solver configuration
> >>>> a lot for tetrahedral Taylor-Hood (P2-P1), in particular also for our
> >>>> adaptive mesh runs - would it be worth seeing if we have the same
> >>>> performance issues with tetrahedral P2-P1?
> >>>>
> >>> No, you have a clear reproducer, if not minimal.
> >>> The first coarsening is very different.
> >>>
> >>> I am working on this, and I see that I added a heuristic for thin
> >>> bodies where the vertices in the greedy algorithms are ordered with
> >>> minimum degree first.
> >>> This will tend to pick corners first, then edges, then faces, etc.
> >>> That may be the problem. I would like to understand it better (see
> >>> below).
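
A toy sketch (illustrative only, not PETSc's implementation) of how visiting vertices in minimum-degree order changes which vertices seed aggregates in a greedy maximal-independent-set pass: on a 1D path, the degree-1 endpoints ("corners") are selected first.

```python
def greedy_mis(adjacency, order):
    """Greedy maximal independent set: visit vertices in `order`;
    an unvisited, uncovered vertex becomes a root and covers its
    neighbors (roots seed the aggregates in smoothed aggregation)."""
    state = {}
    for v in order:
        if v not in state:
            state[v] = "root"
            for w in adjacency[v]:
                state.setdefault(w, "covered")
    return [v for v in order if state[v] == "root"]

# A 1D "mesh": path 0-1-2-3-4; the endpoints have degree 1.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
degree = {v: len(nbrs) for v, nbrs in adj.items()}

natural = greedy_mis(adj, sorted(adj))                     # [0, 2, 4]
min_degree = greedy_mis(adj, sorted(adj, key=degree.get))  # [0, 4, 2]
```

With minimum-degree ordering the two endpoints are chosen before any interior vertex; on a thin 3D body this biases aggregate roots toward corners and edges, as described above.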
> >>>
> >>>
> >>>
> >>>>> It would be nice to see if the new and old codes are similar without
> >>>>> aggressive coarsening.
> >>>>> This was the intent of the major change in this time frame, as you
> >>>>> noticed.
> >>>>> If these jobs are easy to run, could you check that the old and new
> >>>>> versions are similar with "-pc_gamg_square_graph 0" (you only need
> >>>>> one time step)?
> >>>>> All you need to do is check that the first coarse grid has about the
> >>>>> same number of equations (large).
> >>>> Unfortunately we're seeing some memory errors when we use this option,
> >>>> and I'm not entirely clear whether we're just running out of memory
> >>>> and need to put it on a special queue.
> >>>>
> >>>> The run with square_graph 0 using new PETSc managed to get through one
> >>>> solve at level 5, and is giving the following mg levels:
> >>>>
> >>>>           rows=174, cols=174, bs=6
> >>>>             total: nonzeros=30276, allocated nonzeros=30276
> >>>> --
> >>>>             rows=2106, cols=2106, bs=6
> >>>>             total: nonzeros=4238532, allocated nonzeros=4238532
> >>>> --
> >>>>             rows=21828, cols=21828, bs=6
> >>>>             total: nonzeros=62588232, allocated nonzeros=62588232
> >>>> --
> >>>>             rows=589824, cols=589824, bs=6
> >>>>             total: nonzeros=1082528928, allocated nonzeros=1082528928
> >>>> --
> >>>>             rows=2433222, cols=2433222, bs=3
> >>>>             total: nonzeros=456526098, allocated nonzeros=456526098
> >>>>
> >>>> Comparing with square_graph 100 with new PETSc:
> >>>>
> >>>>             rows=96, cols=96, bs=6
> >>>>             total: nonzeros=9216, allocated nonzeros=9216
> >>>> --
> >>>>             rows=1440, cols=1440, bs=6
> >>>>             total: nonzeros=647856, allocated nonzeros=647856
> >>>> --
> >>>>             rows=97242, cols=97242, bs=6
> >>>>             total: nonzeros=65656836, allocated nonzeros=65656836
> >>>> --
> >>>>             rows=2433222, cols=2433222, bs=3
> >>>>             total: nonzeros=456526098, allocated nonzeros=456526098
> >>>>
> >>>> and old PETSc with square_graph 100:
> >>>>
> >>>>             rows=90, cols=90, bs=6
> >>>>             total: nonzeros=8100, allocated nonzeros=8100
> >>>> --
> >>>>             rows=1872, cols=1872, bs=6
> >>>>             total: nonzeros=1234080, allocated nonzeros=1234080
> >>>> --
> >>>>             rows=47652, cols=47652, bs=6
> >>>>             total: nonzeros=23343264, allocated nonzeros=23343264
> >>>> --
> >>>>             rows=2433222, cols=2433222, bs=3
> >>>>             total: nonzeros=456526098, allocated nonzeros=456526098
> >>>> --
> >>>>
> >>>> Unfortunately, old PETSc with square_graph 0 did not complete a single
> >>>> solve before giving the memory error.
> >>>>
> >>> OK, thanks for trying.
> >>>
> >>> I am working on this and I will give you a branch to test, but if you
> >>> can rebuild PETSc, here is a quick test that might fix your problem.
> >>> In src/ksp/pc/impls/gamg/agg.c you will see:
> >>>
> >>>      PetscCall(PetscSortIntWithArray(nloc, degree, permute));
> >>>
> >>> If you can comment this out in the new code and compare with the old,
> >>> that might fix the problem.
> >>>
> >>> Thanks,
> >>> Mark
> >>>
> >>>
> >>>>> BTW, I am starting to think I should add the old method back as an
> >>>>> option.
> >>>>> I did not think this change would cause large differences.
> >>>> Yes, I think that would be much appreciated. Let us know if we can do
> >>>> any testing.
> >>>>
> >>>> Best wishes
> >>>> Stephan
> >>>>
> >>>>
> >>>>> Thanks,
> >>>>> Mark
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>> Note that we are providing the rigid body near nullspace,
> >>>>>> hence the bs=3 to bs=6.
> >>>>>> We have tried different values for the gamg_threshold but it doesn't
> >>>>>> really seem to significantly alter the coarsening amount in that
> >>>>>> first step.
> >>>>>>
> >>>>>> Do you have any suggestions for further things we should try/look at?
> >>>>>> Any feedback would be much appreciated.
> >>>>>>
> >>>>>> Best wishes
> >>>>>> Stephan Kramer
> >>>>>>
> >>>>>> Full logs including log_view timings available from
> >>>>>> https://github.com/stephankramer/petsc-scaling/
> >>>>>>
> >>>>>> In particular:
> >>>>>>
> >>>>>> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat
> >>>>>> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat
> >>>>>> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat
> >>>>>> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat
> >>>>>> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat
> >>>>>> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat
>

