Fantastic!

I fixed a memory free problem. You should be OK now.
I am pretty sure you are good, but I would like to wait for your feedback.
We should have a release at the end of the month and it would be nice to get this into it.

Thanks,
Mark

On Fri, Sep 1, 2023 at 7:07 AM Stephan Kramer <s.kramer@imperial.ac.uk> wrote:

Hi Mark

Sorry it took a while to report back. We have tried your branch but hit a
few issues, some of which we're not entirely sure are related.

First switching off minimum degree ordering, and then switching to the old
version of aggressive coarsening, as you suggested, got us back to the
coarsening behaviour that we had previously. However, we then observed an
even further worsening of the iteration count: it had already gone up by
50% with the newer main petsc, but now it was more than double that of
"old" petsc. It took us a while to realize this was due to the default
smoother changing from Cheby+SOR to Cheby+Jacobi. Switching this back to
the old default as well, we get back to very similar coarsening levels
(see below for more details if it is of interest) and iteration counts.
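
For reference, the full combination we are now testing is roughly the
following (the smoother options are written from memory, so take this as a
sketch of our option set rather than the exact one):

  -pc_gamg_use_minimum_degree_ordering false
  -pc_gamg_use_aggressive_square_graph true
  -mg_levels_ksp_type chebyshev
  -mg_levels_pc_type sor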

So that's all very good news. However, we also started seeing memory
errors (double free or corruption) when we switched off the minimum degree
ordering. Because this was with an earlier version of your branch, we
rebuilt, hoping it was just an earlier bug that had since been fixed, but
then we ran into MPI lock-up issues. We have now figured out that the MPI
issues are completely unrelated: they come from some combination of a
newer MPI build and Firedrake on our cluster, and they also occur using
the main branches of everything. So, switching back to an older MPI build,
we are now hoping to test your most recent version of
adams/gamg-add-old-coarsening with these options and see whether the
memory errors are still there. Will let you know.

Best wishes
Stephan Kramer

Coarsening details with various options for Level 6 of the test case:

In our original setup (using "old" petsc), we had:

rows=516, cols=516, bs=6
rows=12660, cols=12660, bs=6
rows=346974, cols=346974, bs=6
rows=19169670, cols=19169670, bs=3

Then with the newer main petsc we had:

rows=666, cols=666, bs=6
rows=7740, cols=7740, bs=6
rows=34902, cols=34902, bs=6
rows=736578, cols=736578, bs=6
rows=19169670, cols=19169670, bs=3

Then on your branch with minimum_degree_ordering False:

rows=504, cols=504, bs=6
rows=2274, cols=2274, bs=6
rows=11010, cols=11010, bs=6
rows=35790, cols=35790, bs=6
rows=430686, cols=430686, bs=6
rows=19169670, cols=19169670, bs=3

And with minimum_degree_ordering False and use_aggressive_square_graph True:

rows=498, cols=498, bs=6
rows=12672, cols=12672, bs=6
rows=346974, cols=346974, bs=6
rows=19169670, cols=19169670, bs=3

So that is indeed pretty much back to what it was before.

On 31/08/2023 23:40, Mark Adams wrote:
> Hi Stephan,
>
> This branch is settling down: adams/gamg-add-old-coarsening
> <https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening>
> I made the old (not minimum degree) ordering the default but kept the new
> "aggressive" coarsening as the default, so I am hoping that just adding
> "-pc_gamg_use_aggressive_square_graph true" to your regression tests will
> get you back to where you were before.
> Fingers crossed ... let me know if you have any success or not.
>
> Thanks,
> Mark
>
>
> On Tue, Aug 15, 2023 at 1:45 PM Mark Adams <mfadams@lbl.gov> wrote:
>
>> Hi Stephan,
>>
>> I have a branch that you can try: adams/gamg-add-old-coarsening
>> <https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening>
>>
>> Things to test:
>> * First, verify that nothing unintended changed by reproducing your bad
>> results with this branch (the defaults are the same).
>> * Try not using the minimum degree ordering that I suggested, with:
>> -pc_gamg_use_minimum_degree_ordering false
>> -- I am eager to see if that is the main problem.
>> * Go back to what I think is the old method:
>> -pc_gamg_use_minimum_degree_ordering false -pc_gamg_use_aggressive_square_graph true
>>
>> When we get back to where you were, I would like to try to get modern
>> stuff working.
>> I did add a -pc_gamg_aggressive_mis_k <2> option.
>> You could do another step of MIS coarsening with -pc_gamg_aggressive_mis_k 3.
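>>
>> Spelled out, the three runs would use roughly these option sets (same
>> options as above, just collected in one place; take it as a sketch):
>>
>>   (no extra options)  -- should reproduce your current bad result
>>   -pc_gamg_use_minimum_degree_ordering false
>>   -pc_gamg_use_minimum_degree_ordering false -pc_gamg_use_aggressive_square_graph true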
>>
>> Anyway, lots to look at but, alas, AMG does have a lot of parameters.
>>
>> Thanks,
>> Mark
>>
>> On Mon, Aug 14, 2023 at 4:26 PM Mark Adams <mfadams@lbl.gov> wrote:
>>
>>>
>>> On Mon, Aug 14, 2023 at 11:03 AM Stephan Kramer <s.kramer@imperial.ac.uk> wrote:
>>>
>>>
>>>> Many thanks for looking into this, Mark
>>>>
>>>>> My 3D tests were not that different, and I see you lowered the threshold.
>>>>> Note, you can set the threshold to zero, but your test is running so much
>>>>> differently than mine that there is something else going on.
>>>>> Note, the new, bad coarsening rate of 30:1 is what we tend to shoot for in 3D.
>>>>>
>>>>> So it is not clear what the problem is. Some questions:
>>>>>
>>>>> * do you have a picture of this mesh to show me?
>>>>
>>>> It's just a standard hexahedral cubed-sphere mesh, with the refinement
>>>> level giving the number of times each of the six sides has been
>>>> subdivided: so Level_5 means 2^5 x 2^5 squares per side, which is
>>>> extruded to 16 layers. So the total number of elements at Level_5 is
>>>> 6 x 32 x 32 x 16 = 98304 hexes, and everything doubles in all 3
>>>> dimensions (so a factor 2^3 = 8 in elements) going to the next Level.
>>>
>>> I see, and I assume these are pretty stretched elements.
>>>
>>>
>>>>> * what do you mean by Q1-Q2 elements?
>>>>
>>>> Q2-Q1, basically Taylor-Hood on hexes, so (tri)quadratic for velocity
>>>> and (tri)linear for pressure.
>>>>
>>>> I guess you could argue we could/should just do good old geometric
>>>> multigrid instead. More generally, we use this solver configuration a
>>>> lot for tetrahedral Taylor-Hood (P2-P1), in particular also for our
>>>> adaptive mesh runs - would it be worth seeing whether we have the same
>>>> performance issues with tetrahedral P2-P1?
>>>
>>> No, you have a clear reproducer, if not minimal.
>>> The first coarsening is very different.
>>>
>>> I am working on this, and I see that I added a heuristic for thin bodies
>>> in which the vertices are ordered in the greedy algorithms with minimum
>>> degree first.
>>> This will tend to pick corners first, then edges, then faces, etc.
>>> That may be the problem. I would like to understand it better (see below).
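>>>
>>> To make the effect concrete, here is a toy sketch in plain C (nothing to
>>> do with the actual PETSc source; the 3x3 grid and its degrees are made up
>>> for illustration): sorting vertex indices by degree, as the degree/permute
>>> sort in agg.c mentioned below does, puts the corner vertices at the front
>>> of the order that the greedy aggregation then walks through.
>>>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>>
>>> /* degrees of the 9 vertices of a 3x3 grid: corners have 2 neighbours,
>>>    edge midpoints 3, the centre 4 */
>>> static const int degree[9] = {2, 3, 2, 3, 4, 3, 2, 3, 2};
>>>
>>> static int by_degree(const void *a, const void *b)
>>> {
>>>   return degree[*(const int *)a] - degree[*(const int *)b];
>>> }
>>>
>>> int main(void)
>>> {
>>>   int permute[9];
>>>   for (int i = 0; i < 9; i++) permute[i] = i;
>>>   qsort(permute, 9, sizeof permute[0], by_degree);
>>>   /* the four corners (0, 2, 6, 8) come out first, i.e. they would seed
>>>      the aggregates before any interior vertex gets a chance */
>>>   for (int i = 0; i < 9; i++) printf("%d ", permute[i]);
>>>   printf("\n");
>>>   return 0;
>>> }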
>>>
>>>>> It would be nice to see if the new and old codes are similar without
>>>>> aggressive coarsening.
>>>>> This was the intended effect of the major change in this time frame, as
>>>>> you noticed.
>>>>> If these jobs are easy to run, could you check that the old and new
>>>>> versions are similar with "-pc_gamg_square_graph 0" (and you only need
>>>>> one time step).
>>>>> All you need to do is check that the first coarse grid has about the same
>>>>> number of equations (large).
>>>> Unfortunately we're seeing some memory errors when we use this option,
>>>> and I'm not entirely clear whether we're just running out of memory and
>>>> need to put it on a special queue.
>>>>
>>>> The run with square_graph 0 using new PETSc managed to get through one
>>>> solve at level 5, and is giving the following mg levels:
>>>>
>>>> rows=174, cols=174, bs=6
>>>> total: nonzeros=30276, allocated nonzeros=30276
>>>> --
>>>> rows=2106, cols=2106, bs=6
>>>> total: nonzeros=4238532, allocated nonzeros=4238532
>>>> --
>>>> rows=21828, cols=21828, bs=6
>>>> total: nonzeros=62588232, allocated nonzeros=62588232
>>>> --
>>>> rows=589824, cols=589824, bs=6
>>>> total: nonzeros=1082528928, allocated nonzeros=1082528928
>>>> --
>>>> rows=2433222, cols=2433222, bs=3
>>>> total: nonzeros=456526098, allocated nonzeros=456526098
>>>>
>>>> Comparing with square_graph 100 with new PETSc:
>>>>
>>>> rows=96, cols=96, bs=6
>>>> total: nonzeros=9216, allocated nonzeros=9216
>>>> --
>>>> rows=1440, cols=1440, bs=6
>>>> total: nonzeros=647856, allocated nonzeros=647856
>>>> --
>>>> rows=97242, cols=97242, bs=6
>>>> total: nonzeros=65656836, allocated nonzeros=65656836
>>>> --
>>>> rows=2433222, cols=2433222, bs=3
>>>> total: nonzeros=456526098, allocated nonzeros=456526098
>>>>
>>>> And old PETSc with square_graph 100:
>>>>
>>>> rows=90, cols=90, bs=6
>>>> total: nonzeros=8100, allocated nonzeros=8100
>>>> --
>>>> rows=1872, cols=1872, bs=6
>>>> total: nonzeros=1234080, allocated nonzeros=1234080
>>>> --
>>>> rows=47652, cols=47652, bs=6
>>>> total: nonzeros=23343264, allocated nonzeros=23343264
>>>> --
>>>> rows=2433222, cols=2433222, bs=3
>>>> total: nonzeros=456526098, allocated nonzeros=456526098
>>>> --
>>>>
>>>> Unfortunately, old PETSc with square_graph 0 did not complete a single
>>>> solve before giving the memory error.
>>>>
>>> OK, thanks for trying.
>>>
>>> I am working on this and will give you a branch to test, but if you can
>>> rebuild PETSc, here is a quick test that might fix your problem.
>>> In src/ksp/pc/impls/gamg/agg.c you will see:
>>>
>>> PetscCall(PetscSortIntWithArray(nloc, degree, permute));
>>>
>>> If you comment this out in the new code and compare with the old,
>>> that might fix the problem.
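>>>
>>> Concretely, the change would look something like this (a sketch; the line
>>> may sit at a different spot in your checkout):
>>>
>>> /* skip the sort of vertices by degree, i.e. go back to the natural ordering */
>>> /* PetscCall(PetscSortIntWithArray(nloc, degree, permute)); */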
>>>
>>> Thanks,
>>> Mark
>>>
>>>>> BTW, I am starting to think I should add the old method back as an option.
>>>>> I did not think this change would cause large differences.
>>>>
>>>> Yes, I think that would be much appreciated. Let us know if we can do
>>>> any testing.
>>>>
>>>> Best wishes
>>>> Stephan
>>>>
>>>>> Thanks,
>>>>> Mark
>>>>>
>>>>>> Note that we are providing the rigid body near nullspace,
>>>>>> hence the bs=3 to bs=6.
>>>>>> We have tried different values for the gamg_threshold, but it doesn't
>>>>>> really seem to significantly alter the coarsening amount in that first
>>>>>> step.
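>>>>>>
>>>>>> For reference, on the PETSc side the near nullspace amounts to roughly
>>>>>> the calls below; in practice we set it up through Firedrake, so treat
>>>>>> this as a sketch, with coords standing for a Vec of vertex coordinates
>>>>>> and A for the operator that GAMG is applied to:
>>>>>>
>>>>>> MatNullSpace nearnull;
>>>>>> PetscCall(MatNullSpaceCreateRigidBody(coords, &nearnull)); /* 6 rigid-body modes in 3D, hence bs=6 on the coarse levels */
>>>>>> PetscCall(MatSetNearNullSpace(A, nearnull));
>>>>>> PetscCall(MatNullSpaceDestroy(&nearnull));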
>>>>>>
>>>>>> Do you have any suggestions for further things we should try/look at?
>>>>>> Any feedback would be much appreciated.
>>>>>>
>>>>>> Best wishes
>>>>>> Stephan Kramer
>>>>>>
>>>>>> Full logs including log_view timings are available from
>>>>>> https://github.com/stephankramer/petsc-scaling/
>>>>>>
>>>>>> In particular:
>>>>>>
>>>>>> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat
>>>>>> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat
>>>>>> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat
>>>>>> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat
>>>>>> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat
>>>>>> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat