<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Aug 14, 2023 at 11:03 AM Stephan Kramer <<a href="mailto:s.kramer@imperial.ac.uk">s.kramer@imperial.ac.uk</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Many thanks for looking into this, Mark<br>

> My 3D tests were not that different and I see you lowered the threshold.<br>

> Note, you can set the threshold to zero, but your test is running so<br>

> differently from mine that there must be something else going on.<br>

> Note, the new, bad, coarsening rate of 30:1 is what we tend to shoot for<br>

> in 3D.<br>

><br>

> So it is not clear what the problem is.  Some questions:<br>

><br>

> * do you have a picture of this mesh to show me?<br>

<br>

It's just a standard hexahedral cubed sphere mesh with the refinement <br>

level giving the number of times each of the six sides has been <br>

subdivided: so Level_5 means 2^5 x 2^5 squares, which are extruded to 16 <br>

layers. So the total number of elements at Level_5 is 6 x 32 x 32 x 16 = <br>

98304  hexes. And everything doubles in all 3 dimensions (so 2^3) going <br>

to the next Level<br></blockquote><div><br></div><div>I see, and I assume these are pretty stretched elements.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>
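For reference, the element count quoted above can be reproduced with a short script. This is only a sketch: the function name and its parameters are made up for illustration, and it assumes the layer count doubles with each level starting from 16 layers at Level_5, as described above.<br>
<pre>
```python
def cubed_sphere_hexes(level, base_level=5, base_layers=16):
    """Hex count for a cubed-sphere shell mesh.

    Six faces, each split into 2^level x 2^level squares, extruded
    radially into layers. Assumption: the number of layers doubles with
    each level, anchored at base_layers for base_level. Only meant for
    level at or above base_level.
    """
    squares_per_face = (2 ** level) ** 2
    layers = base_layers * 2 ** (level - base_level)
    return 6 * squares_per_face * layers


for level in (5, 6, 7):
    print(f"Level_{level}: {cubed_sphere_hexes(level)} hexes")
```
</pre>
Each level has 8 times as many hexes as the previous one, since the mesh doubles in all three directions.<br>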

> * what do you mean by Q1-Q2 elements?<br>

<br>

Q2-Q1, basically Taylor Hood on hexes, so (tri)quadratic for velocity <br>

and (tri)linear for pressure<br>

<br>

I guess you could argue we could/should just do good old geometric <br>

multigrid instead. More generally we do use this solver configuration a <br>

lot for tetrahedral Taylor Hood (P2-P1) in particular also for our <br>

adaptive mesh runs - would it be worth seeing whether we have the same <br>

performance issues with tetrahedral P2-P1?<br></blockquote><div><br></div><div>No, you have a clear reproducer, even if it is not minimal.</div><div>The first coarsening is very different.</div><div><br></div><div>I am working on this and I see that I added a heuristic for thin bodies where you order the vertices in greedy algorithms with minimum degree first.</div><div>This will tend to pick corners first, then edges, then faces, etc.</div><div>That may be the problem. I would like to understand it better (see below).</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
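To illustrate the minimum-degree-first ordering mentioned above, here is a toy sketch. It is not the GAMG code: it uses a plain greedy maximal independent set on a small 2D grid as a stand-in, and the helper names grid_graph and greedy_mis are hypothetical. It only shows how the visiting order changes which vertices get picked first.<br>
<pre>
```python
from itertools import product


def grid_graph(nx, ny):
    """Adjacency lists for an nx-by-ny grid with 4-point connectivity."""
    adj = {v: [] for v in product(range(nx), range(ny))}
    for (i, j) in adj:
        for (di, dj) in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if (i + di, j + dj) in adj:
                adj[(i, j)].append((i + di, j + dj))
    return adj


def greedy_mis(adj, order):
    """Visit vertices in the given order and select a vertex unless one
    of its neighbours was already selected. Returns picks in pick order."""
    selected = []
    chosen = set()
    for v in order:
        if not any(n in chosen for n in adj[v]):
            selected.append(v)
            chosen.add(v)
    return selected


adj = grid_graph(5, 5)
natural = list(adj)                                 # row-by-row ordering
by_degree = sorted(adj, key=lambda v: len(adj[v]))  # minimum degree first

print("natural order picks:   ", greedy_mis(adj, natural))
print("min-degree order picks:", greedy_mis(adj, by_degree))
```
</pre>
On this grid the corners have degree 2, boundary vertices degree 3 and interior vertices degree 4, so the degree-sorted run selects the four corners before anything else.<br>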

><br>

> It would be nice to see if the new and old codes are similar without<br>

> aggressive coarsening.<br>

> This was the intent of the major change in this time frame, as you<br>

> noticed.<br>

> If these jobs are easy to run, could you check that the old and new<br>

> versions are similar with "-pc_gamg_square_graph 0" (and you only need<br>

> one time step).<br>

> All you need to do is check that the first coarse grid has about the same<br>

> number of equations (large).<br>

Unfortunately we're seeing some memory errors when we use this option, <br>

and I'm not entirely clear whether we're just running out of memory and <br>

need to put it on a special queue.<br>

<br>

The run with square_graph 0 using new PETSc managed to get through one <br>

solve at level 5, and is giving the following mg levels:<br>

<br>

         rows=174, cols=174, bs=6<br>

           total: nonzeros=30276, allocated nonzeros=30276<br>

--<br>

           rows=2106, cols=2106, bs=6<br>

           total: nonzeros=4238532, allocated nonzeros=4238532<br>

--<br>

           rows=21828, cols=21828, bs=6<br>

           total: nonzeros=62588232, allocated nonzeros=62588232<br>

--<br>

           rows=589824, cols=589824, bs=6<br>

           total: nonzeros=1082528928, allocated nonzeros=1082528928<br>

--<br>

           rows=2433222, cols=2433222, bs=3<br>

           total: nonzeros=456526098, allocated nonzeros=456526098<br>

<br>

Comparing with square_graph 100 using new PETSc:<br>

<br>

           rows=96, cols=96, bs=6<br>

           total: nonzeros=9216, allocated nonzeros=9216<br>

--<br>

           rows=1440, cols=1440, bs=6<br>

           total: nonzeros=647856, allocated nonzeros=647856<br>

--<br>

           rows=97242, cols=97242, bs=6<br>

           total: nonzeros=65656836, allocated nonzeros=65656836<br>

--<br>

           rows=2433222, cols=2433222, bs=3<br>

           total: nonzeros=456526098, allocated nonzeros=456526098<br>

<br>

and old PETSc with square_graph 100<br>

<br>

           rows=90, cols=90, bs=6<br>

           total: nonzeros=8100, allocated nonzeros=8100<br>

--<br>

           rows=1872, cols=1872, bs=6<br>

           total: nonzeros=1234080, allocated nonzeros=1234080<br>

--<br>

           rows=47652, cols=47652, bs=6<br>

           total: nonzeros=23343264, allocated nonzeros=23343264<br>

--<br>

           rows=2433222, cols=2433222, bs=3<br>

           total: nonzeros=456526098, allocated nonzeros=456526098<br>

--<br>
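To make the three listings above easier to compare, the coarsening ratio between consecutive levels can be computed from the rows and bs values. This is a rough sketch: the numbers are copied from the output above, the helper name node_ratios is made up, and ratios are taken in nodes (rows / bs) because the block size changes from 3 to 6 between the finest level and the first coarse level.<br>
<pre>
```python
# (rows, block size) per level, finest first, copied from the output above.
runs = {
    "new PETSc, square_graph 0": [
        (2433222, 3), (589824, 6), (21828, 6), (2106, 6), (174, 6)],
    "new PETSc, square_graph 100": [
        (2433222, 3), (97242, 6), (1440, 6), (96, 6)],
    "old PETSc, square_graph 100": [
        (2433222, 3), (47652, 6), (1872, 6), (90, 6)],
}


def node_ratios(levels):
    """Ratio of node counts (rows / bs) between consecutive levels."""
    nodes = [rows // bs for rows, bs in levels]
    return [fine / coarse for fine, coarse in zip(nodes, nodes[1:])]


for name, levels in runs.items():
    ratios = ", ".join(f"{r:.1f}" for r in node_ratios(levels))
    print(f"{name}: {ratios}")
```
</pre>
The first number printed for each run is the first coarsening step, which is the one under discussion here. In terms of rows rather than nodes, that first ratio is half as large because of the change in block size.<br>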

<br>

Unfortunately old PETSc with square_graph 0 did not complete a single <br>

solve before giving the memory error<br></blockquote><div><br></div><div>OK, thanks for trying.</div><div><br></div><div>I am working on this and I will give you a branch to test, but if you can rebuild PETSc here is a quick test that might fix your problem.</div><div>In src/ksp/pc/impls/gamg/agg.c you will see:<br></div><div><br></div><div>    PetscCall(PetscSortIntWithArray(nloc, degree, permute));<br></div><div><br></div><div>If you can comment this out in the new code and compare with the old, that might fix the problem.</div><div><br></div><div>Thanks,</div><div>Mark</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

><br>

> BTW, I am starting to think I should add the old method back as an option.<br>

> I did not think this change would cause large differences.<br>

<br>

Yes, I think that would be much appreciated. Let us know if we can do <br>

any testing<br>

<br>

Best wishes<br>

Stephan<br>

<br>

<br>

><br>

> Thanks,<br>

> Mark<br>

><br>

><br>

><br>

><br>

>> Note that we are providing the rigid body near nullspace,<br>

>> hence the bs=3 to bs=6.<br>

>> We have tried different values for the gamg_threshold but it doesn't<br>

>> really seem to significantly alter the coarsening amount in that first<br>

>> step.<br>

>><br>

>> Do you have any suggestions for further things we should try/look at?<br>

>> Any feedback would be much appreciated<br>

>><br>

>> Best wishes<br>

>> Stephan Kramer<br>

>><br>

>> Full logs including log_view timings available from<br>

>> <a href="https://github.com/stephankramer/petsc-scaling/" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/</a><br>

>><br>

>> In particular:<br>

>><br>

>><br>

>> <a href="https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat</a><br>

>><br>

>> <a href="https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat</a><br>

>><br>

>> <a href="https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat</a><br>

>><br>

>> <a href="https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat</a><br>

>><br>

>> <a href="https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat</a><br>

>><br>

>> <a href="https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat</a><br>

>><br>

>><br>

<br>

</blockquote></div></div>