<div dir="ltr">Hi Stephan,<div><br></div><div>Yes, this is most likely the result of the MIS(A^T A) -> MIS(MIS(A)) coarsening change.</div><div><br></div><div>That change was needed because forming A^T A is very expensive. It did not make much difference in my tests, but coarsening behavior is complex and problem dependent.</div><div><br></div><div>I am traveling now, but I can get to this in a few days. You have provided a lot of data and I will take a look, but I think we also need to look at the coarsening parameters.</div><div><br></div><div>Thanks,</div><div>Mark</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Aug 9, 2023 at 10:08 AM Stephan Kramer <<a href="mailto:s.kramer@imperial.ac.uk">s.kramer@imperial.ac.uk</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Dear petsc devs<br>
<br>
We have noticed a performance regression using GAMG as the <br>
preconditioner to solve the velocity block in a Stokes equations saddle <br>
point system with variable viscosity solved on a 3D hexahedral mesh of a <br>
spherical shell using Q2-Q1 elements. This is comparing performance from <br>
the beginning of last year (petsc 3.16.4) and a more recent petsc master <br>
(from around May this year). This is the weak scaling analysis we <br>
published in <a href="https://doi.org/10.5194/gmd-15-5127-2022" rel="noreferrer" target="_blank">https://doi.org/10.5194/gmd-15-5127-2022</a>. Previously, the <br>
number of iterations for the velocity block (the inner solve of the Schur <br>
complement) started at 40 <br>
(<a href="https://gmd.copernicus.org/articles/15/5127/2022/gmd-15-5127-2022-f10-web.png" rel="noreferrer" target="_blank">https://gmd.copernicus.org/articles/15/5127/2022/gmd-15-5127-2022-f10-web.png</a>) <br>
and only went up slowly for larger problems (+ more cores). Now the number <br>
of iterations starts at 60 <br>
(<a href="https://github.com/stephankramer/petsc-scaling/blob/main/after/SPD_Combined_Iterations.png" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/blob/main/after/SPD_Combined_Iterations.png</a>), <br>
same tolerances, again going up slowly with increasing size, and the <br>
cost per iteration has also gone up slightly - resulting in an increased <br>
runtime of > 50%.<br>
<br>
The main change we can see is that the coarsening seems to have become a <br>
lot less aggressive at the first coarsening stage (finest to <br>
one-but-finest) - presumably after the MIS(A^T A) -> MIS(MIS(A)) change? <br>
The performance issues might be similar to <br>
<a href="https://lists.mcs.anl.gov/mailman/htdig/petsc-users/2023-April/048366.html" rel="noreferrer" target="_blank">https://lists.mcs.anl.gov/mailman/htdig/petsc-users/2023-April/048366.html</a> ?<br>
<br>
As an example at "Level 7" (6,389,890 vertices, run on 1536 cpus) on the <br>
older petsc version we had:<br>
<br>
rows=126, cols=126, bs=6<br>
total: nonzeros=15876, allocated nonzeros=15876<br>
--<br>
rows=3072, cols=3072, bs=6<br>
total: nonzeros=3344688, allocated nonzeros=3344688<br>
--<br>
rows=91152, cols=91152, bs=6<br>
total: nonzeros=109729584, allocated nonzeros=109729584<br>
--<br>
rows=2655378, cols=2655378, bs=6<br>
total: nonzeros=1468980252, allocated nonzeros=1468980252<br>
--<br>
rows=152175366, cols=152175366, bs=3<br>
total: nonzeros=29047661586, allocated nonzeros=29047661586<br>
<br>
Whereas with the newer version we get:<br>
<br>
rows=420, cols=420, bs=6<br>
total: nonzeros=176400, allocated nonzeros=176400<br>
--<br>
rows=6462, cols=6462, bs=6<br>
total: nonzeros=10891908, allocated nonzeros=10891908<br>
--<br>
rows=91716, cols=91716, bs=6<br>
total: nonzeros=81687384, allocated nonzeros=81687384<br>
--<br>
rows=5419362, cols=5419362, bs=6<br>
total: nonzeros=3668190588, allocated nonzeros=3668190588<br>
--<br>
rows=152175366, cols=152175366, bs=3<br>
total: nonzeros=29047661586, allocated nonzeros=29047661586<br>
<br>
So in the first step it now coarsens from 150e6 to 5.4e6 DOFs instead of <br>
to 2.6e6 DOFs. Note that we are providing the rigid body near nullspace, <br>
hence the jump from bs=3 on the finest level to bs=6 on the coarser levels.<br>
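For context, the six-column near nullspace consists of the rigid body modes of the 3D operator: three translations plus three rotations. A minimal numpy sketch (a hypothetical standalone helper, not our actual code) of how such mode vectors can be assembled for interlaced bs=3 coordinates:

```python
import numpy as np

def rigid_body_modes(coords):
    """Build the 6 rigid-body modes (3 translations + 3 rotations)
    for 3D points with coordinates coords of shape (n, 3).
    Returns an array of shape (3*n, 6): one column per mode, with
    the displacement components of each point interlaced (bs=3)."""
    n = coords.shape[0]
    modes = np.zeros((3 * n, 6))
    x, y, z = coords[:, 0], coords[:, 1], coords[:, 2]
    # Translations in x, y, z
    modes[0::3, 0] = 1.0
    modes[1::3, 1] = 1.0
    modes[2::3, 2] = 1.0
    # Rotation about z: displacement (-y, x, 0)
    modes[0::3, 3] = -y
    modes[1::3, 3] = x
    # Rotation about x: displacement (0, -z, y)
    modes[1::3, 4] = -z
    modes[2::3, 4] = y
    # Rotation about y: displacement (z, 0, -x)
    modes[0::3, 5] = z
    modes[2::3, 5] = -x
    return modes

pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
B = rigid_body_modes(pts)
print(B.shape)  # (9, 6)
```

PETSc's MatNullSpaceCreateRigidBody() constructs the same six modes from a coordinates vector.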
We have tried different values for -pc_gamg_threshold, but none of them <br>
seems to significantly alter the amount of coarsening in that first step.<br>
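For reference, these are the coarsening-related options we are aware of, as a sketch based on the PETSc manual pages; the values shown are purely illustrative:

```shell
# Drop tolerance for the strength-of-connection graph;
# larger values treat more connections as weak.
-pc_gamg_threshold 0.01

# Number of levels on which aggressive coarsening is applied
# (recent PETSc; older versions exposed this as -pc_gamg_square_graph).
-pc_gamg_aggressive_coarsening 1
```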
<br>
Do you have any suggestions for further things we should try/look at? <br>
Any feedback would be much appreciated.<br>
<br>
Best wishes<br>
Stephan Kramer<br>
<br>
Full logs including log_view timings available from <br>
<a href="https://github.com/stephankramer/petsc-scaling/" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/</a><br>
<br>
In particular:<br>
<br>
<a href="https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat</a><br>
<a href="https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat</a><br>
<a href="https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat</a><br>
<a href="https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat</a><br>
<a href="https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat</a><br>
<a href="https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat</a> <br>
<br>
</blockquote></div>