<div dir="ltr">Hi Stephan,<div><br></div><div>Yes, this is most likely the result of the MIS(A^T A) -> MIS(MIS(A)) coarsening change.</div><div><br></div><div>That change was needed because forming A^T A is very expensive. It did not make much difference in my tests, but coarsening behavior is complex and problem dependent.</div><div><br></div><div>I am traveling now, but I can get to this in a few days. You have provided a lot of data and I will take a look, but I think we also need to look at the coarsening parameters.</div><div><br></div><div>Thanks,</div><div>Mark</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Aug 9, 2023 at 10:08 AM Stephan Kramer <<a href="mailto:s.kramer@imperial.ac.uk">s.kramer@imperial.ac.uk</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Dear petsc devs<br>
<br>
We have noticed a performance regression using GAMG as the <br>
preconditioner to solve the velocity block in a Stokes equations saddle <br>
point system with variable viscosity solved on a 3D hexahedral mesh of a <br>
spherical shell using Q2-Q1 elements. This is comparing performance from <br>
the beginning of last year (petsc 3.16.4) and a more recent petsc master <br>
(from around May this year). This is the weak scaling analysis we <br>
published in <a href="https://doi.org/10.5194/gmd-15-5127-2022" rel="noreferrer" target="_blank">https://doi.org/10.5194/gmd-15-5127-2022</a>. Previously, the <br>
number of iterations for the velocity block (the inner solve of the Schur <br>
complement) started at 40 <br>
(<a href="https://gmd.copernicus.org/articles/15/5127/2022/gmd-15-5127-2022-f10-web.png" rel="noreferrer" target="_blank">https://gmd.copernicus.org/articles/15/5127/2022/gmd-15-5127-2022-f10-web.png</a>) <br>
and only went up slowly for larger problems (+ more cores). Now the number <br>
of iterations starts at 60 <br>
(<a href="https://github.com/stephankramer/petsc-scaling/blob/main/after/SPD_Combined_Iterations.png" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/blob/main/after/SPD_Combined_Iterations.png</a>), <br>
same tolerances, again going up slowly with increasing size, and the <br>
cost per iteration has also gone up slightly - resulting in an increased <br>
runtime of > 50%.<br>
<br>
The main change we can see is that the coarsening seems to have become a <br>
lot less aggressive at the first coarsening stage (finest to <br>
one-but-finest) - presumably after the MIS(A^T A) -> MIS(MIS(A)) change? <br>
The performance issues might be similar to <br>
<a href="https://lists.mcs.anl.gov/mailman/htdig/petsc-users/2023-April/048366.html" rel="noreferrer" target="_blank">https://lists.mcs.anl.gov/mailman/htdig/petsc-users/2023-April/048366.html</a> ?<br>
<br>
As an example at "Level 7" (6,389,890 vertices, run on 1536 cpus) on the <br>
older petsc version we had:<br>
<br>
rows=126, cols=126, bs=6<br>
total: nonzeros=15876, allocated nonzeros=15876<br>
--<br>
rows=3072, cols=3072, bs=6<br>
total: nonzeros=3344688, allocated nonzeros=3344688<br>
--<br>
rows=91152, cols=91152, bs=6<br>
total: nonzeros=109729584, allocated nonzeros=109729584<br>
--<br>
rows=2655378, cols=2655378, bs=6<br>
total: nonzeros=1468980252, allocated nonzeros=1468980252<br>
--<br>
rows=152175366, cols=152175366, bs=3<br>
total: nonzeros=29047661586, allocated nonzeros=29047661586<br>
<br>
Whereas with the newer version we get:<br>
<br>
rows=420, cols=420, bs=6<br>
total: nonzeros=176400, allocated nonzeros=176400<br>
--<br>
rows=6462, cols=6462, bs=6<br>
total: nonzeros=10891908, allocated nonzeros=10891908<br>
--<br>
rows=91716, cols=91716, bs=6<br>
total: nonzeros=81687384, allocated nonzeros=81687384<br>
--<br>
rows=5419362, cols=5419362, bs=6<br>
total: nonzeros=3668190588, allocated nonzeros=3668190588<br>
--<br>
rows=152175366, cols=152175366, bs=3<br>
total: nonzeros=29047661586, allocated nonzeros=29047661586<br>
<br>
So in the first step it now coarsens from 150e6 to 5.4e6 DOFs instead of <br>
to 2.6e6 DOFs. Note that we are providing the rigid body near nullspace, <br>
hence the jump from bs=3 on the finest level to bs=6 on the coarser levels.<br>
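For context, the six-column near nullspace consists of the rigid body modes of the 3D operator: three translations plus three rotations. A minimal numpy sketch (a hypothetical standalone helper, not our actual code) of how such mode vectors can be assembled for interlaced bs=3 coordinates:

```python
import numpy as np

def rigid_body_modes(coords):
    """Build the 6 rigid-body modes (3 translations + 3 rotations)
    for 3D points with coordinates coords of shape (n, 3).
    Returns an array of shape (3*n, 6): one column per mode, with
    the displacement components of each point interlaced (bs=3)."""
    n = coords.shape[0]
    modes = np.zeros((3 * n, 6))
    x, y, z = coords[:, 0], coords[:, 1], coords[:, 2]
    # Translations in x, y, z
    modes[0::3, 0] = 1.0
    modes[1::3, 1] = 1.0
    modes[2::3, 2] = 1.0
    # Rotation about z: displacement (-y, x, 0)
    modes[0::3, 3] = -y
    modes[1::3, 3] = x
    # Rotation about x: displacement (0, -z, y)
    modes[1::3, 4] = -z
    modes[2::3, 4] = y
    # Rotation about y: displacement (z, 0, -x)
    modes[0::3, 5] = z
    modes[2::3, 5] = -x
    return modes

pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
B = rigid_body_modes(pts)
print(B.shape)  # (9, 6)
```

PETSc's MatNullSpaceCreateRigidBody() constructs the same six modes from a coordinates vector.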
We have tried different values for -pc_gamg_threshold, but none of them <br>
seems to significantly alter the amount of coarsening in that first step.<br>
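For reference, these are the coarsening-related options we are aware of, as a sketch based on the PETSc manual pages; the values shown are purely illustrative:

```shell
# Drop tolerance for the strength-of-connection graph;
# larger values treat more connections as weak.
-pc_gamg_threshold 0.01

# Number of levels on which aggressive coarsening is applied
# (recent PETSc; older versions exposed this as -pc_gamg_square_graph).
-pc_gamg_aggressive_coarsening 1
```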
<br>
Do you have any suggestions for further things we should try/look at? <br>
Any feedback would be much appreciated.<br>
<br>
Best wishes<br>
Stephan Kramer<br>
<br>
Full logs including log_view timings available from <br>
<a href="https://github.com/stephankramer/petsc-scaling/" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/</a><br>
<br>
In particular:<br>
<br>
<a href="https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat</a><br>
<a href="https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat</a><br>
<a href="https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat</a><br>
<a href="https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat</a><br>
<a href="https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat</a><br>
<a href="https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat</a> <br>
<br>
</blockquote></div>