[petsc-users] performance regression with GAMG
Stephan Kramer
s.kramer at imperial.ac.uk
Wed Aug 9 12:06:23 CDT 2023
Dear petsc devs
We have noticed a performance regression using GAMG as the
preconditioner to solve the velocity block in a Stokes saddle point
system with variable viscosity, solved on a 3D hexahedral mesh of a
spherical shell using Q2-Q1 elements. This compares performance from
the beginning of last year (petsc 3.16.4) with a more recent petsc
master (from around May this year), using the weak scaling analysis we
published in https://doi.org/10.5194/gmd-15-5127-2022. Previously the
number of iterations for the velocity block (the inner solve of the
Schur complement) started at 40
(https://gmd.copernicus.org/articles/15/5127/2022/gmd-15-5127-2022-f10-web.png)
and only went up slowly for larger problems (+ more cores). Now the
number of iterations starts at 60
(https://github.com/stephankramer/petsc-scaling/blob/main/after/SPD_Combined_Iterations.png),
with the same tolerances, again going up slowly with increasing size.
The cost per iteration has also gone up slightly, resulting in an
increased runtime of > 50%.
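For context, the relevant part of our solver configuration is roughly
along the following lines (a sketch only: the option prefixes and
tolerances here are illustrative, not our exact setup):

  -pc_type fieldsplit
  -pc_fieldsplit_type schur
  -fieldsplit_0_ksp_type cg
  -fieldsplit_0_pc_type gamg
  -fieldsplit_0_ksp_rtol 1e-5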
The main change we can see is that the coarsening has become a lot less
aggressive at the first coarsening stage (finest -> one-but-finest),
presumably as a result of the MIS(A^T A) -> MIS(MIS(A)) change? The
performance issue may be similar to the one reported in
https://lists.mcs.anl.gov/mailman/htdig/petsc-users/2023-April/048366.html
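As far as we understand from the manual pages, the knobs controlling
this stage are, in older releases (e.g. 3.16):

  -pc_gamg_square_graph <n>            # number of levels coarsened with MIS(A^T A)

and on recent master:

  -pc_gamg_aggressive_coarsening <n>   # number of aggressive (MIS(MIS(A))) levels

(please correct us if we have misread these).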
As an example, at "Level 7" (6,389,890 vertices, run on 1536 CPUs), the
older petsc version gave:
rows=126, cols=126, bs=6
total: nonzeros=15876, allocated nonzeros=15876
--
rows=3072, cols=3072, bs=6
total: nonzeros=3344688, allocated nonzeros=3344688
--
rows=91152, cols=91152, bs=6
total: nonzeros=109729584, allocated nonzeros=109729584
--
rows=2655378, cols=2655378, bs=6
total: nonzeros=1468980252, allocated nonzeros=1468980252
--
rows=152175366, cols=152175366, bs=3
total: nonzeros=29047661586, allocated nonzeros=29047661586
Whereas with the newer version we get:
rows=420, cols=420, bs=6
total: nonzeros=176400, allocated nonzeros=176400
--
rows=6462, cols=6462, bs=6
total: nonzeros=10891908, allocated nonzeros=10891908
--
rows=91716, cols=91716, bs=6
total: nonzeros=81687384, allocated nonzeros=81687384
--
rows=5419362, cols=5419362, bs=6
total: nonzeros=3668190588, allocated nonzeros=3668190588
--
rows=152175366, cols=152175366, bs=3
total: nonzeros=29047661586, allocated nonzeros=29047661586
So in the first step it now coarsens from 150e6 to 5.4e6 DOFs instead of
to 2.6e6 DOFs. Note that we are providing the rigid body near nullspace,
hence the change from bs=3 on the finest level to bs=6 on the coarser
levels.
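For completeness, the near nullspace is attached with the standard PETSc
calls, along these lines (a minimal sketch; the function name and the
"coords" Vec, holding the nodal coordinates with block size 3, are
placeholders for what our code actually does):

  #include <petscmat.h>

  static PetscErrorCode AttachRigidBodyModes(Mat A, Vec coords)
  {
    MatNullSpace nullsp;

    PetscFunctionBeginUser;
    /* build the 6 rigid-body modes (3 translations + 3 rotations in 3D) */
    PetscCall(MatNullSpaceCreateRigidBody(coords, &nullsp));
    /* GAMG picks this up, giving bs=6 on the coarser levels */
    PetscCall(MatSetNearNullSpace(A, nullsp));
    PetscCall(MatNullSpaceDestroy(&nullsp));
    PetscFunctionReturn(0);
  }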
We have tried different values for -pc_gamg_threshold, but it does not
seem to significantly alter the amount of coarsening in that first step.
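What we varied was, modulo the option prefix of our velocity solve, the
following (the values here are just examples):

  -pc_gamg_threshold 0.01        # a single value applied on all levels
  -pc_gamg_threshold 0.02,0.01   # or per-level values, finest first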
Do you have any suggestions for further things we should try or look at?
Any feedback would be much appreciated.
Best wishes
Stephan Kramer
Full logs, including log_view timings, are available from
https://github.com/stephankramer/petsc-scaling/
In particular:
https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat
https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat
https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat
https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat
https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat
https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat