[petsc-users] performance regression with GAMG

Stephan Kramer s.kramer at imperial.ac.uk
Wed Aug 9 12:06:23 CDT 2023


Dear petsc devs

We have noticed a performance regression using GAMG as the 
preconditioner for the velocity block of a variable-viscosity Stokes 
saddle-point system, solved on a 3D hexahedral mesh of a spherical shell 
using Q2-Q1 elements. We are comparing performance from the beginning of 
last year (petsc 3.16.4) with a more recent petsc master (from around May 
this year). This is the weak scaling analysis we published in 
https://doi.org/10.5194/gmd-15-5127-2022. Previously the number of 
iterations for the velocity block (the inner solve of the Schur 
complement) started at 40 
(https://gmd.copernicus.org/articles/15/5127/2022/gmd-15-5127-2022-f10-web.png) 
and only went up slowly for larger problems (+ more cores). Now the 
iteration count starts at 60 
(https://github.com/stephankramer/petsc-scaling/blob/main/after/SPD_Combined_Iterations.png), 
with the same tolerances, again slowly going up with increasing size, and 
the cost per iteration has also gone up slightly - resulting in an 
increased runtime of > 50%.
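
For context, this is roughly the structure of the solver the above refers 
to - a sketch only, with placeholder options and tolerances; the exact 
settings are in the logs linked at the bottom:

/* Sketch: Schur-complement fieldsplit with GAMG on the velocity (0) block.
 * Standard PETSc options; the values here are placeholders, not the
 * settings used in the paper. */
#include <petscksp.h>

static PetscErrorCode set_stokes_solver_options(void)
{
  PetscFunctionBeginUser;
  PetscCall(PetscOptionsInsertString(NULL,
      "-ksp_type fgmres "
      "-pc_type fieldsplit -pc_fieldsplit_type schur "
      "-pc_fieldsplit_schur_fact_type full "
      /* inner velocity solve: CG preconditioned with GAMG */
      "-fieldsplit_0_ksp_type cg "
      "-fieldsplit_0_ksp_rtol 1e-5 "
      "-fieldsplit_0_pc_type gamg "
      /* pressure/Schur block: placeholder */
      "-fieldsplit_1_ksp_type preonly"));
  PetscFunctionReturn(PETSC_SUCCESS);
}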

The main change we can see is that the coarsening appears to have become a 
lot less aggressive at the first coarsening stage (finest -> 
one-but-finest) - presumably as a result of the MIS(A^T A) -> MIS(MIS(A)) 
change? The performance issue might be similar to 
https://lists.mcs.anl.gov/mailman/htdig/petsc-users/2023-April/048366.html ?
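
For reference, the knob we believe controls this (naming as we understand 
it across versions, so please correct us if this is off): in 3.16 the graph 
could be squared on the first N levels with -pc_gamg_square_graph, and on 
current master the equivalent appears to be -pc_gamg_aggressive_coarsening. 
A sketch, with the option prefix omitted (in our case these would sit under 
the velocity-block prefix):

/* Sketch: requesting aggressive coarsening on the finest level(s).
 * Option names as we understand them across versions; values are examples. */
#include <petscksp.h>

static PetscErrorCode set_gamg_coarsening_options(void)
{
  PetscFunctionBeginUser;
  /* PETSc 3.16-era: square the graph (MIS(A^T A)) on the first level */
  PetscCall(PetscOptionsInsertString(NULL, "-pc_gamg_square_graph 1"));
  /* recent master: ask for aggressive coarsening on the first level */
  PetscCall(PetscOptionsInsertString(NULL, "-pc_gamg_aggressive_coarsening 1"));
  PetscFunctionReturn(PETSC_SUCCESS);
}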

As an example at "Level 7" (6,389,890 vertices, run on 1536 cpus) on the 
older petsc version we had:

           rows=126, cols=126, bs=6
           total: nonzeros=15876, allocated nonzeros=15876
--
           rows=3072, cols=3072, bs=6
           total: nonzeros=3344688, allocated nonzeros=3344688
--
           rows=91152, cols=91152, bs=6
           total: nonzeros=109729584, allocated nonzeros=109729584
--
           rows=2655378, cols=2655378, bs=6
           total: nonzeros=1468980252, allocated nonzeros=1468980252
--
           rows=152175366, cols=152175366, bs=3
           total: nonzeros=29047661586, allocated nonzeros=29047661586

Whereas with the newer version we get:

           rows=420, cols=420, bs=6
           total: nonzeros=176400, allocated nonzeros=176400
--
           rows=6462, cols=6462, bs=6
           total: nonzeros=10891908, allocated nonzeros=10891908
--
           rows=91716, cols=91716, bs=6
           total: nonzeros=81687384, allocated nonzeros=81687384
--
           rows=5419362, cols=5419362, bs=6
           total: nonzeros=3668190588, allocated nonzeros=3668190588
--
           rows=152175366, cols=152175366, bs=3
           total: nonzeros=29047661586, allocated nonzeros=29047661586

So in the first coarsening step it now goes from 150e6 down to only 5.4e6 
DOFs, instead of down to 2.6e6 DOFs as before. Note that we are providing 
the rigid body modes as the near nullspace, hence the change from bs=3 on 
the finest level to bs=6 on the coarser levels.
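
(For completeness, the near nullspace is attached in the standard way - 
something like the sketch below, where coords is a Vec holding the 
interleaved nodal coordinates of the velocity space; this shows the 
mechanism, not our actual code:)

/* Sketch: attach the rigid-body modes as GAMG's near nullspace. */
#include <petscmat.h>

static PetscErrorCode attach_rigid_body_modes(Mat A, Vec coords)
{
  MatNullSpace nullsp;

  PetscFunctionBeginUser;
  PetscCall(MatNullSpaceCreateRigidBody(coords, &nullsp)); /* 6 modes in 3D */
  PetscCall(MatSetNearNullSpace(A, nullsp));
  PetscCall(MatNullSpaceDestroy(&nullsp));
  PetscFunctionReturn(PETSC_SUCCESS);
}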
We have tried different values of -pc_gamg_threshold, but it does not seem 
to significantly alter the amount of coarsening in that first step.
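
(What we varied is the drop tolerance, i.e. something along these lines - 
example values only, set on the PC of the velocity solve, or equivalently 
-pc_gamg_threshold 0.01,0.01 with the appropriate option prefix:)

/* Sketch: per-level GAMG drop tolerances; values are examples only. */
#include <petscksp.h>

static PetscErrorCode set_gamg_threshold(PC pc)
{
  PetscReal thresholds[] = {0.01, 0.01}; /* finest level, next level */

  PetscFunctionBeginUser;
  PetscCall(PCGAMGSetThreshold(pc, thresholds, 2));
  PetscFunctionReturn(PETSC_SUCCESS);
}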

Do you have any suggestions for further things we should try or look at? 
Any feedback would be much appreciated.

Best wishes
Stephan Kramer

Full logs, including -log_view timings, are available from 
https://github.com/stephankramer/petsc-scaling/

In particular:

https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat
https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat
https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat
https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat
https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat
https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat 


