[petsc-users] Strong scaling issue with CG solver and HYPRE preconditioner (BoomerAMG)

Raphael Egan raphaelegan at ucsb.edu
Mon May 6 18:41:39 CDT 2019


Dear PETSc developer(s),

I am assessing the strong scalability of our incompressible Navier-Stokes
solver on distributed Octree grids developed at the University of
California, Santa Barbara.

I intend to show satisfactory strong scaling behavior, even on very large
numbers of cores, for computational problems whose mesh is big enough. My
scaling test case involves 10 time steps of our solver, applied to the
three-dimensional flow past a sphere on an extremely fine (but non-uniform)
mesh of about 270,000,000 computational cells. We use the p4est library for
grid management and PETSc solvers for all linear algebra problems. I am
working on the Stampede 2 supercomputer's KNL nodes, using 64 cores per
node.

The most elementary usage of the solver can be summarized as follows:
- "viscosity step": velocity components are decoupled from each other and
solved for successively, leading to three successive Poisson problems with
an added positive diagonal term;
- "projection step": a projection step is solved to enforce the velocity
field to be divergence-free.
The solution of the "projection step" updates the appropriate boundary
conditions for the "viscosity step" and the process may thus be repeated
until a desired convergence threshold is reached for the current time step.
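
For clarity, the sequence of KSPSolve calls within one such sub-iteration is
sketched in the function below. This is only a schematic illustration: the
object names are hypothetical placeholders, and all KSP, Mat and Vec objects
are assumed to be created and configured elsewhere.

#include <petscksp.h>

/* Schematic sketch only (hypothetical names): one sub-iteration of a time
 * step, i.e. one "viscosity step" (three decoupled component solves)
 * followed by one "projection step" (a single SPD solve). */
PetscErrorCode solve_one_sub_iteration(KSP ksp_visc[3], Vec rhs_visc[3], Vec u_star[3],
                                       KSP ksp_proj, Vec rhs_proj, Vec phi)
{
  PetscErrorCode ierr;
  PetscInt       dir;

  PetscFunctionBeginUser;
  /* "viscosity step": one solve per velocity component */
  for (dir = 0; dir < 3; ++dir) {
    ierr = KSPSolve(ksp_visc[dir], rhs_visc[dir], u_star[dir]); CHKERRQ(ierr);
  }
  /* "projection step": one solve for the projection variable */
  ierr = KSPSolve(ksp_proj, rhs_proj, phi); CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

In the scaling test described further below, every time step performs two
such sub-iterations.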

Given the discretization that we use for the "viscosity step", the three
corresponding linear systems are slightly nonsymmetric but have a dominant
diagonal. We invariably use BiCGStab with PCSOR as the preconditioner for
those systems (only a few iterations are required in this case). In
contrast, the linear system to be solved for the "projection step" is
symmetric positive definite, so we use a conjugate gradient solver. The user
can choose either PCSOR or PCHYPRE as the preconditioner for that task. If
PCHYPRE is chosen, "-pc_hypre_boomeramg_strong_threshold" is set to "0.5",
"-pc_hypre_boomeramg_coarsen_type" is set to "Falgout" and
"-pc_hypre_boomeramg_truncfactor" is set to "0.1". Tests on my local machine
(on a much smaller problem) showed that PCHYPRE outperforms PCSOR, both in
execution time and in number of iterations, and I expected that conclusion
to hold for (much) bigger problems on Stampede 2 as well.
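
To make the projection-step configuration concrete, here is a small
standalone sketch of the corresponding solver setup (CG + PCHYPRE/BoomerAMG
with the three options above). It is only an illustration under assumptions:
PETSc configured --with-hypre, and a toy 1D Laplacian standing in for our
actual projection matrix, which of course comes from the Octree
discretization instead.

static char help[] = "Sketch: CG + BoomerAMG with the projection-step options.\n";

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat            A;
  Vec            x, b;
  KSP            ksp;
  PC             pc;
  PetscInt       i, Istart, Iend, n = 1000;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, help); if (ierr) return ierr;

  /* Toy SPD operator (1D Laplacian) standing in for the projection matrix */
  ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n); CHKERRQ(ierr);
  ierr = MatSetFromOptions(A); CHKERRQ(ierr);
  ierr = MatSeqAIJSetPreallocation(A, 3, NULL); CHKERRQ(ierr);
  ierr = MatMPIAIJSetPreallocation(A, 3, NULL, 1, NULL); CHKERRQ(ierr);
  ierr = MatSetUp(A); CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &Istart, &Iend); CHKERRQ(ierr);
  for (i = Istart; i < Iend; ++i) {
    if (i > 0)     { ierr = MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES); CHKERRQ(ierr); }
    if (i < n - 1) { ierr = MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES); CHKERRQ(ierr); }
    ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES); CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatCreateVecs(A, &x, &b); CHKERRQ(ierr);
  ierr = VecSet(b, 1.0); CHKERRQ(ierr);

  /* CG + hypre BoomerAMG with the same three BoomerAMG options as above */
  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A); CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPCG); CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc); CHKERRQ(ierr);
  ierr = PCSetType(pc, PCHYPRE); CHKERRQ(ierr);
  ierr = PCHYPRESetType(pc, "boomeramg"); CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-pc_hypre_boomeramg_strong_threshold", "0.5"); CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-pc_hypre_boomeramg_coarsen_type", "Falgout"); CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-pc_hypre_boomeramg_truncfactor", "0.1"); CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr); /* -ksp_monitor, -ksp_view, ... still apply */

  ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);

  ierr = KSPDestroy(&ksp); CHKERRQ(ierr);
  ierr = MatDestroy(&A); CHKERRQ(ierr);
  ierr = VecDestroy(&x); CHKERRQ(ierr);
  ierr = VecDestroy(&b); CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

Equivalently, the same configuration can be selected purely from the command
line with -ksp_type cg -pc_type hypre -pc_hypre_type boomeramg and the three
-pc_hypre_boomeramg_* options above.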

However, that expectation turned out to be wrong: PCSOR actually
outperforms PCHYPRE quite significantly when using large numbers of cores
for my scaling test problem. Worse, the strong scaling of the projection
step is very poor when using PCHYPRE as the preconditioner: the time spent
on that part of the problem grows with the number of cores. See the results
below:

NB: the scaling test considered here below consists of 2 time steps of the
solver, starting from an initial state read from disk, on a mesh of around
270,000,000 computational cells. Each time step involves two sub-iterations,
i.e., every time step solves a first viscosity step (3 calls to KSPSolve of
type bcgs with zero initial guesses), then a first projection step (1 call
to KSPSolve of type cg with zero initial guess), then a second viscosity
step (3 more calls to KSPSolve of type bcgs with nonzero initial guesses),
and a second projection step (1 call to KSPSolve of type cg with nonzero
initial guess). This makes a total of 16 calls to KSPSolve: 12 of them are
for the viscosity steps (not under investigation here) and the other 4 are
for the projection steps.

Mean execution time for the projection step (only):

Number of cores    PCSOR        PCHYPRE
 4096              125.7  s     192.7 s
 8192               81.58 s     281.8 s
16384               48.2  s     311.5 s

The tests were all run with "-ksp_monitor", "-ksp_view" and "-log_view" to
gather more information about the solves (see the files attached). While it
is clear that the number of iterations is dramatically reduced when using
PCHYPRE, a *very* long time is spent in PCSetUp and PCApply when using
PCHYPRE for the projection step. With PCHYPRE, the time spent in PCSetUp,
PCApply and KSPSolve grows with the number of cores that are used. In
particular, the time spent in PCSetUp alone when using PCHYPRE is larger
than the total KSPSolve time when using PCSOR (even though thousands of
iterations are required in that case)...
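
As a side note on the attached logs: if it helps to isolate the projection
solves from everything else in the -log_view summary, a dedicated logging
stage can be pushed around the corresponding KSPSolve calls. A minimal
sketch with hypothetical names (not necessarily what our code currently
does) would be:

#include <petscksp.h>

/* 'stage_projection' is assumed to have been registered once at startup via
 * PetscLogStageRegister("ProjectionStep", &stage_projection). All events
 * (PCSetUp, PCApply, KSPSolve, ...) occurring while the stage is pushed are
 * then reported in their own section of the -log_view summary. */
PetscErrorCode solve_projection_logged(PetscLogStage stage_projection,
                                       KSP ksp_proj, Vec rhs_proj, Vec phi)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = PetscLogStagePush(stage_projection); CHKERRQ(ierr);
  ierr = KSPSolve(ksp_proj, rhs_proj, phi); CHKERRQ(ierr);
  ierr = PetscLogStagePop(); CHKERRQ(ierr);
  PetscFunctionReturn(0);
}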

I am very surprised by these results and I don't quite understand them: I
have used PCHYPRE with similar options and option values in other elliptic
solver contexts before, with very satisfactory results even on large
problems. I can't figure out why we lose strong scaling in this case.

Would you have any advice about a better preconditioner to use, and/or a
better set of options and/or option values, in order to recover strong
scaling when using PCHYPRE?

-- 
Raphael M. Egan

Department of Mechanical Engineering
Engineering II Building, Office 2317
University of California, Santa Barbara
Santa Barbara, CA 93106
Email: raphaelegan at ucsb.edu
Attachments (scrubbed by the mailing-list archive; solver log files from
the runs above):
  petsc_test_flow_past_sphere_Re_500_scaling_test_4096_cg_hypre
  petsc_test_flow_past_sphere_Re_500_scaling_test_8192_cg_hypre
  petsc_test_flow_past_sphere_Re_500_scaling_test_16384_cg_hypre
  petsc_test_flow_past_sphere_Re_500_scaling_test_4096_cg_sor
  petsc_test_flow_past_sphere_Re_500_scaling_test_8192_cg_sor
  petsc_test_flow_past_sphere_Re_500_scaling_test_16384_cg_sor

