[petsc-dev] PETSc programming model for multi-core systems
jed at 59A2.org
Fri Nov 12 08:59:22 CST 2010
On Fri, Nov 12, 2010 at 15:31, Rodrigo R. Paz <rodrigop at intec.unl.edu.ar> wrote:
> Of course, we have linear scaling in computing RHS and Matrix contrib.
> Also, the problem size when ranging nodes and cores was fixed (2.5M).
Thank you. In a chat with Lisandro, I heard that this is a 2D problem, so
the interesting question is where OpenMP offers a real advantage over flat
MPI. It is not on the bandwidth front, because both implementations hit the
same memory-bandwidth limits, so we have to look further. The place where there is a deep
difference is when the subdomain surface area is large compared to the
interior. For a scalar 3D problem with a stencil width of 1, the surface
becomes as big as the interior when you get subdomains smaller than 10^3.
It is much larger if you use quadratic elements (or otherwise have a larger
stencil). This is not an unreasonably small problem when strong scaling,
especially with a multi-component problem with nasty local structure (e.g.
FETI-DP methods for 3D elasticity, which involve a direct subdomain solve).
This tradeoff is almost invisible in 2D because, as you shrink the subdomain
size, you run out of any significant work to do on subdomains before the
interface becomes large compared to the interior. Thus latency of
reductions and scatters is the primary constraint, and that will be limited
more by the network (unless you are only running on a single node).
A partial counter-point is that MatSolve with OpenMP is unlikely to be near
the throughput of MPI-based MatSolve because the high-concurrency paths are
not going to provide very good memory locality, and may cause horrible
degradation due to cache-coherency issues. On the other hand, it is a very
different algorithm from the (block Jacobi or Schwarz) decomposed MPI