[petsc-users] OPENMP in PETSc

Jed Brown jed at 59A2.org
Wed Jul 28 10:43:51 CDT 2010


On Wed, 28 Jul 2010 09:20:46 -0500, "Ravi Kannan" <rxk at cfdrc.com> wrote:
> Can anyone tell, whether PETSc uses (or has) OPENMP provision.

You are free to use OpenMP in user code (like function evaluation and
Jacobian assembly).  There is some basic support in PCOPENMP.
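
As a rough, untested sketch of what that looks like (the FormFunction
name and the 1-D stencil here are just placeholders, not anything from
PETSc itself), a residual evaluation can thread its local loop while
PETSc stays MPI-based and the threads live entirely in user code:

  #include <petscsnes.h>

  static PetscErrorCode FormFunction(SNES snes, Vec X, Vec F, void *ctx)
  {
    PetscErrorCode     ierr;
    PetscInt           i, n;
    const PetscScalar *x;
    PetscScalar       *f;

    PetscFunctionBegin;
    ierr = VecGetLocalSize(X, &n);CHKERRQ(ierr);
    ierr = VecGetArrayRead(X, &x);CHKERRQ(ierr);
    ierr = VecGetArray(F, &f);CHKERRQ(ierr);
    /* Thread the local work; PETSc never sees the threads. */
  #pragma omp parallel for
    for (i = 1; i < n - 1; i++) f[i] = 2.0*x[i] - x[i-1] - x[i+1];
    f[0]   = x[0];      /* toy Dirichlet-style boundary rows */
    f[n-1] = x[n-1];
    ierr = VecRestoreArrayRead(X, &x);CHKERRQ(ierr);
    ierr = VecRestoreArray(F, &f);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }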

> In addition, how does it compare with parallel systems with multi-core
> architecture.  Based on some literature search, my understanding is
> that MPI has poor scalability in multi-core systems, due to the
> Ethernet switch.

Modern MPI implementations don't use Ethernet for self-sends; instead
they communicate through mapped shared memory.  Even when they talk over
TCP, it's on the kernel loopback device, so the traffic never reaches the
Ethernet hardware, though copying through the kernel is more expensive.
They can also use RDMA provided by the HCA on e.g. an InfiniBand device
to do the sends and receives without a context switch.  I've heard some
people observe this being faster than mapped shared memory for certain
problems, but it's usually not.
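
Which transport gets used is normally a launcher option rather than
anything in your code; for example, with Open MPI something like the
following restricts it to shared memory and InfiniBand (the exact option
names vary between implementations and versions, so treat this as
illustrative only):

  mpiexec --mca btl self,sm,openib -n 8 ./app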

NUMA is an important complication: the mapping of physical pages (which
your program doesn't directly control) is crucial to performance.  This
happens automatically with MPI (through affinity settings), but needs
careful tuning with OpenMP.  In particular, it is very easy to fault
pages on a different socket from the one where you later use them,
causing mysterious slowdowns by a factor at least as large as the number
of sockets.  For problems that admit domain-decomposition strategies,
it's not clear that OpenMP is generally faster than MPI; the reliable
memory performance that you get from MPI should not be underestimated.
Like anything, it's problem and hardware dependent.
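
To illustrate the first-touch issue, here is a rough sketch (array names
and sizes are made up): the point is that the initialization loop uses
the same static schedule as the compute loop, so each thread faults the
pages it will later use instead of the master thread faulting all of
them on one socket.

  #include <stdlib.h>

  int main(void)
  {
    const long n = 50000000;
    double *a = malloc(n * sizeof(double));
    double *b = malloc(n * sizeof(double));
    long i;

    /* First touch: fault each page from the thread that will use it,
       with the same static schedule as the compute loop below. */
  #pragma omp parallel for schedule(static)
    for (i = 0; i < n; i++) { a[i] = 0.0; b[i] = 1.0; }

  #pragma omp parallel for schedule(static)
    for (i = 0; i < n; i++) a[i] += 2.0 * b[i];

    free(a); free(b);
    return 0;
  }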

Jed

