[petsc-dev] OpenMPI is a menace

Jed Brown jedbrown at mcs.anl.gov
Mon Nov 5 00:04:40 CST 2012


On Sun, Nov 4, 2012 at 11:21 PM, Bryce Lelbach <blelbach at cct.lsu.edu> wrote:

> HPX is drastically different from MPI. Comparison table:
>
> HPX: Intra-node (threading) and inter-node (distributed); provides
> extremely fine-grained threading (millions of short-lived threads per node)
> MPI: Only does distributed.
>

Uh, MPI does intra-node. In fact, the main reason for explicit threads
within a node is specifically to enhance sharing, mostly cache. There are
viable solutions to network device contention, including explicitly via
subcomms and implicitly via lighter weight MPI ranks (e.g., AMPI).
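To make the point concrete, here is a minimal sketch (mine, not part of the
original message, and assuming an MPI-3 implementation) that uses
MPI_Comm_split_type to build per-node subcommunicators; the node-local
communicator is the natural place to hang shared-memory windows or to manage
network device contention explicitly:

/* Minimal sketch: MPI-3 ranks can discover their node-local peers,
 * so intra-node structure is directly expressible in MPI itself. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  MPI_Comm nodecomm;
  int      worldrank, noderank;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &worldrank);

  /* Split COMM_WORLD into one subcommunicator per shared-memory node. */
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                      MPI_INFO_NULL, &nodecomm);
  MPI_Comm_rank(nodecomm, &noderank);

  printf("world rank %d is rank %d on its node\n", worldrank, noderank);

  MPI_Comm_free(&nodecomm);
  MPI_Finalize();
  return 0;
}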


>
> HPX: Sends work to data.
> MPI: Sends data to work.
>

Uh, you run whatever task you want on the data at hand. You _can_ send
data, but that is in no way dictated by the MPI model.
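For illustration only (my sketch, not anything from the thread, with a
hypothetical run_on_local_data helper), "sending work to data" is easy to
express in MPI: ship a small command instead of the data, and let each rank
apply the requested operation to the portion it already owns.

#include <mpi.h>

typedef enum { CMD_SCALE = 0, CMD_SQUARE = 1 } Command;

/* Rank 0 chooses the operation; only a single int crosses the network,
 * and every rank runs the work on the data it already holds. */
void run_on_local_data(double *localdata, int n, Command cmd, MPI_Comm comm)
{
  int c = (int)cmd, i; /* cmd is only significant on rank 0 */

  MPI_Bcast(&c, 1, MPI_INT, 0, comm);
  for (i = 0; i < n; i++) {
    if (c == CMD_SCALE) localdata[i] *= 2.0;
    else                localdata[i] *= localdata[i];
  }
}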


>
> HPX: Provides an active global address space to abstract local memory
> boundaries across nodes.
> MPI: Forces user code to explicitly perform remote communication.
>

This is why we have libraries. The utility of a global address space is
mostly a philosophical debate. That remote data must be copied to local
storage if it will be accessed frequently is not controversial.
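As a rough illustration (my own sketch, using MPI one-sided operations and a
hypothetical fetch_remote_row helper), a library can provide the "copy remote
data into local storage" primitive without any language-level global address
space:

#include <mpi.h>

/* Copy n doubles from the owner's exposed window into a local buffer,
 * where they can be reused cheaply on subsequent accesses. */
void fetch_remote_row(double *localbuf, int n, int owner, MPI_Win win)
{
  MPI_Win_lock(MPI_LOCK_SHARED, owner, 0, win);
  MPI_Get(localbuf, n, MPI_DOUBLE, owner, 0 /* displacement */,
          n, MPI_DOUBLE, win);
  MPI_Win_unlock(owner, win);
}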


>
> HPX: Hides latencies by overlapping them with our computations.
> MPI: The only option for dealing with latencies is latency avoidance.
>
> HPX: Utilizes local synchronization and zealously avoids explicit global
> barriers, allowing computation to proceed as far as possible without
> communicating/synchronizing.
> MPI: Strongly emphasizes global barriers.
>

Bullshit. An explicit barrier (like MPI_Barrier) is almost never necessary
or desirable. Indeed, it implies side-channel synchronization, such as via
the file system. Reductions like MPI_Allreduce() represent a data
dependence with no opportunity for further local work in the majority of
existing algorithms, such as Krylov methods and convergence tests. We have
a number of new algorithms, such as pipelined GMRES, for which non-blocking
collectives (MPI_Iallreduce, MPI_Ibarrier) are important.
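For example, the reduction in a pipelined method can be overlapped with
unrelated local work along these lines (a minimal sketch of mine, assuming
MPI-3's MPI_Iallreduce and a hypothetical local_work callback, not PETSc
code):

#include <mpi.h>

/* Start a global dot product, do local work (e.g. apply a preconditioner)
 * while it is in flight, then wait for the result. */
double overlapped_dot(const double *x, const double *y, int n, MPI_Comm comm,
                      void (*local_work)(void *ctx), void *ctx)
{
  double      localdot = 0.0, globaldot;
  MPI_Request req;
  int         i;

  for (i = 0; i < n; i++) localdot += x[i] * y[i];
  MPI_Iallreduce(&localdot, &globaldot, 1, MPI_DOUBLE, MPI_SUM, comm, &req);
  local_work(ctx);
  MPI_Wait(&req, MPI_STATUS_IGNORE);
  return globaldot;
}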


>
> HPX: Supports the transmission of POD data, polymorphic types, functions,
> and higher-order functions (e.g. functions with bound arguments).
> MPI: Only does POD data.
>

MPI_Datatypes are much more general than flat POD buffers.
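For instance (a minimal sketch of my own, with a hypothetical Particle
struct), a derived datatype describes a structured, non-contiguous layout
rather than a flat array of bytes:

#include <mpi.h>
#include <stddef.h>

typedef struct {
  double coords[3];
  int    id;
} Particle;

/* Build an MPI datatype describing the members of Particle. */
MPI_Datatype create_particle_type(void)
{
  MPI_Datatype tmp, ptype;
  int          blocklens[2] = {3, 1};
  MPI_Aint     displs[2]    = {offsetof(Particle, coords),
                               offsetof(Particle, id)};
  MPI_Datatype types[2]     = {MPI_DOUBLE, MPI_INT};

  MPI_Type_create_struct(2, blocklens, displs, types, &tmp);
  /* Resize so arrays of Particle stride correctly, padding included. */
  MPI_Type_create_resized(tmp, 0, (MPI_Aint)sizeof(Particle), &ptype);
  MPI_Type_commit(&ptype);
  MPI_Type_free(&tmp);
  return ptype;
}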


>
> HPX: Diverse set of runtime services (builtin, intrinsic instrumentation,
> error handling facilities, logging, runtime configuration, loadable
> modules).
> MPI: None of the above.
>
> HPX: Supports dynamic, heuristic load balancing (both at the intra-node
> and inter-node level).
> MPI: Limited builtin support for static load balancing.
>
> HPX is a general-purpose C++11 runtime system for parallel and distributed
> computing
> of any scale (from quad-core tablets to exascale systems). It is freely
> available
> under a BSD-style license, and developed by a growing community of
> international
> collaborators. It is an integral part of US DoE's exascale software stack,
> and is
> supported heavily by the US NSF.
>
> stellar.cct.lsu.edu for benchmarks, papers and links to the github
> repository.
>

I wish you the best in this project, but frankly, I'm unimpressed by the
performance reported in your papers, and it appears that this project has
overlooked many of the same "library" features that GASNet and other PGAS
systems have missed.