[mpich-discuss] Why do predefined MPI_Ops function elementwise in MPI_Accumulate, but not in MPI-1 routines?

Wed Apr 25 09:55:54 CDT 2012

On 4/24/12 5:27 PM, Jed Brown wrote:
> It would be interesting to understand why this is and if there would be
> a way to get around that potential inefficiency. Is any system seriously
> going to implement *every* MPI predefined type and operation in
> hardware? If there are any exceptions, isn't the check to revert to a
> CPU-based implementation going to have to exist anyway?

Yes, one of the design goals behind MPI is that it should be 
implementable in hardware.

> It's obnoxious that MPI-1 could be used reliably with new types (e.g.
> __float128, user-defined complex, etc), but that newer features in MPI-2
> cannot be and that the standard is evolving to further cement this
> position that user-defined types and operations are second-class.

This is definitely not the intention of the MPI Forum.  If there's 
something that we can do to improve MPI's interoperability with complex 
and user-defined types, we will do it.  MPI intentionally does not 
support functionality that can't be implemented efficiently -- this is 
what's behind the MPI-2 RMA semantics.

  ~Jim.