<div class="gmail_quote">On Tue, Apr 24, 2012 at 16:40, Jim Dinan <span dir="ltr">&lt;<a href="mailto:dinan@mcs.anl.gov">dinan@mcs.anl.gov</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I think I&#39;m getting MPI 2.2 and 3.0 semantics mixed up.  MPI 2.2 only allows concurrent or same-epoch accumulates that use the same operation and have the same basic datatype.  So, this is fine in 2.2.  MPI 3.0 will relax the same operation restriction, which could make it challenging to have an efficient hardware implementation that maintains atomicity with an agent running on the CPU.</blockquote>

</div><br><div>It would be interesting to understand why that is, and whether there is a way around the potential inefficiency. Is any system seriously going to implement *every* MPI predefined type and operation in hardware? If there are any exceptions, doesn&#39;t the check that falls back to a CPU-based implementation have to exist anyway?</div>
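<div><br></div><div>To make the fallback question concrete, here is a rough sketch of the kind of per-call check meant above. Everything in it is hypothetical: the enums, the capability table, and choose_path() are invented for illustration and are not part of MPI or of any real implementation. It only shows the dispatch decision; it does not address how atomicity would be kept between the hardware path and the CPU agent.</div>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifiers for illustration only; these are NOT MPI handles. */
typedef enum { ACC_INT32, ACC_DOUBLE, ACC_FLOAT128, ACC_TYPE_COUNT } AccType;
typedef enum { ACC_SUM, ACC_MAX, ACC_PROD, ACC_OP_COUNT } AccOp;
typedef enum { PATH_HARDWARE, PATH_CPU_AGENT } AccPath;

/* Assumed capability table: which (type, op) pairs the network hardware
 * can apply atomically. Entries not listed default to false. */
static const bool hw_supported[ACC_TYPE_COUNT][ACC_OP_COUNT] = {
    [ACC_INT32]  = { [ACC_SUM] = true, [ACC_MAX] = true },
    [ACC_DOUBLE] = { [ACC_SUM] = true },
    /* ACC_FLOAT128: nothing is offloaded in this sketch */
};

/* Pick the execution path for one accumulate call: use the hardware when
 * the pair is in the table, otherwise fall back to the CPU-side agent. */
AccPath choose_path(AccType type, AccOp op)
{
    assert((int)type >= 0 && type < ACC_TYPE_COUNT);
    assert((int)op >= 0 && op < ACC_OP_COUNT);
    return hw_supported[type][op] ? PATH_HARDWARE : PATH_CPU_AGENT;
}
```

<div>With this table, choose_path(ACC_DOUBLE, ACC_SUM) selects the hardware path, while any pair missing from the table, such as (ACC_FLOAT128, ACC_SUM), goes to the CPU agent. The lookup is needed as soon as a single pair is unsupported.</div>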

<div><br></div><div>It&#39;s obnoxious that MPI-1 could be used reliably with new types (e.g. __float128, user-defined complex, etc.), but the newer features in MPI-2 cannot, and that the standard is evolving to further cement the position that user-defined types and operations are second-class.</div>