Dave, it looks like you implemented the non-blocking collectives for MPI-3. How reasonable do you think it would be to expose enough hooks to be able to write a user-defined non-blocking collective that could make useful progress?<div>
<br></div><div>This comes up pretty frequently when a high-level operation with collective semantics internally needs to perform multiple dependent communications. We might still expose a non-blocking interface to the user, but the performance benefit is limited because the multiple rounds all end up running either up front (in the initiating call) or at the end (in the completing call), so little of the communication is actually overlapped.</div>
<div><br></div><div>Generalized requests aren't (currently) a good solution because, from what I can tell, they only make progress when _that request_ is polled. In practice, you want to poke those requests from other library calls (or, eventually or on some systems, from a progress thread), just as native MPI operations can make progress without being polled explicitly.</div>
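<div><br></div><div>To make the request concrete, here is a rough sketch of the kind of hook I have in mind. It is a toy model in plain C, not an existing MPI or MPICH interface: hook_register, progress_poke, multi_round_op, and unrelated_library_call are all hypothetical names, and the "rounds" are just a counter standing in for dependent communications.</div>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of a progress-hook registry. None of these names
 * are part of MPI; they only illustrate the shape of the idea. */

/* A hook advances one operation and returns 1 once it has completed. */
typedef int (*progress_hook_fn)(void *state);

#define MAX_HOOKS 16

typedef struct {
    progress_hook_fn fn;
    void *state;
    int active;
} hook_slot;

static hook_slot hooks[MAX_HOOKS];

/* Register a hook; returns the slot index, or -1 if the table is full. */
static int hook_register(progress_hook_fn fn, void *state)
{
    assert(fn != NULL);
    for (int i = 0; i < MAX_HOOKS; i++) {
        if (!hooks[i].active) {
            hooks[i].fn = fn;
            hooks[i].state = state;
            hooks[i].active = 1;
            return i;
        }
    }
    return -1;
}

/* Poke every registered hook once. In the scheme described above, the
 * library would call this from its own progress engine (or a progress
 * thread). Returns the number of hooks that are still incomplete. */
static int progress_poke(void)
{
    int remaining = 0;
    for (int i = 0; i < MAX_HOOKS; i++) {
        if (!hooks[i].active)
            continue;
        if (hooks[i].fn(hooks[i].state))
            hooks[i].active = 0;   /* completed: stop polling it */
        else
            remaining++;
    }
    return remaining;
}

/* A user-defined "collective" made of several dependent rounds. */
typedef struct {
    int round;         /* rounds completed so far */
    int total_rounds;  /* rounds needed to finish */
    int done;
} multi_round_op;

/* Advance by one round per poke; a real implementation would instead test
 * the current round's requests and post the next round when they finish. */
static int multi_round_step(void *state)
{
    multi_round_op *op = state;
    if (op->round < op->total_rounds)
        op->round++;
    op->done = (op->round == op->total_rounds);
    return op->done;
}

/* Stand-in for any other library entry point that drives progress. */
static void unrelated_library_call(void)
{
    progress_poke();
}
```

<div>The point of the sketch is that multi_round_op advances whenever unrelated_library_call runs, without the user ever polling that operation directly. That is the behavior I do not see a way to get from generalized requests today.</div>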