<div class="gmail_quote">On Tue, Feb 21, 2012 at 15:37, Barry Smith <span dir="ltr">&lt;<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Why not just have MatAssemblyBegin_Nest() call the inner MatAssemblyBegin/End() together and stop the charade that there is any overlap of communication and computations etc anyway?</blockquote></div><div><br></div><div>(I pushed this.)</div>

<br><div>That said, there is a very real latency issue in matrix assembly: the reduction that determines how many receives each process must post. Because MPI currently has no non-blocking collectives, that code (PetscGatherNumberOfMessages() and PetscGatherMessageLengths()) is synchronizing, but MPI-3 will offer non-blocking collectives that we could use for those operations. Even then, the two entry points (MatAssemblyBegin() and MatAssemblyEnd()) are not enough on their own to make progress on assembly; we would also need an internal request system, where either a comm thread or callbacks from other library functions poke the progress along.</div>

<div><br></div><div>There are also signs that it will soon be common to have a comm thread that manages packing, in which case communication could actually proceed concurrently with computation.</div><div>

<br></div>