On Sat, Oct 6, 2012 at 5:26 PM, Barry Smith <span dir="ltr"><<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div id=":zo">So, could we use a single kernel launcher for multi-core, CUDA, and OpenCL based on this principle? Then VecCUDAGetArray()-type routines would keep track of parts of Vecs based on an IS instead of all entries in the Vec. Similarly, there would be a VecMultiCoreGetArray(). Whenever possible, VecXXXGetArray() would not require copies. As part of this model I'd also like to separate the "moving needed data" part of the kernel from the "computation on the data" part so that everything doesn't block while data is being moved around.<br>
</div></blockquote><div><br></div><div>Hmm, the "kernel" code is different in each case. I think it's premature to try to share the launcher now, but perhaps it could be restructured to support that case.</div><div>
<br></div><div>Note that sometimes (even now) we want to ensure that a memory copy is up to date before launching a kernel. In the threads case, we could make a collective VecXXGetArray(), but on the device, we have to do the transfer before landing in device code.</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div id=":zo">
<br>
Ok, how about moving this same model up to the MPI level? We already do this with IS converted to VecScatter (for performance) for updating ghost points (for matrix-vector products, PDE ghost points, etc.); note that we can hide the VecScatter inside the IS and have it created as needed.</div>
</blockquote></div><br><div>VecGetSubVector() sort of does this "hiding of the VecScatter". In the general MPI world, we need a "start"/"end" pair for this sort of subvector access so that communication can overlap with computation.</div>
<div><br></div><div>I think a huge number of operations can be phrased as asynchronous access to subvectors and submatrices, but that's a separate discussion.</div>