On Mon, Nov 5, 2012 at 9:06 AM, Karl Rupp <span dir="ltr">&lt;<a href="mailto:rupp@mcs.anl.gov" target="_blank">rupp@mcs.anl.gov</a>&gt;</span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi Lawrence,<div class="im"><br>

<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

That's it for now, after some more refining I'll start with a careful<br>

migration of the code/concepts into PETSc. Comments are, of course,<br>

always welcome.<br>

</blockquote>

<br>

So we're working on FE assembly + solve on GPUs using FEniCS kernels<br>

(<a href="http://github.com/OP2/PyOP2" target="_blank">github.com/OP2/PyOP2</a>).  For the GPU solve, it would be nice if we could<br>

backdoor assembled matrices straight on to the GPU.  That is, create a Mat<br>

saying "this is the sparsity pattern" and then, rather than calling<br>

MatSetValues on the host, just pass a pointer to the device data.<br>

</blockquote>

<br></div>

Thanks for the input. My reference implementation supports this kind of backdooring, so there is no conceptual problem with that. What I don't know yet is 'The Right Way' of integrating this functionality into the existing PETSc interface routines. Anyhow, I see this as an essential feature, so it's on my roadmap already.<div class="im">

<br>

<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

At the moment, we're doing a similar thing using CUSP, but are looking at<br>

doing multi-GPU assembly + solve and would like not to have to reinvent too<br>

many wheels, in particular, the MPI-parallel layer.  Additionally, we're<br>

already using PETSc for the CPU-side linear algebra so it would be nice to<br>

use the same interface everywhere.<br>

</blockquote>

<br></div>

Yes, that's what we are aiming for. The existing MPI layer works well irrespective of whether you're dealing with CPUs or GPUs on each rank.<div class="im"><br>

<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I guess effectively we'd like something like MatCreateSeqAIJWithArrays and<br>

MatCreateMPIAIJWithSplitArrays but with the ability to pass device pointers<br>

rather than host pointers.  Is there any roadmap in PETSc for this kind of<br>

thing?  Would patches in this direction be welcome?<br>

</blockquote>

<br></div>

Type safety is a bit nasty. CUDA lets you work with plain 'void *', while OpenCL expects cl_mem. This suggests using something like<br>

  MatCreateSeqAIJWithCUDAArrays(),<br>

  MatCreateSeqAIJWithOpenCLArrays(),<br>

but as I said above, I haven't come to a decision on that yet.<br></blockquote><div><br></div><div>Let me be more specific. I would not support this. I think it is wrong.</div><div><br></div><div>You should create the Mat in the normal way and then pull out the backend</div>

<div>storage. This way we have one simple interface for creation and preallocation,</div><div>and eventually we make a nicer FEM interface to cover up the device pointer</div><div>extraction.</div><div><br></div><div>   Matt</div>
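<div><br></div><div>A minimal sketch of that workflow in plain Python, assuming a toy CSR class rather than PETSc itself. CsrMatrix, values_buffer and assemble_in_place are hypothetical names made up for illustration, and a host memoryview stands in for a device pointer.</div>
<pre>
```python
from array import array


class CsrMatrix:
    """Toy CSR matrix; a stand-in for a library matrix, not PETSc's Mat."""

    def __init__(self, row_ptr, col_idx):
        # Creation and preallocation happen once, from the sparsity pattern only.
        self.row_ptr = array("l", row_ptr)
        self.col_idx = array("l", col_idx)
        self._values = array("d", [0.0] * len(col_idx))

    def values_buffer(self):
        # Hypothetical accessor: hand out a writable view of the backend storage.
        return memoryview(self._values)

    def multiply(self, x):
        # y = A x, using whatever the backend storage currently holds.
        y = [0.0] * (len(self.row_ptr) - 1)
        for row in range(len(y)):
            for k in range(self.row_ptr[row], self.row_ptr[row + 1]):
                y[row] += self._values[k] * x[self.col_idx[k]]
        return y


def assemble_in_place(buf):
    # Stand-in for an assembly kernel writing straight into the storage.
    for k in range(len(buf)):
        buf[k] = float(k + 1)


# 2x2 matrix with nonzeros at (0,0), (0,1) and (1,1).
mat = CsrMatrix(row_ptr=[0, 2, 3], col_idx=[0, 1, 1])
assemble_in_place(mat.values_buffer())
result = mat.multiply([1.0, 1.0])
```
</pre>
<div>Here result is the product of the assembled matrix with a vector of ones. Creation and preallocation stay on the one ordinary path, and only the value storage is exposed to the assembly code.</div>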

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I'm not aware of any roadmap on the GPU part, but I want to integrate this sooner rather than later. Patches are of course welcome, either for the current branch, or later on based on the refurbished GPU extensions.<br>


<br>

Best regards,<br>

Karli<br>

<br>

</blockquote></div><br><br clear="all"><div><br></div>-- <br>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>

-- Norbert Wiener<br>

</div>