<br><div class="gmail_extra">On Mon, Nov 5, 2012 at 6:27 AM, Lawrence Mitchell <span dir="ltr">&lt;<a href="mailto:lawrence.mitchell@ed.ac.uk" target="_blank">lawrence.mitchell@ed.ac.uk</a>&gt;</span> wrote:<br><div class="gmail_quote">

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Karli, and others,<br>

<br>

On 05/11/2012 01:51, Karl Rupp wrote:<br>

<br>

...<br>

<div class="im"><br>

> That's it for now, after some more refining I'll start with a careful<br>

> migration of the code/concepts into PETSc. Comments are, of course,<br>

> always welcome.<br>

<br>

</div>So we're working on FE assembly + solve on GPUs using FEniCS kernels<br>

(<a href="http://github.com/OP2/PyOP2" target="_blank">github.com/OP2/PyOP2</a>).  For the GPU solve, it would be nice if we could<br>

backdoor assembled matrices straight onto the GPU.  That is, create a Mat<br>

saying "this is the sparsity pattern" and then, rather than calling<br>

MatSetValues on the host, just pass a pointer to the device data.<br></blockquote><div><br></div><div>Yes, I was doing the same thing. GPU assembly is not really any faster than CPU,</div><div>so I did not publish it, but not having to transfer the values back is very nice.</div>

<div><br></div><div>In Karl's scheme it's easy, since you just go in and get the device pointer from the</div><div>handle. We can talk about a user-level interface after you verify that this works and</div><div>is what you want.</div>
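<div><br></div><div>To make the "pattern plus values" split concrete, here is a minimal sketch in plain Python (not PETSc or PyOP2 code; all function names are hypothetical). It assumes CSR (AIJ-style) storage: the sparsity pattern is fixed up front as row pointers and column indices, and assembly then only writes into a flat values array. That values array is the piece that would stay on the device, so that only a pointer to it needs to be handed over rather than going through MatSetValues on the host. The 1D stiffness matrix is just an illustrative stand-in for a real FE kernel.</div>
<pre>
```python
# Hypothetical illustration only: CSR storage splits a sparse matrix into a
# fixed sparsity pattern (row_ptr, col_idx) and a flat array of values.


def build_csr_pattern(n_rows, entries):
    """Build CSR row pointers and column indices from (row, col) pairs."""
    cols_per_row = [set() for _ in range(n_rows)]
    for r, c in entries:
        cols_per_row[r].add(c)
    row_ptr = [0]
    col_idx = []
    for cols in cols_per_row:
        col_idx.extend(sorted(cols))
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx


def add_value(row_ptr, col_idx, values, r, c, v):
    """Accumulate v into entry (r, c) by locating its slot in the pattern."""
    for k in range(row_ptr[r], row_ptr[r + 1]):
        if col_idx[k] == c:
            values[k] += v
            return
    raise KeyError((r, c))


def assemble_1d_laplacian(n_nodes):
    """Assemble a 1D linear-element stiffness matrix (unit element length)."""
    elements = [(e, e + 1) for e in range(n_nodes - 1)]

    # Step 1: "this is the sparsity pattern" (known before any values exist).
    pattern = [(i, j) for el in elements for i in el for j in el]
    row_ptr, col_idx = build_csr_pattern(n_nodes, pattern)

    # Step 2: assembly writes straight into the flat values array. In the
    # scenario discussed here, this array would be the device-side buffer.
    values = [0.0] * len(col_idx)
    local = [[1.0, -1.0], [-1.0, 1.0]]
    for el in elements:
        for a, i in enumerate(el):
            for b, j in enumerate(el):
                add_value(row_ptr, col_idx, values, i, j, local[a][b])
    return row_ptr, col_idx, values


if __name__ == "__main__":
    print(assemble_1d_laplacian(3))
```
</pre>
<div>For three nodes this yields row pointers [0, 2, 5, 7], column indices [0, 1, 0, 1, 2, 1, 2] and values [1, -1, -1, 2, -1, -1, 1]. The first two arrays never change after setup, which is why handing over just the values buffer is enough once the pattern has been agreed on.</div>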

<div><br></div><div>    Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

At the moment, we're doing a similar thing using CUSP, but are looking at<br>

doing multi-GPU assembly + solve and would like not to have to reinvent too<br>

many wheels, in particular, the MPI-parallel layer.  Additionally, we're<br>

already using PETSc for the CPU-side linear algebra so it would be nice to<br>

use the same interface everywhere.<br>

<br>

I guess effectively we'd like something like MatCreateSeqAIJWithArrays and<br>

MatCreateMPIAIJWithSplitArrays but with the ability to pass device pointers<br>

rather than host pointers.  Is there any roadmap in PETSc for this kind of<br>

thing?  Would patches in this direction be welcome?<br>

<br>

Cheers,<br>

Lawrence<br>

<span class="HOEnZb"><font color="#888888"><br>


<br>

</font></span></blockquote></div><br><br clear="all"><div><br></div>-- <br>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>

-- Norbert Wiener<br>

</div>