[petsc-dev] Current status: GPUs for PETSc

Matthew Knepley knepley at gmail.com
Mon Nov 5 08:10:44 CST 2012


On Mon, Nov 5, 2012 at 9:06 AM, Karl Rupp <rupp at mcs.anl.gov> wrote:

> Hi Lawrence,
>
>
>
>>> That's it for now; after some more refining, I'll start with a careful
>>> migration of the code/concepts into PETSc. Comments are, of course,
>>> always welcome.
>>>
>>
>> So we're working on FE assembly + solve on GPUs using FEniCS kernels
>> (github.com/OP2/PyOP2).  For the GPU solve, it would be nice if we could
>> backdoor assembled matrices straight onto the GPU.  That is, create a Mat
>> saying "this is the sparsity pattern" and then, rather than calling
>> MatSetValues on the host, just pass a pointer to the device data.
>>
>
> Thanks for the input. My reference implementation supports this kind of
> backdooring, so there is no conceptual problem with it. What I don't
> know yet is 'The Right Way' to integrate this functionality into the
> existing PETSc interface routines. In any case, I see this as an essential
> feature, so it's already on my roadmap.
>
>
>
>> At the moment, we're doing a similar thing using CUSP, but are looking at
>> doing multi-GPU assembly + solve and would like not to have to reinvent
>> too
>> many wheels, in particular, the MPI-parallel layer.  Additionally, we're
>> already using PETSc for the CPU-side linear algebra so it would be nice to
>> use the same interface everywhere.
>>
>
> Yes, that's what we are aiming for. The existing MPI layer just works,
> irrespective of whether you're dealing with CPUs or GPUs on each rank.
>
>
>
>> I guess effectively we'd like something like MatCreateSeqAIJWithArrays and
>> MatCreateMPIAIJWithSplitArrays but with the ability to pass device
>> pointers
>> rather than host pointers.  Is there any roadmap in PETSc for this kind of
>> thing?  Would patches in this direction be welcome?
>>
>
> Type safety is a bit nasty: CUDA lets you deal with plain 'void *', while
> OpenCL expects cl_mem. This suggests something like
>   MatCreateSeqAIJWithCUDAArrays(),
>   MatCreateSeqAIJWithOpenCLArrays(),
> but as I said above, I haven't come to a decision on that yet.
>
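For context, the existing host-pointer routine and the device-pointer
variants sketched above would look roughly as follows. The CUDA/OpenCL
creation routines are only a proposal and do not exist in PETSc; their
signatures here are just one plausible reading of that proposal.

#include <petscmat.h>

/* Existing PETSc API: wrap CSR arrays that already live in HOST memory,
 * without copying. The caller retains ownership of the three arrays. */
PetscErrorCode WrapHostCSR(PetscInt m, PetscInt n, PetscInt *rowptr,
                           PetscInt *colidx, PetscScalar *vals, Mat *A)
{
  return MatCreateSeqAIJWithArrays(PETSC_COMM_SELF, m, n,
                                   rowptr, colidx, vals, A);
}

/* The proposed device-pointer variants -- NOT existing PETSc API, just
 * the shape the proposal implies. CUDA hands out raw pointers, OpenCL
 * wraps buffers in cl_mem, hence the two distinct signatures. */
#if 0
PetscErrorCode MatCreateSeqAIJWithCUDAArrays(MPI_Comm comm, PetscInt m, PetscInt n,
                                             PetscInt *d_rowptr, PetscInt *d_colidx,
                                             PetscScalar *d_vals, Mat *A);
PetscErrorCode MatCreateSeqAIJWithOpenCLArrays(MPI_Comm comm, PetscInt m, PetscInt n,
                                               cl_mem d_rowptr, cl_mem d_colidx,
                                               cl_mem d_vals, Mat *A);
#endif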

Let me be more specific. I would not support this. I think it is wrong.

You should create the Mat in the normal way and then pull out the backend
storage. That way we keep one simple interface for creation and preallocation,
and eventually we can put a nicer FEM interface over the device pointer
extraction.
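
Roughly like this (a sketch only; the device-array extraction calls at
the end are hypothetical, since PETSc does not expose such routines yet):

#include <petscmat.h>

/* Sketch: build the Mat through the normal creation/preallocation
 * interface, insert the sparsity pattern, then extract device storage.
 * MATSEQAIJCUSP is the GPU-backed AIJ type in petsc-dev at this time;
 * the Get/Restore calls at the end do not exist and are hypothetical. */
PetscErrorCode AssembleOnDevice(PetscInt n, Mat *A)
{
  PetscErrorCode ierr;
  PetscInt       i;

  PetscFunctionBegin;
  ierr = MatCreate(PETSC_COMM_SELF, A);CHKERRQ(ierr);
  ierr = MatSetSizes(*A, n, n, n, n);CHKERRQ(ierr);
  ierr = MatSetType(*A, MATSEQAIJCUSP);CHKERRQ(ierr);
  ierr = MatSeqAIJSetPreallocation(*A, 3, PETSC_NULL);CHKERRQ(ierr);

  /* Establish a (here: tridiagonal) sparsity pattern with explicit
   * zeros, so assembly does not compress the rows away. */
  for (i = 0; i < n; i++) {
    PetscInt    cols[3], nc = 0;
    PetscScalar zeros[3] = {0.0, 0.0, 0.0};
    if (i > 0)     cols[nc++] = i - 1;
    cols[nc++] = i;
    if (i < n - 1) cols[nc++] = i + 1;
    ierr = MatSetValues(*A, 1, &i, nc, cols, zeros, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(*A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(*A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* Hypothetical device-pointer extraction: fill the values directly
   * from an assembly kernel instead of calling MatSetValues() again. */
#if 0
  PetscScalar *d_vals;
  ierr = MatSeqAIJCUSPGetDeviceArray(*A, &d_vals);CHKERRQ(ierr);
  /* ... launch FEM assembly kernel writing into d_vals ... */
  ierr = MatSeqAIJCUSPRestoreDeviceArray(*A, &d_vals);CHKERRQ(ierr);
#endif
  PetscFunctionReturn(0);
}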

   Matt


> I'm not aware of any roadmap on the GPU part, but I want to integrate this
> rather sooner than later. Patches are of course welcome, either for the
> current branch, or later on based on the refurbished GPU extensions.
>
> Best regards,
> Karli
>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener