[petsc-dev] Current status: GPUs for PETSc

Karl Rupp rupp at mcs.anl.gov
Mon Nov 5 08:06:04 CST 2012


Hi Lawrence,


>> That's it for now, after some more refining I'll start with a careful
>> migration of the code/concepts into PETSc. Comments are, of course,
>> always welcome.
>
> So we're working on FE assembly + solve on GPUs using fenics kernels
> (github.com/OP2/PyOP2).  For the GPU solve, it would be nice if we could
> backdoor assembled matrices straight on to the GPU.  That is, create a Mat
> saying "this is the sparsity pattern" and then, rather than calling
> MatSetValues on the host, just pass a pointer to the device data.

Thanks for the input. My reference implementation supports this kind of 
backdooring, so there is no conceptual problem with it. What I don't 
know yet is 'The Right Way' of integrating this functionality into the 
existing PETSc interface routines. Anyhow, I see this as an essential 
feature, so it's already on my roadmap.
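
To make the backdooring concrete, the usage I have in mind looks 
roughly like this (MatCreateSeqAIJWithDeviceArrays() is a hypothetical 
name just for illustration; see the naming question further below):

   /* Assemble CSR data directly on the GPU (e.g. with your own
    * kernels), then hand the device pointers to PETSc without a
    * host round-trip. Creation routine name is hypothetical. */
   PetscInt    *d_rowptr, *d_colind;
   PetscScalar *d_values;
   cudaMalloc((void**)&d_rowptr, (m+1) * sizeof(PetscInt));
   cudaMalloc((void**)&d_colind, nnz   * sizeof(PetscInt));
   cudaMalloc((void**)&d_values, nnz   * sizeof(PetscScalar));
   /* ... fill the arrays with a device-side assembly kernel ... */
   Mat A;
   MatCreateSeqAIJWithDeviceArrays(PETSC_COMM_SELF, m, n,
                                   d_rowptr, d_colind, d_values, &A);

The matrix would then merely wrap the user-provided buffers, 
analogously to how MatCreateSeqAIJWithArrays() avoids copying host data.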


> At the moment, we're doing a similar thing using CUSP, but are looking at
> doing multi-GPU assembly + solve and would like not to have to reinvent too
> many wheels, in particular, the MPI-parallel layer.  Additionally, we're
> already using PETSc for the CPU-side linear algebra so it would be nice to
> use the same interface everywhere.

Yes, that's what we are aiming for. The existing MPI layer works well 
irrespective of whether each rank is dealing with CPUs or GPUs.
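
For instance, with the current CUSP-based backend you can (if I 
remember the option names correctly) switch an MPI-parallel run over to 
GPU matrix and vector types purely through runtime options, e.g.

   mpirun -np 4 ./app -mat_type mpiaijcusp -vec_type mpicusp

The matrix-vector products then run on the GPUs, while the usual MPI 
layer keeps handling the communication between the ranks.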


> I guess effectively we'd like something like MatCreateSeqAIJWithArrays and
> MatCreateMPIAIJWithSplitArrays but with the ability to pass device pointers
> rather than host pointers.  Is there any roadmap in PETSc for this kind of
> thing?  Would patches in this direction be welcome?

Type safety is a bit nasty here. CUDA works with plain 'void *' device 
pointers, while OpenCL expects cl_mem handles. This suggests using 
separate entry points such as
   MatCreateSeqAIJWithCUDAArrays(),
   MatCreateSeqAIJWithOpenCLArrays(),
but as I said above, I haven't come to a decision on that yet.
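
For illustration, the two variants would differ essentially only in the 
handle types they accept (again, hypothetical signatures, not a 
committed interface):

   /* CUDA flavor: raw device pointers */
   PetscErrorCode MatCreateSeqAIJWithCUDAArrays(MPI_Comm comm,
                      PetscInt m, PetscInt n,
                      PetscInt *d_rowptr, PetscInt *d_colind,
                      PetscScalar *d_values, Mat *A);

   /* OpenCL flavor: cl_mem buffer objects instead of raw pointers */
   PetscErrorCode MatCreateSeqAIJWithOpenCLArrays(MPI_Comm comm,
                      PetscInt m, PetscInt n,
                      cl_mem rowptr, cl_mem colind,
                      cl_mem values, Mat *A);

A single entry point taking 'void *' would work for CUDA, but would 
throw away the cl_mem type information on the OpenCL side, hence the 
separate names.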

I'm not aware of any roadmap for the GPU part, but I want to integrate 
this sooner rather than later. Patches are of course welcome, either 
against the current branch or later on against the refurbished GPU 
extensions.

Best regards,
Karli
