[petsc-dev] OpenCL contributor

Karl Rupp rupp at iue.tuwien.ac.at
Sat Nov 8 05:23:23 CST 2014


Hi Ken,

> I am a PhD candidate in Heterogeneous HPC. I wonder if there's anything
> in particular that needs an OpenCL implementation in PETSc.

well, this is a fairly general question. Most OpenCL routines used in 
PETSc are currently provided via ViennaCL. The only exception right now 
is a finite element assembly kernel in PetscFE, which is implemented 
directly in PETSc.
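
In case you want a concrete entry point: with a PETSc build configured
for ViennaCL/OpenCL, the ViennaCL-backed types can be selected at
runtime via -vec_type viennacl and -mat_type aijviennacl, or directly
in code. A minimal sketch (error checking omitted) might look like:

    #include <petscvec.h>

    int main(int argc, char **argv)
    {
      Vec      x;
      PetscInt n = 100;

      PetscInitialize(&argc, &argv, NULL, NULL);
      VecCreate(PETSC_COMM_WORLD, &x);
      VecSetSizes(x, PETSC_DECIDE, n);
      /* back the vector with ViennaCL (OpenCL) storage; this errors out
         if PETSc was not configured with ViennaCL support */
      VecSetType(x, VECVIENNACL);
      VecSet(x, 1.0);   /* runs through the ViennaCL backend */
      VecDestroy(&x);
      PetscFinalize();
      return 0;
    }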

What's your background and previous experience? Is there anything you 
are particularly interested in? Since you're a PhD candidate, I assume 
you are working already on a certain topic. My recommendation is to 
align any work on PETSc with your direct research agenda :-)

If you're looking for interesting research questions, I can think of a 
few: For example, in ongoing projects some of us are working on 
matrix-free applications of fine-grid operators in a multigrid context. 
Jed has written some kernels based on AVX intrinsics, for which he'd be 
very interested to see how they perform on GPUs. (Keep in mind that this 
is certainly not a good place to start if you are rather new to OpenCL...)
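
Just to illustrate where such a matrix-free operator plugs into PETSc:
the usual vehicle is a shell matrix, where you register a callback that
applies the operator and are then free to dispatch to intrinsics,
OpenCL, or whatever else inside it. A rough sketch (MyMatMult and the
sizes are placeholders):

    #include <petscmat.h>

    /* placeholder: apply the fine-grid operator to x, store result in y,
       e.g. via an intrinsics-based or OpenCL kernel */
    static PetscErrorCode MyMatMult(Mat A, Vec x, Vec y)
    {
      /* ... operator application goes here ... */
      return 0;
    }

    static PetscErrorCode CreateMatrixFreeOperator(MPI_Comm comm,
                            PetscInt nlocal, PetscInt N, Mat *A)
    {
      MatCreateShell(comm, nlocal, nlocal, N, N, NULL, A);
      MatShellSetOperation(*A, MATOP_MULT, (void (*)(void))MyMatMult);
      return 0;
    }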

Then there is still a lot of work left in the unification of how PETSc 
deals with CUDA and OpenCL, so that they share a substantially larger 
code base rather than being maintained separately. This, however, is not 
about writing OpenCL kernels, but about deriving clever abstractions of 
these two programming models and integrating them seamlessly with the 
PETSc core.

If you really want to write and optimize OpenCL kernels, then the best 
place to do so is preconditioners. A couple of GPU-accelerated 
preconditioners have been proposed in the literature, but most of them 
are based on CUDA. Also, most of them only ever showed up in the 
respective paper and never made it into reusable library code.
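
To see where a hand-written kernel would hook in: a custom
preconditioner is registered through PCSHELL, and the apply callback
can then stage the data on the device and launch your OpenCL kernels.
Again only a sketch, with placeholder names:

    #include <petscksp.h>

    /* placeholder: y = M^{-1} x, e.g. implemented by enqueueing an
       OpenCL kernel on the device buffers behind x and y */
    static PetscErrorCode MyPCApply(PC pc, Vec x, Vec y)
    {
      /* ... launch OpenCL kernel(s) here ... */
      return 0;
    }

    static PetscErrorCode AttachShellPreconditioner(KSP ksp)
    {
      PC pc;
      KSPGetPC(ksp, &pc);
      PCSetType(pc, PCSHELL);
      PCShellSetApply(pc, MyPCApply);
      return 0;
    }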

As things are right now, don't bother with OpenCL on CPUs. With a more 
conventional programming model based on MPI plus intrinsics you get 
much better control over what is going on, as well as much lower 
latency for function/kernel calls.

Best regards,
Karli



