[petsc-dev] Supporting OpenCL matrix assembly

Karl Rupp rupp at mcs.anl.gov
Mon Sep 23 15:49:17 CDT 2013


Hi Jed,

> We have some motivated users that would like a way to assemble matrices
> on a device, without needing to store all the element matrices to global
> memory or to transfer them to the CPU.  Given GPU execution models, this
> means we need something that can be done on-the-spot in kernels.  So
> what about a function that can be called by device threads?
>
> PetscErrorCode MatOpenCLGetSetValuesSource(Mat, synchronization_mechanism, char **);
>
> The user concatenates this type-specialized code into their source and
> calls MatSetValues().  The users I'm talking to here synchronize by
> coordinating threads using coloring of a sort.  The user still needs to
> call MatAssemblyBegin/End from outside a kernel, though that function
> may or may not need to invoke its own kernel.
>
> Crazy?

a)
I think this requires rethinking how we manage the raw OpenCL 
buffers. Last year I suggested that we 'wrap' pointers to raw 
memory buffers in something like
  struct generic_ptr {
    void * cpu_ptr;
    void * cuda_ptr;
    cl_mem opencl_ptr;
  };
underneath the 'special pointer' (spptr) of Vec and Mat, but we then 
decided on a library-specific dispatch instead, i.e. spptr points to 
whatever a library needs. For MatOpenCLGetSetValuesSource() we would 
have to be very careful about how the buffers are passed to the kernel, 
since different OpenCL backends may expect slightly different semantics. 
Currently ViennaCL is the only OpenCL backend we have, but even though 
it is 'my own' library, there is no point in restricting the design to it.

b)
Apart from that, I'm not sure I understand the semantics of the 
proposed function correctly. For MatOpenCLGetSetValuesSource() to be 
callable by device threads, it has to be embedded entirely in the 
OpenCL sources, which means it has no knowledge of any PETSc types. 
If, on the other hand, it is meant to be a PETSc function, then I don't 
see what 'synchronization_mechanism' is supposed to do. In addition, 
the OpenCL context and command queue should be passed as parameters to 
MatOpenCLGetSetValuesSource().

Best regards,
Karli
