[petsc-dev] Supporting OpenCL matrix assembly

Jed Brown jedbrown at mcs.anl.gov
Mon Sep 23 17:06:46 CDT 2013


Karl Rupp <rupp at mcs.anl.gov> writes:

> a)
> I think this needs a second thought on how we manage the raw OpenCL 
> buffers. My suggestion last year was that we 'wrap' pointers to raw 
> memory buffers into something like
>   struct generic_ptr {
>     void * cpu_ptr;
>     void * cuda_ptr;
>     cl_mem opencl_ptr;
>   };
> underneath the 'special pointer' for Vec and Mat, but we then decided on 
> using a library-specific dispatch, i.e. spptr points to whatever a 
> library needs. For MatOpenCLGetSetValuesSource() we would have to be 
> very careful in the way the buffers are passed to the kernel, as 
> different OpenCL backends may expect slightly different semantics. 
> Currently we only have ViennaCL for that purpose, but even though it is 
> 'my own' library, there is no point in being restrictive here.

Perhaps that *GetSource method should also return an opaque device "Mat"
pointer that the user is responsible for shepherding into the kernel,
from which they would call the device MatSetValues?
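To make the opaque-handle idea concrete, here is a minimal sketch in plain C of what such a handle and the device-side insertion routine might look like for an AIJ (CSR) matrix whose column indices were set in advance. DeviceMat and DeviceMatSetValuesAdd are hypothetical names, not PETSc API. In actual OpenCL C the arrays would be __global pointers and the routine would be compiled into the user's program.

```c
#include <assert.h>

/* Hypothetical device-side view of an AIJ (CSR) matrix whose nonzero
 * pattern (row offsets and column indices) was set in advance. In real
 * OpenCL C these pointers would carry the __global qualifier. */
typedef struct {
  int nrows;
  const int *rowptr;   /* length nrows+1 */
  const int *colidx;   /* length rowptr[nrows] */
  double *vals;        /* length rowptr[nrows] */
} DeviceMat;

/* Hypothetical device MatSetValues with ADD_VALUES semantics: adds the
 * m-by-n dense block v (row-major) into rows[] x cols[]. Returns 0 on
 * success, -1 if an entry falls outside the preallocated pattern
 * (entries visited before the failing one have already been added). */
static int DeviceMatSetValuesAdd(DeviceMat *A, int m, const int *rows,
                                 int n, const int *cols, const double *v)
{
  for (int i = 0; i < m; i++) {
    int r = rows[i];
    assert(r >= 0 && r < A->nrows);
    for (int j = 0; j < n; j++) {
      int k, found = 0;
      /* Linear search of row r for column cols[j]; a binary search would
       * also work if column indices are sorted within each row. */
      for (k = A->rowptr[r]; k < A->rowptr[r + 1]; k++) {
        if (A->colidx[k] == cols[j]) { found = 1; break; }
      }
      if (!found) return -1;
      A->vals[k] += v[i * n + j];
    }
  }
  return 0;
}
```

Because the pattern is fixed, insertion only has to locate the existing slot for each (row, column) pair; it never allocates, which is what makes it plausible to run inside a kernel.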

> b)
> Other than that, I'm not sure whether I understand the semantics of the 
> proposed function correctly. In order for MatOpenCLGetSetValuesSource() 
> to be callable by device threads, 

The *GetSource method would be called from the CPU and would return a
string containing a type-specialized MatSetValues implementation.  The
user would prepend that source to the string they pass to the OpenCL
compiler.  Their own part of that string would contain code that calls
MatSetValues (perhaps with a name that makes it clear that it's running
on the device).
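A minimal sketch of the prepend step, assuming the method simply hands back a NUL-terminated C string. HypotheticalMatGetSetValuesSource and MatSetValuesDevice are placeholder names for illustration only, and the returned device routine's body is elided.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the proposed *GetSource method: returns
 * OpenCL C source for a type-specialized device MatSetValues. The body
 * is elided; a real implementation would depend on the Mat type. */
static const char *HypotheticalMatGetSetValuesSource(void)
{
  return "void MatSetValuesDevice(/* type-specific args */) { /* ... */ }\n";
}

/* Prepend the library-provided source to the user's kernel source so
 * the user's kernel can call MatSetValuesDevice. Caller frees result;
 * returns NULL if allocation fails. */
static char *BuildProgramSource(const char *lib_src, const char *user_src)
{
  assert(lib_src != NULL && user_src != NULL);
  size_t a = strlen(lib_src), b = strlen(user_src);
  char *out = malloc(a + b + 1);
  if (!out) return NULL;
  memcpy(out, lib_src, a);
  memcpy(out + a, user_src, b + 1); /* also copies the terminating NUL */
  return out;
}
```

Manual concatenation is optional: clCreateProgramWithSource takes an array of count source strings that together make up one program, so the library string and the user string could simply be passed as two entries.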

> it needs to be all embedded into the OpenCL sources, which means that
> it has no knowledge about any of the PETSc types. If, on the other
> hand, this is supposed to be a PETSc function, then I don't know what
> 'synchronization_mechanism' is supposed to do. In addition, the OpenCL
> context and command queue should be passed as parameters to
> MatOpenCLGetSetValuesSource().

Suppose the column indices have been set in advance.  Now if the
application already has a way of preventing conflicting cross-thread-block
writes to those slots within an insertion round (e.g., coloring), PETSc
would not need any synchronization and wouldn't need to stash
possibly conflicting writes elsewhere.  Otherwise, PETSc would have to
manage the stashing, use atomics, or use some other scheme.
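For the coloring case, here is a sketch of the idea under simplifying assumptions: elements are colored greedily so that any two elements sharing a node (and hence writing the same matrix rows) get different colors, and each color becomes one insertion round. ColorElements and AssembleByColor are hypothetical helpers. To keep the example short, it assembles into a vector indexed by node rather than into matrix slots, but the no-conflict argument for row slots is the same.

```c
#include <assert.h>
#include <string.h>

#define MAX_COLORS 32

/* Greedy coloring: two elements that share a node receive different
 * colors. elem_nodes holds npe node indices per element; color[] gets
 * one color per element. Returns the number of colors used, or -1 if
 * MAX_COLORS is exceeded. Quadratic in nelem, written for clarity. */
static int ColorElements(int nelem, int npe, const int *elem_nodes, int *color)
{
  assert(nelem >= 0 && npe > 0);
  int ncolors = 0;
  for (int e = 0; e < nelem; e++) {
    int used[MAX_COLORS];
    memset(used, 0, sizeof used);
    /* Mark the colors of earlier elements that share any node with e. */
    for (int f = 0; f < e; f++) {
      int shares = 0;
      for (int a = 0; a < npe && !shares; a++)
        for (int b = 0; b < npe; b++)
          if (elem_nodes[e * npe + a] == elem_nodes[f * npe + b]) { shares = 1; break; }
      if (shares) used[color[f]] = 1;
    }
    int c = 0;
    while (c < MAX_COLORS && used[c]) c++;
    if (c == MAX_COLORS) return -1;
    color[e] = c;
    if (c + 1 > ncolors) ncolors = c + 1;
  }
  return ncolors;
}

/* One insertion round per color. Elements within a round share no
 * nodes, so on a device each could go to its own thread block and the
 * plain "+=" below would need no atomics. Here the rounds simply run
 * serially on the CPU. */
static void AssembleByColor(int nelem, int npe, const int *elem_nodes,
                            const int *color, int ncolors,
                            const double *elem_vals, double *global)
{
  for (int c = 0; c < ncolors; c++)
    for (int e = 0; e < nelem; e++)
      if (color[e] == c)
        for (int a = 0; a < npe; a++)
          global[elem_nodes[e * npe + a]] += elem_vals[e * npe + a];
}
```

If the application cannot supply such a partition, the same loop would need atomic updates or a stash for conflicting writes, which is the case where PETSc would have to step in.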