[petsc-dev] Supporting OpenCL matrix assembly

Jed Brown jedbrown at mcs.anl.gov
Tue Sep 24 11:19:25 CDT 2013


Karl Rupp <rupp at mcs.anl.gov> writes:

> Hi,
>
>>> If the context and queue are not attached to objects, then they would
>>> essentially represent global state, which is something I want to avoid.
>>
>> I was thinking that the context returned would be specific to the Mat
>> and the device it was about to run on.
>
> Users who want to do the assembly right on the OpenCL device usually 
> want us to use *their* context, hence all the need for such an interface.

Hmm, I think we're using "context" to mean different things.  When I say
"matrix context", I mean whatever kernels use to identify the matrix
into which they want to set entries.

I think you were referring to the cl_context, which (now that you have
pointed out the issue) I think should be passed to the Mat*GetSource().

>>> What if a user for example wants to split the matrix across multiple
>>> OpenCL contexts (e.g. an AMD GPU and a Xeon Phi)?
>>
>> Maybe the GetSource() should take an argument specifying which device it
>> was obtaining code for?  I'm not convinced that this sort of hybridism
>> is useful, however.
>
> This depends on the degree of optimization. If you really want to go for 
> utmost performance, you need to return a source string which is device 
> specific. For a first implementation it is sufficient to just have one 
> kernel for all devices. The main benefit of the assembly on the device 
> is avoiding PCI-Express, so a few percent in raw kernel performance can 
> be considered microtuning...

Yup.

>> Umm, can't I copy the struct to the device and give the user a pointer
>> that they can shuttle into their kernels?
>
> Not a struct containing device buffer handles. You can copy
>    struct A {
>      int a;
>      double b[5];
>    };
> and anything else which is of fixed size, but you are not allowed to 
> pack cl_mem into a struct and pass that struct on to the kernel. This 
> is, unfortunately, a considerable abstraction problem, because it does 
> not allow you to 'just pass an opaque object'. For example, the three 
> CSR arrays need to be passed as separate kernel arguments. Yes, this is 
> ugly.

Ugh, that's terrible.  Alternative:

We return an array of (size, arg_data) pairs.  The user adds these to
their kernel.  We'll provide a struct that they initialize somewhere at
the top of their kernel to pack all our programmatically-generated
arguments into something they can pass around reasonably.  Ugly, but not
a showstopper.
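
The (size, arg_data) idea above might look roughly like the following
host-side sketch.  Every name here (KernelArg, CsrDeviceMat,
csr_get_kernel_args, append_kernel_args, record_arg) is hypothetical and
not part of PETSc or OpenCL.  DeviceBuffer stands in for cl_mem so the
sketch compiles without OpenCL headers, and the setter callback has the
shape of clSetKernelArg(kernel, arg_index, arg_size, arg_value), which is
what a real setter would presumably wrap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: one (size, arg_data) pair, shaped like the last two
 * parameters of clSetKernelArg(kernel, arg_index, arg_size, arg_value). */
typedef struct {
  size_t      size;
  const void *data;
} KernelArg;

/* Assumption: stand-in for cl_mem so this compiles without OpenCL. */
typedef void *DeviceBuffer;

/* Hypothetical library-side state of a CSR matrix living on the device. */
typedef struct {
  DeviceBuffer row_ptr, col_idx, values; /* the three CSR arrays */
  int          nrows;
} CsrDeviceMat;

enum { CSR_NUM_ARGS = 4 };

/* Library side: describe each kernel argument as a (size, data) pair.
 * The data pointers refer into *mat, so mat must outlive their use. */
static int csr_get_kernel_args(const CsrDeviceMat *mat,
                               KernelArg args[CSR_NUM_ARGS])
{
  args[0] = (KernelArg){ sizeof mat->row_ptr, &mat->row_ptr };
  args[1] = (KernelArg){ sizeof mat->col_idx, &mat->col_idx };
  args[2] = (KernelArg){ sizeof mat->values,  &mat->values  };
  args[3] = (KernelArg){ sizeof mat->nrows,   &mat->nrows   };
  return CSR_NUM_ARGS;
}

/* Setter callback; returns 0 on success, nonzero on failure. */
typedef int (*SetArgFn)(void *kernel, unsigned index,
                        size_t size, const void *data);

/* User side: append the library's arguments after the user's own, which
 * occupy indices [0, first).  Returns the next free index, or -1. */
static int append_kernel_args(void *kernel, SetArgFn set_arg, unsigned first,
                              const KernelArg *args, int n)
{
  for (int i = 0; i < n; ++i) {
    if (set_arg(kernel, first + (unsigned)i, args[i].size, args[i].data) != 0)
      return -1;
  }
  return (int)first + n;
}

/* Mock setter for demonstration: records what it was given. */
typedef struct {
  unsigned count;
  unsigned last_index;
  size_t   sizes[8];
} ArgLog;

static int record_arg(void *kernel, unsigned index,
                      size_t size, const void *data)
{
  ArgLog *log = kernel;
  (void)data;
  if (log->count >= 8) return -1;
  log->sizes[log->count++] = size;
  log->last_index = index;
  return 0;
}
```

With two user arguments already set, a caller would pass first = 2 and
get 6 back as the next free index.  On the kernel side, the matching
parameters would still be declared separately and packed into a struct
at the top of the kernel, as described above.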