[petsc-dev] Supporting OpenCL matrix assembly
Jed Brown
jedbrown at mcs.anl.gov
Tue Sep 24 11:19:25 CDT 2013
Karl Rupp <rupp at mcs.anl.gov> writes:
> Hi,
>
>>> If the context and queue are not attached to objects, then they would
>>> essentially represent global state, which is something I want to avoid.
>>
>> I was thinking that the context returned would be specific to the Mat
>> and the device it was about to run on.
>
> Users who want to do the assembly right on the OpenCL device usually
> want us to use *their* context, hence all the need for such an interface.
Hmm, I think we're using "context" to mean different things. When I say
"matrix context", I mean whatever kernels use to identify the matrix
into which they want to set entries.
I think you were referring to the cl_context, which (now that you have
pointed out the issue) I think should be passed to the Mat*GetSource().
>>> What if a user for example wants to split the matrix across multiple
>>> OpenCL contexts (e.g. an AMD GPU and a Xeon Phi)?
>>
>> Maybe the GetSource() should take an argument specifying which device it
>> was obtaining code for? I'm not convinced that this sort of hybridism
>> is useful, however.
>
> This depends on the degree of optimization. If you really want to go for
> utmost performance, you need to return a source string which is device
> specific. For a first implementation it is sufficient to just have one
> kernel for all devices. The main benefit of the assembly on the device
> is avoiding PCI-Express, so a few percent in raw kernel performance can
> be considered microtuning...
Yup.
>> Umm, can't I copy the struct to the device and give the user a pointer
>> that they can shuttle into their kernels?
>
> Not a struct containing device buffer handles. You can copy
> struct A {
> int a;
> double b[5];
> };
> and anything else which is of fixed size, but you are not allowed to
> pack cl_mem into a struct and pass that struct on to the kernel. This
> is, unfortunately, a considerable abstraction problem, because it does
> not allow you to 'just pass an opaque object'. For example, the three
> CSR arrays need to be passed as separate kernel arguments. Yes, this is
> ugly.
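To make the constraint concrete, here is a minimal sketch of a device-side routine that adds a contribution into a CSR matrix when the three arrays arrive as separate arguments. It is written as plain C so it compiles and runs on the host. In an actual OpenCL kernel the pointers would be `__global`-qualified kernel arguments, and the host would bind three separate `cl_mem` buffers. The name `csr_add_value` is hypothetical, not PETSc API.

```c
#include <assert.h>

/* Hypothetical helper: add v to entry (row, col) of a CSR matrix whose three
   arrays are passed separately, the way an OpenCL kernel would receive them
   (in OpenCL C each pointer would be __global).
   Returns 0 on success, -1 if (row, col) is not in the sparsity pattern. */
static int csr_add_value(const int *row_ptr, const int *col_idx, double *values,
                         int row, int col, double v)
{
  /* Scan the stored column indices of this row for the target column. */
  for (int k = row_ptr[row]; k < row_ptr[row + 1]; ++k) {
    if (col_idx[k] == col) {
      /* Sequential update; concurrent work-items hitting the same entry
         would need an atomic update or a coloring scheme. */
      values[k] += v;
      return 0;
    }
  }
  return -1;
}
```

Every routine that touches the matrix needs all three pointers in its signature, which is exactly the "opaque object" problem described above.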
Ugh, that's terrible. Alternative:
We return an array of (size, arg_data) pairs. The user adds these to
their kernel. We'll provide a struct that they initialize somewhere at
the top of their kernel to pack all our programmatically-generated
arguments into something they can pass around reasonably. Ugly, but not
a showstopper.
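A minimal sketch of this workaround, under stated assumptions: the library returns an array of (size, arg_data) pairs, the user binds them after their own kernel arguments with a setter shaped like `clSetKernelArg(kernel, arg_index, arg_size, arg_value)`, and the kernel packs the resulting separate arguments into a struct once at the top. `MatKernelArg`, `append_mat_args`, `record_arg`, `MatCtx` and `mat_row_nnz` are all hypothetical names. The setter is passed as a function pointer so the sketch compiles without OpenCL headers, and the device side is written as plain C where OpenCL C would use `__global` pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical (size, arg_data) pair handed out by the library, one per kernel argument. */
typedef struct {
  size_t      size; /* byte size of the argument, e.g. sizeof(cl_mem) */
  const void *data; /* pointer to the argument value, e.g. &some_cl_mem */
} MatKernelArg;

/* Same parameter order as clSetKernelArg; real host code would wrap
   clSetKernelArg itself in this shape. Returns 0 on success. */
typedef int (*SetArgFn)(void *kernel, unsigned index, size_t size, const void *value);

/* Host side: bind the library's arguments right after the user's own
   first_index arguments, in the order the library returned them. */
static int append_mat_args(void *kernel, unsigned first_index,
                           const MatKernelArg *args, size_t nargs, SetArgFn set_arg)
{
  for (size_t i = 0; i < nargs; ++i) {
    int err = set_arg(kernel, first_index + (unsigned)i, args[i].size, args[i].data);
    if (err) return err;
  }
  return 0;
}

/* Stand-in setter so the sketch runs without an OpenCL runtime:
   counts bound arguments and remembers the last index used. */
static unsigned recorded_count, recorded_last_index;
static int record_arg(void *kernel, unsigned index, size_t size, const void *value)
{
  (void)kernel;
  (void)value;
  if (size == 0) return -1; /* treat zero-sized arguments as an error */
  recorded_count++;
  recorded_last_index = index;
  return 0;
}

/* Device side: the struct the user initializes at the top of the kernel from
   the separate arguments. It is a local variable, never a kernel argument. */
typedef struct {
  const int *row_ptr;
  const int *col_idx;
  double    *values;
} MatCtx;

/* Example helper that takes the packed context: stored entries in one row. */
static int mat_row_nnz(const MatCtx *A, int row)
{
  return A->row_ptr[row + 1] - A->row_ptr[row];
}
```

Inside a kernel the packing is one line, e.g. `MatCtx A = {row_ptr, col_idx, values};`, after which helpers take `&A` instead of three separate pointers. The argument list stays generated by the library, so its length and order can change without the user's helper code changing.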