[petsc-dev] Supporting OpenCL matrix assembly

Karl Rupp rupp at mcs.anl.gov
Tue Sep 24 10:08:22 CDT 2013


Hi,

>> If the context and queue are not attached to objects, then they would
>> essentially represent global state, which is something I want to avoid.
>
> I was thinking that the context returned would be specific to the Mat
> and the device it was about to run on.

Users who want to do the assembly right on the OpenCL device usually 
want us to use *their* context, hence the need for such an interface.


>> What if a user for example wants to split the matrix across multiple
>> OpenCL contexts (e.g. an AMD GPU and a Xeon Phi)?
>
> Maybe the GetSource() should take an argument specifying which device it
> was obtaining code for?  I'm not convinced that this sort of hybridism
> is useful, however.

This depends on the degree of optimization. If you really want to go for 
utmost performance, you need to return a device-specific source string. 
For a first implementation, a single kernel for all devices is 
sufficient. The main benefit of assembling on the device is avoiding 
transfers over PCI-Express, so a few percent in raw kernel performance 
can be considered microtuning...
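
A rough sketch of that "one kernel first, device-specific variants 
later" approach is below. All names here (device_kind, 
get_assembly_source, the kernel 'assemble' and its placeholder entries) 
are hypothetical and only for illustration; none of this is an existing 
PETSc interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device classes; not taken from PETSc or OpenCL. */
typedef enum { DEV_CPU = 0, DEV_GPU, DEV_ACCELERATOR, DEV_COUNT } device_kind;

/* One generic OpenCL C kernel, kept as a source string for all devices.
   The values written are placeholders, not a real discretization. */
static const char generic_src[] =
  "__kernel void assemble(__global const int *row_ptr,\n"
  "                       __global const int *col_idx,\n"
  "                       __global float *values,\n"
  "                       int nrows)\n"
  "{\n"
  "  int row = get_global_id(0);\n"
  "  if (row >= nrows) return;\n"
  "  for (int k = row_ptr[row]; k < row_ptr[row + 1]; ++k)\n"
  "    values[k] = (col_idx[k] == row) ? 2.0f : -1.0f;\n"
  "}\n";

/* Optional tuned variants. NULL means "no tuned kernel yet". */
static const char *tuned_src[DEV_COUNT] = { NULL, NULL, NULL };

/* Return a tuned source if one is registered, else the generic one. */
const char *get_assembly_source(device_kind kind)
{
  const char *src = generic_src;
  int k = (int)kind;

  if (k >= 0 && k < DEV_COUNT && tuned_src[k] != NULL)
    src = tuned_src[k];
  assert(src != NULL);
  return src;
}
```

With the table left empty, every device gets the same string, which is 
the "first implementation" case. Tuning for one device later only means 
filling in one table entry.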


>> I think you were referring to the 'Mat' on the device, while I was
>> referring to the plain PETSc Mat. The difficulty for a 'Mat' on the
>> device is a limitation of OpenCL in defining opaque types: It is not
>> possible to have something like
>>    typedef struct OpenCLMat {
>>      __global int   *row_indices;
>>      __global int   *col_indices;
>>      __global float *entries;
>>    } PetscMat;
>> and pass this as a single kernel argument.
>> (cf. OpenCL standard or
>> http://stackoverflow.com/questions/17635898/passing-struct-with-pointer-members-to-opencl-kernel-using-pyopencl)
>
> Umm, can't I copy the struct to the device and give the user a pointer
> that they can shuttle into their kernels?

Not a struct containing device buffer handles. You can copy
   struct A {
     int a;
     double b[5];
   };
and anything else which is of fixed size, but you are not allowed to 
pack cl_mem into a struct and pass that struct on to the kernel. This 
is, unfortunately, a considerable abstraction problem, because it does 
not allow you to 'just pass an opaque object'. For example, the three 
CSR arrays need to be passed as separate kernel arguments. Yes, this is 
ugly.
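
To make the workaround concrete, here is a minimal host-side sketch that 
keeps the matrix as one object on the host and flattens it into three 
kernel arguments when binding. It is plain C so that it runs without an 
OpenCL installation: buffer_handle stands in for cl_mem, and set_arg_fn 
is a hypothetical callback with roughly the shape of clSetKernelArg(), 
which is what real host code would call. HostCsrMat, csr_bind_args, 
ArgLog, log_set_arg and csr_bind_demo are made-up names.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a device buffer handle (cl_mem in real host code). */
typedef void *buffer_handle;

/* Host-side matrix object. Fine on the host, but it cannot be passed
   to a kernel as a single argument. */
typedef struct {
  buffer_handle row_ptr;
  buffer_handle col_idx;
  buffer_handle values;
} HostCsrMat;

/* Hypothetical setter: (kernel, argument index, size, pointer to value). */
typedef int (*set_arg_fn)(void *kernel, unsigned index, size_t size,
                          const void *value);

/* Bind the three CSR arrays as kernel arguments 0, 1 and 2. The kernel
   side would then be declared along the lines of
     __kernel void assemble(__global int *row_ptr,
                            __global int *col_idx,
                            __global float *values) */
int csr_bind_args(const HostCsrMat *A, void *kernel, set_arg_fn set_arg)
{
  int err;

  assert(A != NULL && set_arg != NULL);
  err = set_arg(kernel, 0, sizeof(buffer_handle), &A->row_ptr);
  if (err) return err;
  err = set_arg(kernel, 1, sizeof(buffer_handle), &A->col_idx);
  if (err) return err;
  return set_arg(kernel, 2, sizeof(buffer_handle), &A->values);
}

/* Recording setter for the demo: the "kernel" is just a log. */
typedef struct { unsigned count; const void *seen[3]; } ArgLog;

int log_set_arg(void *kernel, unsigned index, size_t size, const void *value)
{
  ArgLog *log = (ArgLog *)kernel;

  (void)size;
  if (index >= 3) return -1;
  log->seen[index] = *(void *const *)value;  /* read the handle itself */
  log->count++;
  return 0;
}

/* Returns the number of arguments bound, or -1 on any mismatch. */
int csr_bind_demo(void)
{
  int rows_buf, cols_buf, vals_buf;  /* dummies; addresses act as handles */
  HostCsrMat A;
  ArgLog log = { 0, { NULL, NULL, NULL } };

  A.row_ptr = &rows_buf;
  A.col_idx = &cols_buf;
  A.values  = &vals_buf;
  if (csr_bind_args(&A, &log, log_set_arg) != 0) return -1;
  if (log.seen[0] != &rows_buf || log.seen[1] != &cols_buf ||
      log.seen[2] != &vals_buf)
    return -1;
  return (int)log.count;
}
```

So the opaque object can still exist on the host side; it just has to be 
taken apart at the point where the kernel arguments are set.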

Best regards,
Karli



