[petsc-dev] Supporting OpenCL matrix assembly
Karl Rupp
rupp at mcs.anl.gov
Tue Sep 24 10:08:22 CDT 2013
Hi,
>> If the context and queue are not attached to objects, then they would
>> essentially represent global state, which is something I want to avoid.
>
> I was thinking that the context returned would be specific to the Mat
> and the device it was about to run on.
Users who want to do the assembly right on the OpenCL device usually
want us to use *their* context, hence the need for such an interface.
>> What if a user for example wants to split the matrix across multiple
>> OpenCL contexts (e.g. an AMD GPU and a Xeon Phi)?
>
> Maybe the GetSource() should take an argument specifying which device it
> was obtaining code for? I'm not convinced that this sort of hybridism
> is useful, however.
This depends on the degree of optimization. If you really want to go for
utmost performance, you need to return a source string which is device
specific. For a first implementation it is sufficient to just have one
kernel for all devices. The main benefit of the assembly on the device
is avoiding transfers over PCI-Express, so a few percent in raw kernel
performance can be considered microtuning...
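To make the "one kernel for all devices first, device-specific later"
idea concrete, here is a minimal sketch in plain C. All names in it
(DeviceKind, get_source_for_device, the 'scale' kernel) are hypothetical
and not PETSc or OpenCL API; the point is only that a lookup can fall
back to a generic source string until a tuned variant exists.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device classes; not part of PETSc or OpenCL. */
typedef enum {
    DEVICE_GENERIC,
    DEVICE_GPU,
    DEVICE_ACCELERATOR,
    DEVICE_KIND_COUNT
} DeviceKind;

/* One generic OpenCL C kernel, used for every device at first. */
static const char generic_source[] =
    "__kernel void scale(__global float *entries, float alpha, int n)\n"
    "{\n"
    "    int i = get_global_id(0);\n"
    "    if (i < n) entries[i] *= alpha;\n"
    "}\n";

/* Tuned variants per device class; NULL means "none written yet". */
static const char *tuned_sources[DEVICE_KIND_COUNT] = { NULL, NULL, NULL };

/* Return the tuned source if one exists, otherwise the generic one. */
static const char *get_source_for_device(DeviceKind kind)
{
    assert((int)kind >= 0 && kind < DEVICE_KIND_COUNT);
    const char *tuned = tuned_sources[kind];
    return tuned != NULL ? tuned : generic_source;
}
```

With the table left empty, every device class gets the same string, so
adding a tuned kernel later only means filling in one table slot.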
>> I think you were referring to the 'Mat' on the device, while I was
>> referring to the plain PETSc Mat. The difficulty for a 'Mat' on the
>> device is a limitation of OpenCL in defining opaque types: It is not
>> possible to have something like
>> typedef struct OpenCLMat {
>> __global int *row_indices;
>> __global int *col_indices;
>> __global float *entries;
>> } PetscMat;
>> and pass this as a single kernel argument.
>> (cf. OpenCL standard or
>> http://stackoverflow.com/questions/17635898/passing-struct-with-pointer-members-to-opencl-kernel-using-pyopencl)
>
> Umm, can't I copy the struct to the device and give the user a pointer
> that they can shuttle into their kernels?
Not a struct containing device buffer handles. You can copy
struct A {
int a;
double b[5];
};
and anything else which is of fixed size, but you are not allowed to
pack cl_mem into a struct and pass that struct on to the kernel. This
is, unfortunately, a considerable abstraction problem, because it does
not allow you to 'just pass an opaque object'. For example, the three
CSR arrays need to be passed as separate kernel arguments. Yes, this is
ugly.
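As a rough sketch of what "separate kernel arguments" means in practice,
the snippet below keeps a small OpenCL C kernel in a C string (so it
compiles without an OpenCL SDK) and names the argument slots. The kernel
csr_set_identity, the ARG_* slots and csr_arg_name are made up for
illustration and are not PETSc API. On the host, each buffer slot would
be bound by its own clSetKernelArg(kernel, slot, sizeof(cl_mem), &buf)
call instead of passing one struct.

```c
#include <assert.h>
#include <string.h>

/* Argument slots of the hypothetical kernel below, in declaration order. */
enum { ARG_ROW_INDICES, ARG_COL_INDICES, ARG_ENTRIES, ARG_NUM_ROWS, ARG_COUNT };

static const char *const csr_arg_names[ARG_COUNT] = {
    "row_indices", "col_indices", "entries", "num_rows"
};

/* The three CSR arrays are individual __global pointer arguments.
 * row_indices is used as row start offsets (num_rows + 1 values). */
static const char csr_kernel_source[] =
    "__kernel void csr_set_identity(__global const int *row_indices,\n"
    "                               __global const int *col_indices,\n"
    "                               __global float *entries,\n"
    "                               int num_rows)\n"
    "{\n"
    "    int row = get_global_id(0);\n"
    "    if (row >= num_rows) return;\n"
    "    for (int k = row_indices[row]; k < row_indices[row + 1]; ++k)\n"
    "        entries[k] = (col_indices[k] == row) ? 1.0f : 0.0f;\n"
    "}\n";

/* Map a slot number to the argument name it refers to. */
static const char *csr_arg_name(int slot)
{
    assert(slot >= 0 && slot < ARG_COUNT);
    return csr_arg_names[slot];
}
```

A host-side wrapper can still keep the three buffers in one struct for
its own bookkeeping; it just has to unpack them slot by slot when
setting the kernel arguments.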
Best regards,
Karli