[petsc-dev] Supporting OpenCL matrix assembly
Karl Rupp
rupp at mcs.anl.gov
Tue Sep 24 03:48:17 CDT 2013
Hey,
> Perhaps that *GetSource method should also return an opaque device "Mat"
> pointer that the user is responsible for shepherding into the kernel,
> from which they call the device MatSetValues?
This is easy if the OpenCL management is within PETSc (i.e. context,
buffers and command queues managed by us). I expect that many users
will want to provide their own context and related objects, which would
require us to offer something like
MatAttachOpenCLEnvironment(Mat,cl_context,cl_command_queue);
for all the matrix and vector objects involved. Note that this needs to
be attached before the matrix is created. I think this is doable.
>> b)
>> Other than that, I'm not sure whether I understand the semantics of the
>> proposed function correctly. In order for MatOpenCLGetSetValuesSource()
>> to be callable by device threads,
>
> The *GetSource method would be called from the CPU and would return a
> string containing a type-specialized MatSetValues implementation. The
> user would prepend its source to the string they pass to the OpenCL
> compiler. Their own part of that string would contain code that calls
> MatSetValues (perhaps with a name that makes it clear that it's
> running on the device).
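A minimal sketch of the host-side workflow described in the quote, in plain C. `petsc_setvalues_source()` is a hypothetical stand-in for the proposed *GetSource method, and `device_MatSetValues` is a hypothetical device-side name; neither is existing PETSc API, and the returned OpenCL C text is only a stub. The sketch just builds the combined program text; that string would then be passed to `clCreateProgramWithSource()`, which also accepts an array of source strings, so explicit concatenation is optional.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the proposed *GetSource method: returns OpenCL C
 * source for a type-specialized, device-side MatSetValues (stub body only). */
static const char *petsc_setvalues_source(void)
{
  return "void device_MatSetValues(__global double *vals, int slot, double v)\n"
         "{ vals[slot] += v; }\n";
}

/* Prepend the library-provided source to the user's kernel source.
 * The caller owns the returned buffer and must free() it.
 * Returns NULL if allocation fails. */
static char *build_program_source(const char *lib_src, const char *user_src)
{
  assert(lib_src != NULL && user_src != NULL);
  size_t n_lib = strlen(lib_src), n_user = strlen(user_src);
  char *out = malloc(n_lib + n_user + 1);
  if (!out) return NULL;
  memcpy(out, lib_src, n_lib);
  memcpy(out + n_lib, user_src, n_user + 1); /* also copies the final '\0' */
  return out;
}
```

Calling `build_program_source(petsc_setvalues_source(), user_kernel_src)` puts the library part first, so kernels in the user's part can call `device_MatSetValues` after compilation.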
Ok, that makes more sense. :-)
> Suppose the column indices have been set in advance. Now if the
> application already has a way of preventing conflicted cross-threadblock
> writes to those slots within an insertion round (e.g., coloring), PETSc
> would not need any synchronization and wouldn't need to stash
> possibly-conflicted writes elsewhere. Otherwise, PETSc would have to
> manage the stashing, use atomics, or some other scheme.
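A CPU-side sketch of the coloring approach from the quote. It assumes a CSR (AIJ-like) matrix whose column indices are set in advance and sorted within each row, and 2-node elements for brevity. `CsrMat`, `csr_add_value`, `color_elements` and `assemble_by_color` are hypothetical names, not PETSc API, and greedy coloring is just one common choice. Elements that share a node receive different colors, so within one color every write lands in distinct rows and no atomics or stashing are needed.

```c
#include <assert.h>
#include <stdint.h>

/* CSR matrix with a fixed, preallocated pattern; colidx sorted per row. */
typedef struct {
  int nrows;
  const int *rowptr; /* length nrows + 1 */
  const int *colidx; /* length rowptr[nrows] */
  double *vals;      /* length rowptr[nrows]; only this changes */
} CsrMat;

/* Add v to entry (row, col) via binary search over the row's columns.
 * Returns 0 on success, -1 if the entry is not in the pattern. */
static int csr_add_value(CsrMat *A, int row, int col, double v)
{
  int lo = A->rowptr[row], hi = A->rowptr[row + 1] - 1;
  while (lo <= hi) {
    int mid = lo + (hi - lo) / 2;
    if (A->colidx[mid] == col) { A->vals[mid] += v; return 0; }
    if (A->colidx[mid] < col) lo = mid + 1; else hi = mid - 1;
  }
  return -1;
}

/* Greedy coloring: elements sharing a node get different colors. Uses one
 * 64-bit color mask per node (at most 64 colors). Returns the color count. */
static int color_elements(int nelem, int nen, const int *elem_nodes,
                          int nnodes, uint64_t *node_used, int *color)
{
  int ncolors = 0;
  for (int n = 0; n < nnodes; n++) node_used[n] = 0;
  for (int e = 0; e < nelem; e++) {
    uint64_t forbidden = 0;
    for (int i = 0; i < nen; i++) forbidden |= node_used[elem_nodes[e * nen + i]];
    int c = 0;
    while (c < 64 && ((forbidden >> c) & 1u)) c++; /* lowest free color */
    assert(c < 64);
    color[e] = c;
    for (int i = 0; i < nen; i++) node_used[elem_nodes[e * nen + i]] |= (uint64_t)1 << c;
    if (c + 1 > ncolors) ncolors = c + 1;
  }
  return ncolors;
}

/* Assemble 2-node elements with element matrix [[k,-k],[-k,k]], one pass
 * per color. The inner loop is conflict-free, so on a GPU it could be one
 * kernel launch per color with one work-item per element. */
static int assemble_by_color(CsrMat *A, int nelem, const int *elem_nodes,
                             const int *color, int ncolors, double k)
{
  for (int c = 0; c < ncolors; c++) {   /* one kernel launch per color */
    for (int e = 0; e < nelem; e++) {   /* independent within a color */
      if (color[e] != c) continue;
      int a = elem_nodes[2 * e], b = elem_nodes[2 * e + 1];
      if (csr_add_value(A, a, a, k) || csr_add_value(A, a, b, -k) ||
          csr_add_value(A, b, a, -k) || csr_add_value(A, b, b, k))
        return -1;                      /* entry missing from pattern */
    }
  }
  return 0;
}
```

For a 1D chain of three elements, (0,1), (1,2), (2,3), the greedy pass yields colors 0, 1, 0: two passes, each free of write conflicts.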
I see. My experience is that synchronization, particularly atomics, is
usually too expensive on GPUs if one wants to compete with an optimized
CPU implementation. Coloring is often reasonable, but the price to pay
is poor strong scaling, because each color incurs roughly 10 µs of
kernel launch overhead. Either way, that's a reasonable implementation
approach to start with.
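To see why the per-color launch cost hurts strong scaling, here is a rough cost model, which is an assumption rather than a measurement: each color costs one kernel launch of `t_launch_us` (about 10 µs per the estimate above), while the actual insertion work `work_us` is split evenly over `nprocs` devices. `launch_overhead_fraction` is a hypothetical helper.

```c
#include <assert.h>

/* Rough model: ncolors launches of t_launch_us each, plus work_us of
 * insertion work divided evenly across nprocs. Returns the fraction of
 * assembly time spent on launch overhead. */
static double launch_overhead_fraction(int ncolors, double t_launch_us,
                                       double work_us, int nprocs)
{
  assert(ncolors > 0 && nprocs > 0 && t_launch_us >= 0.0 && work_us > 0.0);
  double overhead = ncolors * t_launch_us; /* fixed: does not shrink with nprocs */
  double local_work = work_us / nprocs;    /* shrinks under strong scaling */
  return overhead / (overhead + local_work);
}
```

With illustrative numbers (8 colors, 10 µs per launch, 8000 µs of total insertion work), launch overhead is about 1% of assembly time on one device but 50% when the work is spread over 100 devices.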
Best regards,
Karli