[petsc-dev] Supporting OpenCL matrix assembly
Matthew Knepley
knepley at gmail.com
Mon Sep 23 14:56:42 CDT 2013
On Mon, Sep 23, 2013 at 12:30 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
> We have some motivated users that would like a way to assemble matrices
> on a device, without needing to store all the element matrices to global
> memory or to transfer them to the CPU. Given GPU execution models, this
> means we need something that can be done on-the-spot in kernels. So
> what about a function that can be called by device threads?
>
> PetscErrorCode MatOpenCLGetSetValuesSource(Mat, synchronization_mechanism, char **);
>
> The user concatenates this type-specialized code into their source and
> calls MatSetValues(). The users I'm talking to here synchronize by
> coordinating threads using coloring of a sort. The user still needs to
> call MatAssemblyBegin/End from outside a kernel, though that function
> may or may not need to invoke its own kernel.
>
> Crazy?
>
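
[For concreteness, a host-side sketch of the usage proposed above. This is
hypothetical: MatOpenCLGetSetValuesSource and the SYNC_BY_COLORING argument
follow the suggested signature and do not exist in PETSc; the OpenCL and
MatAssembly calls are the real APIs.]

    #include <petscmat.h>
    #include <CL/cl.h>

    /* Hypothetical sketch of the proposed usage pattern */
    static PetscErrorCode BuildAssemblyProgram(Mat A, cl_context ctx, cl_device_id dev,
                                               const char *user_kernel_src, cl_program *prog)
    {
      PetscErrorCode ierr;
      char           *setvalues_src;
      const char     *sources[2];

      PetscFunctionBegin;
      /* Generated, type-specialized device code implementing MatSetValues() */
      ierr = MatOpenCLGetSetValuesSource(A, SYNC_BY_COLORING, &setvalues_src);CHKERRQ(ierr);
      sources[0] = setvalues_src;    /* device-side MatSetValues() source   */
      sources[1] = user_kernel_src;  /* user's element-assembly kernel      */
      *prog = clCreateProgramWithSource(ctx, 2, sources, NULL, NULL);
      clBuildProgram(*prog, 1, &dev, NULL, NULL, NULL);
      /* ... enqueue the kernel; work-items call MatSetValues() in device code ... */

      /* Assembly is still finalized from the host, as with CPU assembly */
      ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }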
Okay, here is how I understand GPU matrix assembly. The only way it makes
sense to me is in COO format, which you may later convert. In
mpiaijAssemble.cu I have code that

  - produces COO rows,
  - segregates them into on- and off-process rows, and
  - sorts and reduces by key.

This can obviously be done incrementally, so storing a batch of element
matrices to global memory is not a problem. I think the extra bandwidth is
balanced by the efficiency of the method; I do not know of an approach with
smaller bandwidth that would beat it.
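
[A minimal sketch of the sort-and-reduce step, illustrative only and not the
actual mpiaijAssemble.cu code. It assumes Thrust, that the element kernels
have already written a batch of COO triples (row, col, value) to device
memory, and that row*N + col fits in 64 bits for N global columns.]

    #include <thrust/device_vector.h>
    #include <thrust/transform.h>
    #include <thrust/sort.h>
    #include <thrust/reduce.h>

    struct PackKey {  /* combine (row, col) into a single sortable key */
      long long N;
      __host__ __device__ long long operator()(int r, int c) const {
        return (long long)r * N + c;
      }
    };

    void ReduceCOOBatch(thrust::device_vector<int>    &rows,
                        thrust::device_vector<int>    &cols,
                        thrust::device_vector<double> &vals,
                        long long                      N /* global columns */)
    {
      thrust::device_vector<long long> keys(rows.size());
      thrust::transform(rows.begin(), rows.end(), cols.begin(), keys.begin(), PackKey{N});

      /* Sort the values by (row, col), then sum duplicate entries in one pass. */
      thrust::sort_by_key(keys.begin(), keys.end(), vals.begin());
      thrust::device_vector<long long> rkeys(keys.size());
      thrust::device_vector<double>    rvals(vals.size());
      thrust::reduce_by_key(keys.begin(), keys.end(), vals.begin(),
                            rkeys.begin(), rvals.begin());
      /* rkeys/rvals now hold the reduced COO entries:
         row = key / N, col = key % N. */
    }

The reduce_by_key pass is what sums duplicate (row, col) contributions from
neighboring elements on the device, so per-element matrices never need to be
transferred back to the CPU.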
Matt
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener