[petsc-dev] Supporting OpenCL matrix assembly
Matthew Knepley
knepley at gmail.com
Mon Sep 23 14:56:42 CDT 2013
On Mon, Sep 23, 2013 at 12:30 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
> We have some motivated users that would like a way to assemble matrices
> on a device, without needing to store all the element matrices to global
> memory or to transfer them to the CPU. Given GPU execution models, this
> means we need something that can be done on-the-spot in kernels. So
> what about a function that can be called by device threads?
>
> PetscErrorCode MatOpenCLGetSetValuesSource(Mat, synchronization_mechanism, char **);
>
> The user concatenates this type-specialized code into their source and
> calls MatSetValues(). The users I'm talking to here synchronize by
> coordinating threads using coloring of a sort. The user still needs to
> call MatAssemblyBegin/End from outside a kernel, though that function
> may or may not need to invoke its own kernel.
>
> Crazy?
>
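
[For concreteness, a host-side sketch of the usage proposed above. This is
hypothetical: MatOpenCLGetSetValuesSource and the SYNC_BY_COLORING argument
follow the suggested signature and do not exist in PETSc; the OpenCL and
MatAssembly calls are the real APIs.]

    #include <petscmat.h>
    #include <CL/cl.h>

    /* Hypothetical sketch of the proposed usage pattern */
    static PetscErrorCode BuildAssemblyProgram(Mat A, cl_context ctx, cl_device_id dev,
                                               const char *user_kernel_src, cl_program *prog)
    {
      PetscErrorCode ierr;
      char           *setvalues_src;
      const char     *sources[2];

      PetscFunctionBegin;
      /* Generated, type-specialized device code implementing MatSetValues() */
      ierr = MatOpenCLGetSetValuesSource(A, SYNC_BY_COLORING, &setvalues_src);CHKERRQ(ierr);
      sources[0] = setvalues_src;    /* device-side MatSetValues() source   */
      sources[1] = user_kernel_src;  /* user's element-assembly kernel      */
      *prog = clCreateProgramWithSource(ctx, 2, sources, NULL, NULL);
      clBuildProgram(*prog, 1, &dev, NULL, NULL, NULL);
      /* ... enqueue the kernel; work-items call MatSetValues() in device code ... */

      /* Assembly is still finalized from the host, as with CPU assembly */
      ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }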
Okay, here is how I understand GPU matrix assembly. The only way it makes
sense to me is in COO format, which you may later convert. In
mpiaijAssemble.cu I have code that

  - produces COO rows,
  - segregates them into on- and off-process rows, and
  - sorts and reduces by key.

This can obviously be done incrementally, so storing a batch of element
matrices to global memory is not a problem. I think the extra bandwidth is
balanced by the efficiency of the method; I do not know of an approach with
smaller bandwidth that would beat it.
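
[A minimal sketch of the sort-and-reduce step, illustrative only and not the
actual mpiaijAssemble.cu code. It assumes Thrust, that the element kernels
have already written a batch of COO triples (row, col, value) to device
memory, and that row*N + col fits in 64 bits for N global columns.]

    #include <thrust/device_vector.h>
    #include <thrust/transform.h>
    #include <thrust/sort.h>
    #include <thrust/reduce.h>

    struct PackKey {  /* combine (row, col) into a single sortable key */
      long long N;
      __host__ __device__ long long operator()(int r, int c) const {
        return (long long)r * N + c;
      }
    };

    void ReduceCOOBatch(thrust::device_vector<int>    &rows,
                        thrust::device_vector<int>    &cols,
                        thrust::device_vector<double> &vals,
                        long long                      N /* global columns */)
    {
      thrust::device_vector<long long> keys(rows.size());
      thrust::transform(rows.begin(), rows.end(), cols.begin(), keys.begin(), PackKey{N});

      /* Sort the values by (row, col), then sum duplicate entries in one pass. */
      thrust::sort_by_key(keys.begin(), keys.end(), vals.begin());
      thrust::device_vector<long long> rkeys(keys.size());
      thrust::device_vector<double>    rvals(vals.size());
      thrust::reduce_by_key(keys.begin(), keys.end(), vals.begin(),
                            rkeys.begin(), rvals.begin());
      /* rkeys/rvals now hold the reduced COO entries:
         row = key / N, col = key % N. */
    }

The reduce_by_key pass is what sums duplicate (row, col) contributions from
neighboring elements on the device, so per-element matrices never need to be
transferred back to the CPU.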
Matt
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener