[petsc-dev] Supporting OpenCL matrix assembly

Matthew Knepley knepley at gmail.com
Mon Sep 23 17:43:45 CDT 2013


On Mon, Sep 23, 2013 at 2:46 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:

> Matthew Knepley <knepley at gmail.com> writes:
>
> > Okay, here is how I understand GPU matrix assembly. The only way it
> > makes sense to me is in COO format, which you may later convert. In
> > mpiaijAssemble.cu I have code that
> >
> >   - Produces COO rows
> >   - Segregates them into on and off-process rows
>
> These users compute redundantly and set MAT_NO_OFF_PROC_ENTRIES.


Fine, we should have a flag like that.


> >   - Sorts and reduces by key
>
> ... then insert into diagonal and off-diagonal parts of owned matrices.


Yep.


> > This can obviously be done incrementally, so storing a batch of
> > element matrices to global memory is not a problem.
>
> If you store element matrices to global memory, you're using a ton of
> bandwidth (about 20x the size of the matrix if using P1 tets).
>
> What if you do the sort/reduce thing within thread blocks, and only
> write the reduced version to global storage?
>

I think it should be easy, but we will have to see what is out there for
sort/reduce within thread blocks.

   Matt

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener