[petsc-dev] Supporting OpenCL matrix assembly
Jed Brown
jedbrown at mcs.anl.gov
Mon Sep 23 16:46:03 CDT 2013
Matthew Knepley <knepley at gmail.com> writes:
> Okay, here is how I understand GPU matrix assembly. The only way it
> makes sense to me is in COO format which you may later convert. In
> mpiaijAssemble.cu I have code that
>
> - Produces COO rows
> - Segregates them into on and off-process rows
These users compute redundantly and set MAT_NO_OFF_PROC_ENTRIES.
> - Sorts and reduces by key
... then insert into diagonal and off-diagonal parts of owned matrices.
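
The steps quoted above (produce COO rows, separate on- and off-process rows, sort and reduce by key, then insert into the diagonal and off-diagonal parts) can be sketched on the CPU as follows. This is only an illustration, not the code in mpiaijAssemble.cu: the function names, the row range [row_start, row_end), and the assumption that the column ownership range equals the row ownership range (a square matrix) are all made up for the example.

```python
from itertools import groupby


def sort_reduce_by_key(coo):
    """Sort (row, col, value) triples by (row, col) and sum duplicate keys."""
    key = lambda t: (t[0], t[1])
    ordered = sorted(coo, key=key)
    return [(row, col, sum(v for _, _, v in group))
            for (row, col), group in groupby(ordered, key=key)]


def assemble(coo, row_start, row_end):
    """Hypothetical helper: split by row ownership, reduce, split by column.

    Assumes this process owns rows [row_start, row_end) and that the
    "diagonal" part holds the columns in the same range.
    """
    def owned(i):
        return row_start <= i < row_end

    on_process = [t for t in coo if owned(t[0])]
    off_process = [t for t in coo if not owned(t[0])]

    reduced = sort_reduce_by_key(on_process)
    diag = [t for t in reduced if owned(t[1])]
    offdiag = [t for t in reduced if not owned(t[1])]
    # Off-process rows would be communicated (or never generated, if the
    # caller computes redundantly and sets MAT_NO_OFF_PROC_ENTRIES).
    return diag, offdiag, sort_reduce_by_key(off_process)


# Two 1D linear elements on nodes (0, 1) and (1, 2), each contributing
# the element matrix [[1, -1], [-1, 1]] as COO triples.
coo = [(0, 0, 1.0), (0, 1, -1.0), (1, 0, -1.0), (1, 1, 1.0),
       (1, 1, 1.0), (1, 2, -1.0), (2, 1, -1.0), (2, 2, 1.0)]
diag, offdiag, off_process = assemble(coo, row_start=0, row_end=2)
```

On a GPU the sort and reduce would typically be done with library primitives rather than Python lists; the sketch only shows the data flow.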
> This can obviously be done incrementally, so storing a batch of
> element matrices to global memory is not a problem.
If you store element matrices to global memory, you're using a ton of
bandwidth (about 20x the size of the matrix if using P1 tets).
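
A rough way to see where a factor of this order can come from, under assumptions that are not in the message: scalar P1 on a Kuhn-type tetrahedral mesh (about 6 tets per vertex, 15 nonzeros per row), 16 bytes per stored COO entry (two 4-byte indices plus an 8-byte value), and 12 bytes per nonzero in the assembled matrix (one 4-byte column index plus an 8-byte value). The function name and default byte counts are hypothetical, and this is one possible accounting, not necessarily how the figure above was obtained.

```python
def element_storage_ratio(nodes_per_element, elements_per_vertex, nnz_per_row,
                          bytes_per_coo_entry=16, bytes_per_csr_entry=12):
    """Compare unassembled element-matrix storage with the assembled matrix.

    Returns (entry_ratio, byte_ratio), both per mesh vertex (matrix row).
    """
    # Each element contributes a dense nodes x nodes block.
    entries_per_vertex = elements_per_vertex * nodes_per_element ** 2
    entry_ratio = entries_per_vertex / nnz_per_row
    byte_ratio = (entries_per_vertex * bytes_per_coo_entry) / (
        nnz_per_row * bytes_per_csr_entry)
    return entry_ratio, byte_ratio


# Assumed mesh statistics for P1 tets (see the lead-in).
entry_ratio, byte_ratio = element_storage_ratio(
    nodes_per_element=4, elements_per_vertex=6, nnz_per_row=15)

# If the batch is written to global memory once and read back once,
# the traffic relative to the matrix size doubles.
write_then_read_ratio = 2 * byte_ratio
```

With these assumed numbers the element entries outnumber the matrix nonzeros by about 6.4x, the stored bytes by about 8.5x, and writing then re-reading the batch moves about 17x the matrix size.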
What if you do the sort/reduce thing within thread blocks, and only
write the reduced version to global storage?
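
A minimal CPU sketch of that idea, assuming a "block" is simply a contiguous batch of COO triples and a Python list stands in for global memory. The names reduce_block and assemble_in_blocks are hypothetical; a real kernel would do the per-block reduction in shared memory.

```python
from collections import defaultdict


def reduce_block(entries):
    """Sum values with equal (row, col) keys and return sorted triples."""
    acc = defaultdict(float)
    for row, col, val in entries:
        acc[(row, col)] += val
    return sorted((r, c, v) for (r, c), v in acc.items())


def assemble_in_blocks(coo, block_size):
    """Reduce each block locally, then reduce the concatenated results.

    Returns the final triples and the number of entries that were written
    to the stand-in for global memory.
    """
    global_buffer = []
    for start in range(0, len(coo), block_size):
        # Only the block-reduced entries leave the "thread block".
        global_buffer.extend(reduce_block(coo[start:start + block_size]))
    return reduce_block(global_buffer), len(global_buffer)


# Two 1D linear elements sharing node 1, four COO entries each.
coo = [(0, 0, 1.0), (0, 1, -1.0), (1, 0, -1.0), (1, 1, 1.0),
       (1, 1, 1.0), (1, 2, -1.0), (2, 1, -1.0), (2, 2, 1.0)]

final_one_block, written_one_block = assemble_in_blocks(coo, block_size=8)
final_two_blocks, written_two_blocks = assemble_in_blocks(coo, block_size=4)
```

When both elements land in one block, the shared (1, 1) entry is merged before anything is written; with one element per block nothing is saved. The benefit therefore depends on how many neighboring elements a block processes together.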