[petsc-dev] Supporting OpenCL matrix assembly

Tue Sep 24 10:11:35 CDT 2013

Hi Matt,

> Here I believe strongly that we need tests. Nathan assured me that
> nothing is faster on the GPU than sort+reduce-by-key, since those
> primitives are highly optimized. I think they will be hard to beat, and the
> initial timings I had say that this is the case. I am willing to be
> wrong, but I am not willing to overengineer based on supposition.

Fair enough. Is a brute-force implementation for P1 elements sufficient 
as a baseline for discussion?
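
To make the comparison concrete, here is a minimal sketch of what the
sort+reduce-by-key pattern looks like for P1 elements. It is illustrative
only: the 1D mesh, the stiffness element matrix, and the function names
are assumptions made for this example, and it runs serially in plain
Python rather than in OpenCL. On a GPU, the sort and the reduction would
each be handled by an optimized library primitive.

```python
from itertools import groupby


def p1_element_matrix(h):
    # Stiffness matrix of a 1D P1 (linear) element of length h.
    # (Assumed model problem; any 2x2 element matrix would do.)
    k = 1.0 / h
    return [[k, -k], [-k, k]]


def assemble_sort_reduce(num_elements, h):
    # Step 1: every element emits its local entries as (row, col, value)
    # triples. Shared nodes produce duplicate (row, col) keys.
    triples = []
    for e in range(num_elements):
        dofs = (e, e + 1)  # global node indices of element e
        ke = p1_element_matrix(h)
        for a in range(2):
            for b in range(2):
                triples.append((dofs[a], dofs[b], ke[a][b]))

    # Step 2: sort by the (row, col) key so duplicates become adjacent.
    triples.sort(key=lambda t: (t[0], t[1]))

    # Step 3: reduce by key, summing the values of each run of equal keys.
    coo = []
    for key, group in groupby(triples, key=lambda t: (t[0], t[1])):
        coo.append((key[0], key[1], sum(v for _, _, v in group)))
    return coo


if __name__ == "__main__":
    for row, col, val in assemble_sort_reduce(2, 1.0):
        print(row, col, val)
```

The result is a list of unique (row, col, value) entries in row-major
order, which is straightforward to convert to CSR.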

Best regards,
Karli