[petsc-dev] Supporting OpenCL matrix assembly
Karl Rupp
rupp at mcs.anl.gov
Tue Sep 24 10:11:35 CDT 2013
Hi Matt,
> Here I believe strongly that we need tests. Nathan assured me that
> nothing is faster on the GPU than sort+reduce-by-key, since those
> primitives are highly optimized. I think they will be hard to beat, and the
> initial timings I had say that this is the case. I am willing to be
> wrong, but I am not willing to overengineer based on supposition.
Fair enough. Is a brute-force implementation for P1 elements sufficient
as a baseline for discussion?
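
For reference, such a baseline could look roughly like the sketch below. It is serial Python, not OpenCL or PETSc code, and it assumes 1D P1 (linear) elements on a uniform mesh. The function names are hypothetical, and `sorted` plus `itertools.groupby` only stand in for the GPU sort and reduce-by-key primitives.

```python
from itertools import groupby


def p1_element_triplets(num_elements, h):
    """Emit one (row, col, value) triplet per local matrix entry.

    Assumption: 1D P1 (linear) elements of uniform length h, so the
    element stiffness matrix is (1/h) * [[1, -1], [-1, 1]].
    Duplicate (row, col) keys are deliberately left in place.
    """
    k = 1.0 / h
    local = ((k, -k), (-k, k))
    triplets = []
    for e in range(num_elements):
        dofs = (e, e + 1)  # global vertex indices of element e
        for a in range(2):
            for b in range(2):
                triplets.append((dofs[a], dofs[b], local[a][b]))
    return triplets


def sort_reduce_by_key(triplets):
    """Sort by (row, col), then sum the values of consecutive equal keys."""
    ordered = sorted(triplets, key=lambda t: (t[0], t[1]))
    reduced = []
    for key, group in groupby(ordered, key=lambda t: (t[0], t[1])):
        reduced.append((key[0], key[1], sum(v for _, _, v in group)))
    return reduced


# Assemble three elements of length 0.5 into unique COO entries.
coo = sort_reduce_by_key(p1_element_triplets(num_elements=3, h=0.5))
```

The element loop writes every local entry without any atomics or coloring; all conflict resolution is deferred to the sort and the reduction, which is what makes the approach easy to map onto existing GPU primitives.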
Best regards,
Karli