[petsc-dev] Supporting OpenCL matrix assembly
Karl Rupp
rupp at mcs.anl.gov
Tue Sep 24 09:07:19 CDT 2013
Hey,
On 09/24/2013 03:53 PM, Jed Brown wrote:
> Karl Rupp <rupp at mcs.anl.gov> writes:
>> I'm not talking about CSR vs. COO from the SpMV point of view, but
>> rather on how to store the actual data in global memory without
>> expensive subsequent sorts.
>
> Sure, but this seems like such a minor detail. With PetscScalar=double
> and PetscInt=int, we have 16 bytes/entry for COO and (nominally) 12
> bytes/entry for CSR, and it only needs to go to GPU global memory and
> back, not across to the CPU. I doubt the difference between 12 and 16
> bytes/entry during assembly is a bottleneck.
I'm not worried about 12 bytes vs. 16 bytes, but rather about the
ordering of entries as a whole. If one assembles into something
CSR-like, then one can either run the SpMV right away, or first merge
the entries in each row of the matrix that share a column index. Such a
merge can usually be done in shared memory, so the memory cost is at
most one read and one write of the matrix nonzero entries.
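
To make the per-row merge concrete, here is a rough CPU-side sketch in Python. It is not PETSc or OpenCL code; the function name and the plain-list layout (row_ptr, col_idx, values) are assumptions made only for this illustration. Each row is handled independently, which is what would let a GPU work-group process one row in shared memory.

```python
def merge_row_duplicates(row_ptr, col_idx, values):
    """Sum entries that share a column index within each row of a CSR-like matrix.

    Hypothetical helper for illustration: rows may contain unsorted and
    repeated column indices, as they would right after assembly.
    """
    new_row_ptr = [0]
    new_col_idx = []
    new_values = []
    for row in range(len(row_ptr) - 1):
        start, end = row_ptr[row], row_ptr[row + 1]
        # Accumulate duplicates for this row only; no other row is touched.
        merged = {}
        for k in range(start, end):
            merged[col_idx[k]] = merged.get(col_idx[k], 0.0) + values[k]
        # Emit the row with unique, ascending column indices.
        for col in sorted(merged):
            new_col_idx.append(col)
            new_values.append(merged[col])
        new_row_ptr.append(len(new_col_idx))
    return new_row_ptr, new_col_idx, new_values


if __name__ == "__main__":
    # Row 0 holds column 2 twice; row 1 holds column 1 twice.
    print(merge_row_duplicates([0, 3, 5], [2, 0, 2, 1, 1],
                               [1.0, 2.0, 3.0, 4.0, 5.0]))
```

Every input entry is read once and every output entry is written once, which matches the "one read and one write" estimate above.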
In contrast, if everything is assembled into a general COO format,
then one needs to sort the triplets by row first before one can even
run an SpMV. The memory transactions required for this are
O(N log(N)), with N being the number of nonzeros. N is larger than
10^6 in almost all cases, so the log(N) factor hurts...
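
For comparison, a minimal Python sketch of the COO route, again with a made-up function name and plain lists rather than any PETSc API. The global sort over all N triplets is the step that costs O(N log(N)) with a comparison-based sort; for N = 10^6, log2(N) is roughly 20.

```python
def coo_to_csr(num_rows, rows, cols, vals):
    """Convert unsorted COO triplets to CSR, summing duplicate (row, col) pairs.

    Hypothetical helper for illustration only.
    """
    # Global sort of all triplets by (row, col): the O(N log N) step.
    order = sorted(range(len(rows)), key=lambda k: (rows[k], cols[k]))

    counts = [0] * num_rows
    col_idx, values = [], []
    last = None
    for k in order:
        key = (rows[k], cols[k])
        if key == last:
            # Same (row, col) as the previous triplet: accumulate.
            values[-1] += vals[k]
        else:
            col_idx.append(cols[k])
            values.append(vals[k])
            counts[rows[k]] += 1
            last = key

    # A prefix sum over the per-row counts gives the CSR row pointers.
    row_ptr = [0]
    for c in counts:
        row_ptr.append(row_ptr[-1] + c)
    return row_ptr, col_idx, values


if __name__ == "__main__":
    print(coo_to_csr(2, [1, 0, 0, 1, 0], [1, 2, 0, 1, 2],
                     [4.0, 1.0, 2.0, 5.0, 3.0]))
```

Unlike a merge confined to one row, the sort here has to move triplets across the whole array, so it cannot stay within one work-group's shared memory.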
Best regards,
Karli