[petsc-dev] Supporting OpenCL matrix assembly

Karl Rupp rupp at mcs.anl.gov
Tue Sep 24 09:07:19 CDT 2013


Hey,

On 09/24/2013 03:53 PM, Jed Brown wrote:
> Karl Rupp <rupp at mcs.anl.gov> writes:
>> I'm not talking about CSR vs. COO from the SpMV point of view, but
>> rather on how to store the actual data in global memory without
>> expensive subsequent sorts.
>
> Sure, but this seems like such a minor detail.  With PetscScalar=double
> and PetscInt=int, we have 16 bytes/entry for COO and (nominally) 12
> bytes/entry for CSR, and it only needs to go to GPU global memory and
> back, not across to the CPU.  I doubt the difference between 12 and 16
> bytes/entry during assembly is a bottleneck.

I'm not worried about 12 bytes vs. 16 bytes, but rather about the 
ordering of the entries as a whole. If one assembles into something 
CSR-like, then one can either run the SpMV right away, or first merge 
the entries in each row of the matrix that share a column index. Such 
merging can usually be done in shared memory, so the memory cost is 
one read and one write of the matrix nonzero entries in the worst case.

In contrast, if everything is assembled into a general COO format, 
then one needs to sort the triplets by row first before one can even 
run SpMVs. The memory transactions required for this are
O(N log(N)) with N being the number of nonzeros. N is in almost all 
cases larger than 10^6, so the log(N) hurts...
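
For comparison, a small Python sketch of the COO path, again CPU-side and purely illustrative. The names coo_to_csr and sort_passes are hypothetical, and the pass count assumes a comparison-style sort with about log2(N) passes over the data, which is the cost model used above.

```python
import math


def coo_to_csr(n_rows, rows, cols, vals):
    """Sort COO triplets by (row, column) and build CSR row pointers."""
    # The global sort is the expensive step: O(N log N) for N triplets.
    order = sorted(range(len(rows)), key=lambda k: (rows[k], cols[k]))
    # Count the entries per row, then prefix-sum the counts into row pointers.
    row_ptr = [0] * (n_rows + 1)
    for r in rows:
        row_ptr[r + 1] += 1
    for r in range(n_rows):
        row_ptr[r + 1] += row_ptr[r]
    return row_ptr, [cols[k] for k in order], [vals[k] for k in order]


def sort_passes(nnz):
    """Rough number of passes over the data, assuming about log2(N) passes."""
    return math.ceil(math.log2(nnz))


if __name__ == "__main__":
    print(coo_to_csr(2, [1, 0, 1, 0], [0, 2, 1, 0], [1.0, 2.0, 3.0, 4.0]))
    print(sort_passes(10**6))
```

Under that assumption, N = 10^6 gives about 20 passes over the triplets, compared with the single read and write of the per-row merge.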

Best regards,
Karli



