[petsc-dev] Supporting OpenCL matrix assembly

Karl Rupp rupp at mcs.anl.gov
Tue Sep 24 08:38:45 CDT 2013


Hey,

>> My primary metric for GPU kernels is memory transfers from global memory
>> ('flops are free'), hence what I suggest for the assembly stage is to go
>> with something CSR-like rather than COO. Pure CSR may be too expensive
>> in terms of element lookup if there are several fields involved
>> (particularly 3D), so one could push (column-index, value) pairs for
>> each row, which makes the merge-by-key much cheaper than for arbitrary COO
>> matrices.
>
> I think CSR vs. COO is a second-order optimization to be considered
> after the 20x redundancy has been eliminated and a synchronization
> strategy has been chosen (e.g., coloring vs redundant storage and later
> compression).

I'm not talking about CSR vs. COO from the SpMV point of view, but
rather about how to store the assembled data in global memory without
expensive subsequent sorts.
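
To make the storage idea concrete, here is a minimal serial sketch in Python
(not OpenCL, and not PETSc code). It assumes element contributions arrive as
(row, col, value) triples; the function name assemble_rowwise and that input
layout are hypothetical, chosen only for illustration. Pairs are pushed into
per-row buckets, so the merge-by-key only ever sorts within one short row
instead of sorting the whole COO array:

```python
def assemble_rowwise(n_rows, contributions):
    """Assemble CSR arrays from (row, col, value) triples.

    Hypothetical helper: the triple layout is an assumption for this sketch.
    """
    # Stage 1: push (column-index, value) pairs into a bucket per row.
    rows = [[] for _ in range(n_rows)]
    for r, c, v in contributions:
        rows[r].append((c, v))

    # Stage 2: merge-by-key inside each row (sort by column, sum duplicates).
    row_ptr, col_idx, values = [0], [], []
    for pairs in rows:
        pairs.sort(key=lambda p: p[0])
        for c, v in pairs:
            # row_ptr[-1] is the start of the current row in col_idx.
            if len(col_idx) > row_ptr[-1] and col_idx[-1] == c:
                values[-1] += v          # duplicate column: accumulate
            else:
                col_idx.append(c)        # new column in this row
                values.append(v)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, values
```

On a GPU the per-row buckets would presumably be preallocated segments in
global memory, with one work-group or work-item handling each row's merge;
that mapping is an assumption here, not something settled in this thread.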


>> This, of course, requires the knowledge of the nonzero pattern and
>> couplings among elements, yet this is reasonably cheap to extract for a
>> large number of problems (for example, (non)linear PDEs without
>> adaptivity). Also, the nonzero pattern is rather cheap to obtain if one
>> uses coloring to avoid expensive atomic writes to global memory.
>
> At this point, I don't mind having the nonzero pattern set ahead of time
> using CPU code.  It's reassembly in time-dependent problems with no
> adaptivity or occasional adaptivity that I'm more concerned with.

Okay, this makes things a lot easier :-)
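
With the pattern fixed on the CPU, reassembly reduces to adding values into
known slots. A rough serial sketch in Python, assuming elements are given as
tuples of node indices with one dense local matrix each; build_pattern and
reassemble are hypothetical names, not PETSc functions:

```python
import bisect


def build_pattern(n_rows, elements):
    """One-time CPU step: CSR pattern from element connectivity (assumed input)."""
    cols = [set() for _ in range(n_rows)]
    for nodes in elements:
        for i in nodes:
            cols[i].update(nodes)        # every node of an element couples to row i
    row_ptr, col_idx = [0], []
    for s in cols:
        col_idx.extend(sorted(s))        # sorted columns allow binary search later
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx


def reassemble(row_ptr, col_idx, elements, local_matrices):
    """Repeated step: fill values for a fixed pattern, no sorting or merging."""
    values = [0.0] * len(col_idx)
    for nodes, local in zip(elements, local_matrices):
        for a, i in enumerate(nodes):
            start, end = row_ptr[i], row_ptr[i + 1]
            for b, j in enumerate(nodes):
                # Column j is guaranteed to be in row i by build_pattern.
                k = bisect.bisect_left(col_idx, j, start, end)
                values[k] += local[a][b]
    return values
```

In a parallel version the += above is where coloring or atomics would come
in; the serial loop sidesteps that question entirely.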

Best regards,
Karli


