[petsc-dev] Backend independent VecGetArray for GPUs

Karl Rupp rupp at iue.tuwien.ac.at
Fri Oct 17 12:48:45 CDT 2014


>    So in the compiled C/C++ world
>
>      VecCUDAGetArray()
>      VecOpenCLGetArray()
>
>    would essentially call VecCUSPGetArray() or VecViennaCLGetArray() and then pull the “raw” pointer out of that beasty? But we really don’t need VecCUDAGetArray() and VecOpenCLGetArray() in the C/C++ world, since the user can simply call VecCUSPGetArray() or VecViennaCLGetArray() and then use the CUSP- or ViennaCL-specific way of pulling out the “raw” CUDA or OpenCL pointer, is this correct?

With the current implementation, this is correct.
However, I would like to decouple the use of CUDA and OpenCL buffers 
from the respective packages. The current implementation in PETSc is 
such that the buffers are buried in package-specific types 
(cusp::device_vector, viennacl::vector), from which they can be 
retrieved. For better modularity, I would like to invert this 
relationship and store the native memory handles directly. CUSP and 
ViennaCL then only kick in when certain operations are requested, for 
example a simple dot product or a full-fledged preconditioner. This 
allows for much better mix&match: one could, for example, use the CUDA 
backend in ViennaCL for certain operations not provided by CUSP (e.g. 
VecMDot()) and combine this with the AMG preconditioner in CUSP. One 
can later add any other library that is able to work with such raw 
memory buffers (a property absolutely essential for any library 
interaction, but all too often forgotten in the C++ world...)
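
To make the intended inversion concrete, here is a minimal sketch (not 
PETSc code; the wrapping constructors are what CUSP and ViennaCL offer 
around this time, so treat the exact calls as assumptions). The Vec 
owns only the raw CUDA buffer, and each package wraps it on demand for 
the operations it provides:

  // Illustrative sketch of the handle-first idea, not PETSc internals.
  // Compile with nvcc and -DVIENNACL_WITH_CUDA.
  #include <cstddef>
  #include <thrust/device_ptr.h>
  #include <cusp/array1d.h>
  #include <cusp/blas.h>
  #include <viennacl/vector.hpp>
  #include <viennacl/linalg/inner_prod.hpp>

  // The Vec stores only the raw device pointer 'raw'...
  double mix_and_match(double *raw, std::size_t n)
  {
    // ...CUSP wraps it when a CUSP operation is requested:
    thrust::device_ptr<double> p = thrust::device_pointer_cast(raw);
    cusp::array1d_view< thrust::device_ptr<double> > cusp_v(p, p + n);

    // ...and ViennaCL wraps the very same buffer for its operations
    // (ViennaCL 1.5+ can wrap an existing CUDA buffer like this):
    viennacl::vector<double> vcl_v(raw, viennacl::CUDA_MEMORY, n);

    // Both packages now operate on memory they do not own:
    double d1 = cusp::blas::dot(cusp_v, cusp_v);
    double d2 = viennacl::linalg::inner_prod(vcl_v, vcl_v);
    return d1 + d2;
  }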


>   Now moving on to Python, presumably CUSP and ViennaCL do not have Python bindings? (Why the heck not?) And thus there is no “natural” thing for VecCUSPGetArray() or VecViennaCLGetArray() to return in Python?

Depends... The natural thing for the ViennaCL case is pyViennaCL 
(http://viennacl.sourceforge.net/pyviennacl/doc/), but I don't know 
enough about Python to judge whether this is appropriate in this 
setting. For CUSP there's no "natural" thing (as far as I know) other 
than a raw PyCUDA handle. That's okay for vectors, but it gets pretty 
delicate with sparse matrices using GPU-specific memory layouts.


>    So, the conclusion is that since CUSP and ViennaCL do not have Python bindings (why the heck not?) this means PETSc needs to extend its API even in C/C++ land? Or would VecCUDAGetArray() and VecOpenCLGetArray() only exist in Python? An alternative is to introduce into Python a CUSP array class and a ViennaCL array class that does only one thing: provide a way to pull out the raw CUDA or OpenCL pointer. Then in Python the user would do the same thing as in C/C++: first pull out the CUSP/ViennaCL pointer with VecCUSPGetArray() or VecViennaCLGetArray() and then pull out the raw pointer using the little “wrapper class”?

I think the problem here is that the implementation is too 
package-centric (CUSP/ViennaCL) rather than framework-centric 
(CUDA/OpenCL).
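
To illustrate what a framework-centric API could look like, here is a 
hedged sketch (the accessor names and signatures are hypothetical; 
PETSc does not provide them at the time of this thread, and a 
real-valued PETSc build is assumed). A CUDA kernel consumes the buffer 
without knowing whether CUSP or ViennaCL backs the Vec:

  #include <petscvec.h>

  /* Hypothetical framework-centric accessors (illustration only): */
  PETSC_EXTERN PetscErrorCode VecCUDAGetArray(Vec, PetscScalar**);
  PETSC_EXTERN PetscErrorCode VecCUDARestoreArray(Vec, PetscScalar**);

  __global__ void scale_kernel(PetscScalar *x, PetscInt n, PetscScalar a)
  {
    int i = blockIdx.x*blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
  }

  PetscErrorCode ScaleOnGPU(Vec x, PetscScalar a)
  {
    PetscErrorCode ierr;
    PetscScalar    *d_x;
    PetscInt       n;

    PetscFunctionBegin;
    ierr = VecGetLocalSize(x, &n);CHKERRQ(ierr);
    ierr = VecCUDAGetArray(x, &d_x);CHKERRQ(ierr); /* raw device pointer */
    scale_kernel<<<(n + 255)/256, 256>>>(d_x, n, a);
    ierr = VecCUDARestoreArray(x, &d_x);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }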


> One the naming: It should be VecCUSPGetCUDAArray() and VecViennaCLGetOpenCLArray() since the second part of the name is associated with the sub class of the first part of the name (e.g. VecCUSP is a subclass of Vec) Since there is no VecOpenCL subclass, for example, it cannot be VecOpenCLGetArray()
>
>    What am I missing?

The current implementation would indeed require
  Vec[PACKAGE]Get[FRAMEWORK]Array()
I guess you can see the ugliness and poor scalability... ;-)
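
Spelled out, the package-centric scheme needs one accessor per 
(package, framework) pair, whereas a framework-centric scheme needs 
only one per framework. The declarations below are illustrative, not 
an actual API; cl_mem from <CL/cl.h> is assumed as the OpenCL handle 
type:

  /* Package-centric: grows multiplicatively, packages x frameworks */
  PetscErrorCode VecCUSPGetCUDAArray(Vec, PetscScalar**);
  PetscErrorCode VecViennaCLGetCUDAArray(Vec, PetscScalar**);
  PetscErrorCode VecViennaCLGetOpenCLArray(Vec, cl_mem*);
  /* ...plus one more for every new package or framework. */

  /* Framework-centric: one accessor per framework, package-agnostic */
  PetscErrorCode VecCUDAGetArray(Vec, PetscScalar**);
  PetscErrorCode VecOpenCLGetArray(Vec, cl_mem*);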

Best regards,
Karli
