[petsc-dev] Backend-independent VecGetArray for GPUs

Barry Smith bsmith at mcs.anl.gov
Fri Oct 17 14:01:20 CDT 2014


> On Oct 17, 2014, at 12:48 PM, Karl Rupp <rupp at iue.tuwien.ac.at> wrote:
> 
> 
>>   So in the compiled C/C++ world
>> 
>>>> VecCUDAGetArray()
>>>>  VecOpenCLGetArray()
>> 
>>    would essentially call VecCUSPGetArray() or VecViennaCLGetArray() and then pull the “raw” pointer out of that beasty? But we really don’t need VecCUDAGetArray() and VecOpenCLGetArray() in the C/C++ world, since the user can simply call VecCUSPGetArray() or VecViennaCLGetArray() and then use the CUSP- or ViennaCL-specific way of pulling out the “raw” CUDA or OpenCL pointer. Is this correct?
> 
> With the current implementation, this is correct.
> However, I would like to decouple the use of CUDA and OpenCL buffers from the respective packages. The current implementation in PETSc is such that the buffers are buried in package-specific types (cusp::device_vector, viennacl::vector), from which they can be retrieved. For better modularity, I would like to invert this relationship and store the native memory handles. CUSP and ViennaCL then only kick in when certain operations are requested,

  But you won’t want to have to create the CUSP or ViennaCL objects on the fly each time from the CUDA/OpenCL “raw pointers”? This means you would
still need to store the CUSP/ViennaCL object along with the “raw pointer”, so how is that different from now?
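  To make "now" concrete, the extraction looks roughly like the following (a sketch; the exact PETSc accessor names and signatures here are illustrative, not the literal API):

    /* assumed headers for the sketch */
    #include <cusp/array1d.h>
    #include <thrust/device_ptr.h>
    #include <viennacl/vector.hpp>

    /* CUSP/CUDA path: get the package object, then the raw device pointer */
    cusp::array1d<PetscScalar, cusp::device_memory> *gpuvec;
    VecCUSPGetArray(v, &gpuvec);
    PetscScalar *raw_cuda = thrust::raw_pointer_cast(gpuvec->data());
    /* ... hand raw_cuda to custom CUDA kernels ... */
    VecCUSPRestoreArray(v, &gpuvec);

    /* ViennaCL/OpenCL path: same pattern, different package accessor */
    viennacl::vector<PetscScalar> *vclvec;
    VecViennaCLGetArray(v, &vclvec);
    cl_mem raw_ocl = vclvec->handle().opencl_handle().get();
    /* ... hand raw_ocl to custom OpenCL kernels ... */
    VecViennaCLRestoreArray(v, &vclvec);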


 
> for example a simple dot product, or a full-fledged preconditioner. This allows for a much better mix and match: one could, for example, use the CUDA backend in ViennaCL for certain operations not provided by CUSP (e.g. VecMDot()) and combine this with the AMG preconditioner in CUSP. One can later add any other library that is able to work with such raw memory buffers (a property absolutely essential for any library interaction, but too often totally forgotten in the C++ world...)
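  If I understand correctly, the idea is that both packages can wrap an existing raw buffer in a lightweight view on the fly, so the package object would not have to live inside the Vec. A sketch, assuming CUSP's array1d_view and the wrap-existing-buffer constructor introduced around ViennaCL 1.5; raw_cuda and n here denote a device pointer and its length:

    /* wrap a raw CUDA pointer in a CUSP view: no copy, no ownership */
    thrust::device_ptr<PetscScalar> dptr = thrust::device_pointer_cast(raw_cuda);
    cusp::array1d_view<thrust::device_ptr<PetscScalar> > view(dptr, dptr + n);
    /* 'view' can be handed to CUSP algorithms like a cusp::array1d */

    /* wrap the same pointer in a ViennaCL vector (CUDA backend enabled) */
    viennacl::vector<PetscScalar> vclvec(raw_cuda, viennacl::CUDA_MEMORY, n);
    /* ViennaCL operations, e.g. viennacl::linalg::inner_prod(), now act
       directly on the existing buffer */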
> 
> 
>>  Now moving on to Python: presumably CUSP and ViennaCL do not have Python bindings? (Why the heck not?) And thus there is no “natural” thing for VecCUSPGetArray() or VecViennaCLGetArray() to return in Python?
> 
> Depends... The natural thing for the ViennaCL case is pyViennaCL (http://viennacl.sourceforge.net/pyviennacl/doc/), but I don't know enough details of Python to judge whether this is appropriate in this setting. For CUSP there's no "natural" thing (as far as I know) other than a raw PyCUDA handle. That's okay for vectors, but it gets pretty delicate with sparse matrices using GPU-specific memory layouts.
> 
> 
>>   So, the conclusion is that since CUSP and ViennaCL do not have Python bindings (why the heck not?), PETSc needs to extend its API even in C/C++ land? Or would VecCUDAGetArray() and VecOpenCLGetArray() only exist in Python? An alternative is to introduce into Python a CUSP array class and a ViennaCL array class that does only one thing: provide a way to pull out the raw CUDA or OpenCL pointer. Then in Python the user would do the same thing as in C/C++: first pull out the CUSP/ViennaCL pointer with VecCUSPGetArray() or VecViennaCLGetArray(), and then pull out the raw pointer using the little “wrapper class”?
> 
> I think the problem here is that the implementation is too package-centric (CUSP/ViennaCL) rather than framework-centric (CUDA/OpenCL).
> 
> 
>> On the naming: it should be VecCUSPGetCUDAArray() and VecViennaCLGetOpenCLArray(), since the second part of the name is associated with the subclass named by the first part (e.g. VecCUSP is a subclass of Vec). Since there is no VecOpenCL subclass, for example, it cannot be VecOpenCLGetArray().
>> 
>>   What am I missing?
> 
> The current implementation would indeed require
> Vec[PACKAGE]Get[FRAMEWORK]Array()
> I guess you can see the ugliness and poor scalability... ;-)
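  Spelled out, the framework-centric alternative would need only one accessor per framework, independent of which package backs the Vec. Hypothetical prototypes, sketched here purely for discussion (not the existing PETSc API):

    PetscErrorCode VecCUDAGetArray(Vec v, PetscScalar **d_v);     /* raw CUDA pointer  */
    PetscErrorCode VecCUDARestoreArray(Vec v, PetscScalar **d_v);
    PetscErrorCode VecOpenCLGetArray(Vec v, cl_mem *d_v);         /* raw OpenCL buffer */
    PetscErrorCode VecOpenCLRestoreArray(Vec v, cl_mem *d_v);

  instead of one per package-times-framework combination: VecCUSPGetCUDAArray(), VecViennaCLGetCUDAArray(), VecViennaCLGetOpenCLArray(), and so on.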
> 
> Best regards,
> Karli
> 



