[petsc-dev] Petsc+ViennaCL usage

Fri Jan 24 14:34:42 CST 2014

Hi Mani,

> 1) In ViennaCL, queue.finish() seems to be called only if we enable the
> flags VIENNACL_DEBUG_ALL or VIENNACL_DEBUG_KERNEL. How do I ensure that
> my custom kernel finishes when the debug mode is not enabled?

Use this:

> viennacl::ocl::enqueue(kernel(*vars, *dvars_dt, *Fvars));
> viennacl::backend::finish();

However, you don't need to call finish() at all. All reads from the 
device are implicitly synchronized within the OpenCL command queue, so 
any subsequent operations are guaranteed to work on the latest data.
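
To illustrate the ordering argument, here is a minimal toy model in plain C++, assuming an in-order queue. It is not ViennaCL's or OpenCL's actual API: InOrderQueue and launch_double_kernel are hypothetical names made up for this sketch. It only shows why a blocking read that goes through the same queue sees the kernel's result without an explicit finish().

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Toy model of an in-order command queue (NOT the real OpenCL/ViennaCL API).
class InOrderQueue {
public:
    // Non-blocking: the command is only recorded, like enqueuing a kernel.
    void enqueue(std::function<void()> command) { pending_.push(std::move(command)); }

    // Run every pending command in submission order, like finish().
    void finish() {
        while (!pending_.empty()) {
            pending_.front()();
            pending_.pop();
        }
    }

    // Blocking read: ordered after all earlier commands, so they run first.
    std::vector<double> read(const std::vector<double>& device_buffer) {
        finish();
        return device_buffer;
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::queue<std::function<void()>> pending_;
};

// Hypothetical stand-in for a custom kernel: f[i] = 2 * x[i].
inline void launch_double_kernel(InOrderQueue& q,
                                 const std::vector<double>& x,
                                 std::vector<double>& f) {
    assert(x.size() == f.size());
    q.enqueue([&x, &f] {
        for (std::size_t i = 0; i < x.size(); ++i) f[i] = 2.0 * x[i];
    });
}
```

In this model, calling q.read(f) right after launch_double_kernel(q, x, f) returns the doubled values, even though finish() was never called by the user.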

> 2) My platform 0 has only a GPU. So when I launch my custom kernel
> inside the residual function it indeed does evaluate on the GPU. In
> particular, suppose my residual function (for seq case) is like this:
>
> PetscErrorCode ComputeResidual(TS ts,
>                                 PetscScalar t,
>                                 Vec X, Vec dX_dt,
>                                 Vec F, void *ptr)
> {
>      VecViennaCLGetArrayRead(X, &x);
>      VecViennaCLGetArrayWrite(F, &f);
>
>      viennacl::ocl::enqueue(myKernel(*x, *f));
> // Put something here to finish the kernel.
>
>      VecViennaCLRestoreArrayRead(X, &x);
>      VecViennaCLRestoreArrayWrite(F, &f);
> }
>
> and I execute as given below:
>
> ./program
>
> then the code inside the ComputeResidual function runs inside the GPU
> but everything else runs on the CPU, right? (since I did not specify
> -dm_vec_type viennacl  -dm_mat_type aijviennacl).

If I remember correctly, you'll get an error if you check the return 
values of the VecViennaCL*() calls.

> Now suppose I execute
> as given below:
>
> ./program -dm_vec_type viennacl -dm_mat_type aijviennacl
>
> then every vector operation occurs using the viennacl code
> in vecviennacl.cxx. And since my default platform is 0 (only having a
> NVIDIA GPU), I thought everything will run on the GPU. However with the
> ViennaCL debug mode, I get the following messages for the vector operations:
>
> ViennaCL: Starting 1D-kernel 'assign_cpu'...
> ViennaCL: Global work size: '16384'...
> ViennaCL: Local work size: '128'...
> ViennaCL: Kernel assign_cpu finished!
>
> How is it possible that part of the ViennaCL code is using my CPU (which
> is on a completely different platform, #1) and the custom kernel is
> launched on my GPU (platform #0)?

The kernel 'assign_cpu' indicates that the operation
  x[i] <- alpha
is performed on the OpenCL device, where alpha is a scalar value located 
in main RAM ('provided from CPU RAM', hence the 'cpu' suffix). All 
ViennaCL-related operations are executed as expected on the GPU.

Note to self: We should include the active device name in the debug 
output. :-)

Best regards,
Karli