> I've tried to add pinning of the matrix and prolongator to the CPU on coarse grids in GAMG with this:
>     /* pin reduced coarse grid - could do something smarter */
>     ierr = MatPinToCPU(*a_Amat_crs,PETSC_TRUE);CHKERRQ(ierr);
>     ierr = MatPinToCPU(*a_P_inout,PETSC_TRUE);CHKERRQ(ierr);

  What are the symptoms of it not working? Does it still appear to copy the matrices to the GPU and then run the functions on the GPU?

  I suspect the pinning is only partially implemented for the CUDA (and MPIOpenCL) matrices.

We need the equivalent of 

static PetscErrorCode MatPinToCPU_SeqAIJViennaCL(Mat A,PetscBool flg)
{
  PetscFunctionBegin;
  A->pinnedtocpu = flg;
  if (flg) {
    /* pinned: dispatch to the plain CPU SeqAIJ implementations */
    A->ops->mult           = MatMult_SeqAIJ;
    A->ops->multadd        = MatMultAdd_SeqAIJ;
    A->ops->assemblyend    = MatAssemblyEnd_SeqAIJ;
    A->ops->duplicate      = MatDuplicate_SeqAIJ;
  } else {
    /* not pinned: restore the ViennaCL implementations */
    A->ops->mult           = MatMult_SeqAIJViennaCL;
    A->ops->multadd        = MatMultAdd_SeqAIJViennaCL;
    A->ops->assemblyend    = MatAssemblyEnd_SeqAIJViennaCL;
    A->ops->destroy        = MatDestroy_SeqAIJViennaCL;
    A->ops->duplicate      = MatDuplicate_SeqAIJViennaCL;
  }
  PetscFunctionReturn(0);
}

for MPIAIJViennaCL and for the Seq and MPI AIJCUSPARSE matrices, but it doesn't look like that has been written yet.
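
To make the idea concrete, here is a minimal standalone sketch of the pattern, not PETSc code. Every name in it (ToyMat, ToyMPIMat, ToyMult_CPU, ToyMult_GPU, ToyMatPinToCPU_Seq, ToyMatPinToCPU_MPI) is hypothetical, and the "GPU" routine is just a stand-in that records which back end ran. It assumes the parallel matrix simply forwards the flag to the two sequential blocks it owns; the real MPI implementations may need to do more than that.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_CPU = 0, TOY_GPU = 1 };

/* Toy stand-in for a sequential matrix with a one-entry ops table. */
typedef struct ToyMat ToyMat;
typedef int (*ToyMultFn)(ToyMat *A, const double *x, double *y);

struct ToyMat {
  int       n;            /* diagonal matrix of size n, for brevity */
  double   *diag;
  int       pinnedtocpu;
  ToyMultFn mult;         /* currently selected implementation */
  int       last_backend; /* which back end ran the last mult */
};

static int ToyMult_CPU(ToyMat *A, const double *x, double *y)
{
  for (int i = 0; i < A->n; i++) y[i] = A->diag[i] * x[i];
  A->last_backend = TOY_CPU;
  return 0;
}

static int ToyMult_GPU(ToyMat *A, const double *x, double *y)
{
  /* A real back end would launch a kernel; the arithmetic here is identical. */
  for (int i = 0; i < A->n; i++) y[i] = A->diag[i] * x[i];
  A->last_backend = TOY_GPU;
  return 0;
}

/* Swap the function pointer according to the pin flag. */
static int ToyMatPinToCPU_Seq(ToyMat *A, int flg)
{
  assert(A != NULL);
  A->pinnedtocpu = flg;
  A->mult        = flg ? ToyMult_CPU : ToyMult_GPU;
  return 0;
}

/* Toy "parallel" matrix that owns two sequential blocks. */
typedef struct {
  ToyMat *A, *B;
  int     pinnedtocpu;
} ToyMPIMat;

/* Record the flag and forward it to both local blocks. */
static int ToyMatPinToCPU_MPI(ToyMPIMat *M, int flg)
{
  assert(M != NULL);
  M->pinnedtocpu = flg;
  if (M->A) ToyMatPinToCPU_Seq(M->A, flg);
  if (M->B) ToyMatPinToCPU_Seq(M->B, flg);
  return 0;
}
```

If the pin is missing at the parallel level, the local blocks keep their GPU function pointers, which would match the symptom of the matrices still being copied to the GPU.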

> It does not seem to work. It does not look like CUDA has a MatCreateVecs. Should I add one and copy this flag over?

   We do need this function, but I don't see how it relates to pinning. When the matrix is pinned to the CPU we want it to create CPU vectors, which I assume it does.
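
For reference, the behavior I have in mind looks roughly like the standalone sketch below. It is not PETSc code: ToyVec, ToyMat, ToyVecCreate, ToyVecDestroy and ToyMatCreateVecs are hypothetical names, and the only assumption is that the create-vectors routine consults the matrix's pin flag when choosing the vector type. The right/left convention follows MatCreateVecs (right has the column dimension, left the row dimension).

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { TOYVEC_CPU, TOYVEC_GPU } ToyVecType;

typedef struct {
  int        n;
  ToyVecType type;
  double    *vals;
} ToyVec;

typedef struct {
  int nrows, ncols;
  int pinnedtocpu;
} ToyMat;

static ToyVec *ToyVecCreate(int n, ToyVecType type)
{
  ToyVec *v = malloc(sizeof(*v));
  if (!v) return NULL;
  /* allocate at least one entry so a NULL return always means failure */
  v->vals = calloc(n > 0 ? (size_t)n : 1, sizeof(double));
  if (!v->vals) { free(v); return NULL; }
  v->n    = n;
  v->type = type;
  return v;
}

static void ToyVecDestroy(ToyVec *v)
{
  if (v) { free(v->vals); free(v); }
}

/* right: compatible with x in y = A x; left: compatible with y.
   Either output pointer may be NULL if that vector is not wanted. */
static int ToyMatCreateVecs(const ToyMat *A, ToyVec **right, ToyVec **left)
{
  assert(A != NULL);
  /* a pinned matrix hands out CPU vectors, otherwise GPU vectors */
  ToyVecType type = A->pinnedtocpu ? TOYVEC_CPU : TOYVEC_GPU;

  if (right) {
    *right = ToyVecCreate(A->ncols, type);
    if (!*right) return 1;
  }
  if (left) {
    *left = ToyVecCreate(A->nrows, type);
    if (!*left) {
      if (right) { ToyVecDestroy(*right); *right = NULL; }
      return 1;
    }
  }
  return 0;
}
```

With something like this in place, nothing needs to be copied onto the vectors by hand; the flag on the matrix decides the vector type at creation time.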

