[petsc-dev] Petsc+ViennaCL usage
Karl Rupp
rupp at mcs.anl.gov
Tue Jan 21 03:28:28 CST 2014
Hi Mani,
> I have a few questions regarding the usage of Viennacl in Petsc.
>
> 1) In the residual evaluation function:
>
> PetscErrorCode ComputeResidual(TS ts,
> PetscScalar t,
> Vec X, Vec dX_dt,
> Vec F, void *ptr)
> {
> DM da;
> Vec localX;
> TSGetDM(ts, &da);
> DMGetLocalVector(da, &localX);
>
> DMGlobalToLocalBegin(da, X, INSERT_VALUES, localX);
> DMGlobalToLocalEnd(da, X, INSERT_VALUES, localX);
>
> viennacl::vector<PetscScalar> *x, *f;
> VecViennaCLGetArrayWrite(localX, &x);
> VecViennaCLGetArrayRead(F, &f);
>
> viennacl::ocl::enqueue(myKernel(*x, *f));
> //Should it be viennacl::ocl::enqueue(myKernel(x, f))?
It should be viennacl::ocl::enqueue(myKernel(*x, *f)); because x and f are
pointers and the kernel expects the vector objects themselves.
Usually you also want to pass the sizes to the kernel. Don't forget to
cast the sizes to the correct types (e.g. cl_uint).
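
To illustrate the point about size types: a host-side size is typically a std::size_t (64 bits on most current systems), while a kernel argument declared as uint is 32 bits wide. The following is only a minimal sketch of a checked conversion. The helper name to_kernel_size and the typedef cl_uint_standin are hypothetical and exist only so the snippet compiles without the OpenCL headers; in real code one would use cl_uint itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Stand-in for OpenCL's cl_uint (a 32-bit unsigned integer), used here so the
// sketch compiles without the OpenCL headers. In real code, include the
// OpenCL header and use cl_uint directly.
typedef std::uint32_t cl_uint_standin;

// Hypothetical helper: convert a host-side size to the 32-bit type that a
// kernel with a `uint` size argument expects, refusing values that would
// be silently truncated.
inline cl_uint_standin to_kernel_size(std::size_t n)
{
  if (n > std::numeric_limits<cl_uint_standin>::max())
    throw std::overflow_error("size does not fit into a 32-bit kernel argument");
  cl_uint_standin result = static_cast<cl_uint_standin>(n);
  assert(static_cast<std::size_t>(result) == n);  // holds after the range check
  return result;
}
```

Assuming myKernel takes the vector size as a third uint argument, the call would then read along the lines of viennacl::ocl::enqueue(myKernel(*x, *f, to_kernel_size(x->size())));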
> VecViennaCLRestoreArrayWrite(localX, &x);
> VecViennaCLRestoreArrayRead(F, &f);
> DMRestoreLocalVector(da, &localX);
> }
>
> Will the residual evaluation occur on the GPU/accelerator depending on
> where we choose the ViennaCL array computations to occur? As I
> understand, if we simply use VecGetArray in the residual evaluation
> function, then the residual evaluation is still done on the CPU even
> though the solves are done on the GPU.
If you use VecViennaCLGetArrayWrite(), the data will be valid on the
GPU, so your residual evaluation should happen in the OpenCL kernel you
provide. This is already the case in the code snippet above.
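
The difference between the two access paths can be pictured with a toy model. This is only a sketch under the assumption that each vector remembers which copy (host or device) is current and copies data only when the other side asks for it. All names below (ToyVec, device_get_read, device_get_write, host_get_array, transfers_per_residual) are made up for illustration and are not PETSc or ViennaCL functions.

```cpp
#include <cassert>

// Toy model only: these names are illustrative, not PETSc data structures.
enum ValidCopy { VALID_CPU, VALID_GPU, VALID_BOTH };

struct ToyVec {
  ValidCopy valid;     // which copy currently holds up-to-date data
  int host_to_device;  // number of simulated host-to-device copies
  int device_to_host;  // number of simulated device-to-host copies
  ToyVec() : valid(VALID_CPU), host_to_device(0), device_to_host(0) {}
};

// Device read access: upload first if only the host copy is current.
inline void device_get_read(ToyVec& v) {
  if (v.valid == VALID_CPU) { ++v.host_to_device; v.valid = VALID_BOTH; }
}

// Device write access: the data is overwritten, so no upload is needed;
// afterwards only the device copy is current.
inline void device_get_write(ToyVec& v) { v.valid = VALID_GPU; }

// Host read/write access: download first if only the device copy is
// current; afterwards only the host copy is current.
inline void host_get_array(ToyVec& v) {
  if (v.valid == VALID_GPU) ++v.device_to_host;
  v.valid = VALID_CPU;
}

// Count simulated transfers for one residual evaluation when the solver
// produces X on the device and consumes F on the device.
inline int transfers_per_residual(bool residual_on_device) {
  ToyVec X, F;
  device_get_write(X);  // the solver has just written X on the device
  if (residual_on_device) { device_get_read(X); device_get_write(F); }
  else                    { host_get_array(X);  host_get_array(F); }
  device_get_read(F);   // the solver reads F on the device
  return X.host_to_device + X.device_to_host
       + F.host_to_device + F.device_to_host;
}
```

In this model, evaluating the residual through device access causes no transfers, while going through host access costs one download of X and one upload of F per evaluation.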
> 2) How does one choose on which device the ViennaCL array computations
> will occur? I was looking for some flags like -viennacl
> cpu/gpu/accelerator but could not find any in -help.
Use one of the following options:
-viennacl_device_cpu
-viennacl_device_gpu
-viennacl_device_accelerator
> 3) How can one pass compiler flags when building OpenCL kernels in ViennaCL?
You could do that through the ViennaCL API directly, but I'm not sure
whether you really want to do this. Which flags do you want to set? My
experience is that these options have little to no effect on
performance, particularly for the memory-bandwidth-limited case. This is
also the reason why I haven't provided a PETSc routine for this.
Best regards,
Karli