<div dir="ltr"><div>I like to pass "-cl-nv-verbose" when compiling for NVIDIA cards. I also pass in parameters, for example "-D Nx=128 -D Ny=128". I'll look at the ViennaCL API.<br><br></div>Thanks,<br>

Mani<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Jan 21, 2014 at 3:28 AM, Karl Rupp <span dir="ltr"><<a href="mailto:rupp@mcs.anl.gov" target="_blank">rupp@mcs.anl.gov</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Mani,<div class="im"><br>

<br>

> I have a few questions regarding the usage of ViennaCL in PETSc.<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

1) In the residual evaluation function:<br>

<br>

PetscErrorCode ComputeResidual(TS ts,<br>

                                PetscScalar t,<br>

                                Vec X, Vec dX_dt,<br>

                                Vec F, void *ptr)<br>

{<br>

     DM da;<br>

     Vec localX;<br>

     TSGetDM(ts, &da);<br>

     DMGetLocalVector(da, &localX);<br>

<br>

     DMGlobalToLocalBegin(da, X, INSERT_VALUES, localX);<br>

     DMGlobalToLocalEnd(da, X, INSERT_VALUES, localX);<br>

<br>

     viennacl::vector<PetscScalar> *x, *f;<br>

     VecViennaCLGetArrayWrite(localX, &x);<br>

     VecViennaCLGetArrayRead(F, &f);<br>

<br>

     viennacl::ocl::enqueue(myKernel(*x, *f));<br>

// Should it be viennacl::ocl::enqueue(myKernel(x, f))?<br>

</blockquote>

<br></div>

It should be viennacl::ocl::enqueue(myKernel(*x, *f));<br>

Usually you also want to pass the sizes to the kernel. Don't forget to cast the sizes to the correct types (e.g. cl_uint).<div class="im"><br>
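As a rough illustration of the size cast, here is a minimal sketch of a checked narrowing helper. The names kernel_uint and to_kernel_uint are hypothetical, and std::uint32_t is used as a stand-in for cl_uint (a 32-bit unsigned integer) so that the snippet compiles without the OpenCL headers; in real code you would use cl_uint directly.<br>

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Stand-in for OpenCL's cl_uint (assumption: 32-bit unsigned, as in CL/cl.h).
using kernel_uint = std::uint32_t;

// Hypothetical helper: narrow a host-side size to the 32-bit type the kernel
// expects, throwing instead of silently truncating large values.
inline kernel_uint to_kernel_uint(std::size_t n) {
    if (n > std::numeric_limits<kernel_uint>::max()) {
        throw std::overflow_error("size does not fit into a 32-bit kernel argument");
    }
    return static_cast<kernel_uint>(n);
}
```

Assuming myKernel takes the vector length as a third argument, the call might then look like viennacl::ocl::enqueue(myKernel(*x, *f, to_kernel_uint(x->size())));<br>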

<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

     VecViennaCLRestoreArrayWrite(localX, &x);<br>

     VecViennaCLRestoreArrayRead(F, &f);<br>

     DMRestoreLocalVector(da, &localX);<br>

}<br>

<br>

Will the residual evaluation occur on the GPU/accelerator depending on<br>

where we choose the ViennaCL array computations to occur? As I<br>

understand, if we simply use VecGetArray in the residual evaluation<br>

function, then the residual evaluation is still done on the CPU even<br>

though the solves are done on the GPU.<br>

</blockquote>

<br></div>

If you use VecViennaCLGetArrayWrite(), the data will be valid on the GPU, so your residual evaluation should happen in the OpenCL kernel you provide. This is already the case in the code snippet above.<div class="im"><br>


<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

2) How does one choose on which device the ViennaCL array computations<br>

will occur? I was looking for some flags like -viennacl<br>

cpu/gpu/accelerator but could not find any in -help.<br>

</blockquote>

<br></div>

Use one of<br>

 -viennacl_device_cpu<br>

 -viennacl_device_gpu<br>

 -viennacl_device_accelerator<div class="im"><br>

<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

3) How can one pass compiler flags when building OpenCL kernels in ViennaCL?<br>

</blockquote>

<br></div>

You could do that through the ViennaCL API directly, but I'm not sure whether you really want to do this. Which flags do you want to set? My experience is that these options have little to no effect on performance, particularly for the memory-bandwidth-limited case. This is also the reason why I haven't provided a PETSc routine for this.<br>
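For reference, a minimal sketch of how such an option string could be assembled on the host side, using the flags mentioned at the top of this thread ("-cl-nv-verbose", "-D Nx=128 -D Ny=128"). The helper make_build_options is hypothetical and not part of PETSc or ViennaCL. How the string is handed to ViennaCL is an assumption here: I believe recent ViennaCL versions offer a build_options() setter on the OpenCL context, but please check the manual for your version.<br>

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: join raw compiler flags and -D macro definitions into
// a single OpenCL build-option string.
std::string make_build_options(const std::vector<std::string>& flags,
                               const std::map<std::string, int>& defines) {
    std::string opts;
    for (const std::string& flag : flags) {
        if (!opts.empty()) opts += ' ';
        opts += flag;
    }
    // std::map iterates in key order, so the macros appear sorted by name.
    for (const auto& def : defines) {
        if (!opts.empty()) opts += ' ';
        opts += "-D " + def.first + "=" + std::to_string(def.second);
    }
    return opts;
}

// Assumed usage (unverified against a specific ViennaCL release); the options
// would have to be set before the kernels are compiled:
//   viennacl::ocl::current_context().build_options(
//       make_build_options({"-cl-nv-verbose"}, {{"Nx", 128}, {"Ny", 128}}));
```

With the inputs shown in the comment, the helper returns "-cl-nv-verbose -D Nx=128 -D Ny=128".<br>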


<br>

Best regards,<br>

Karli<br>

<br>

</blockquote></div><br></div>