<div dir="ltr">Hi Karl,<div><br></div><div>It actually works even if I use ComputeResidualViennaCL without using ViennaCL vecs! Isn't it because there is a consistency check to see where the data resides whenever there is a call to VecViennaCLGetArray?</div>
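<div><br></div><div>To illustrate what I mean by the consistency check, here is a toy sketch in plain Python. It is not PETSc or ViennaCL code: the names MirroredVec, Where, get_device_array and get_host_array are made up for illustration. It assumes each vector keeps a flag recording which copy is current and copies data only when the requested side is stale:</div>

```python
from enum import Enum


class Where(Enum):
    """Which copy of the data is currently up to date (hypothetical names)."""
    HOST = "host"
    DEVICE = "device"
    BOTH = "both"


class MirroredVec:
    """Toy vector with a host copy and a simulated device copy."""

    def __init__(self, values):
        self.host = list(values)
        self.device = None      # allocated lazily on first device access
        self.valid = Where.HOST
        self.transfers = 0      # counts copies between host and device

    def get_device_array(self, write=False):
        # Consistency check: upload only if the device copy is stale.
        if self.valid is Where.HOST:
            self.device = list(self.host)
            self.transfers += 1
            self.valid = Where.BOTH
        if write:
            self.valid = Where.DEVICE   # host copy becomes stale
        return self.device

    def get_host_array(self, write=False):
        # Mirror image of the check above: download only when needed.
        if self.valid is Where.DEVICE:
            self.host = list(self.device)
            self.transfers += 1
            self.valid = Where.BOTH
        if write:
            self.valid = Where.HOST     # device copy becomes stale
        return self.host


v = MirroredVec([1.0, 2.0, 3.0])
d = v.get_device_array(write=True)  # first device access triggers one upload
d[0] = 10.0                         # stands in for a kernel writing on the device
h = v.get_host_array()              # stale host copy is refreshed before use
```

<div>In this model the caller never has to know where the data currently lives, which is the behaviour I assumed the real calls provide.</div>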

<div><br></div><div>I reran the code on another machine with an NVIDIA GeForce 550 Ti with 1 GB of global memory. It still fails with the following error:</div><div><br></div><div><div>ViennaCL: FATAL ERROR: Kernel start failed for 'vec_mul'.</div>

<div>ViennaCL: Smaller work sizes could not solve the problem. </div><div>[0]PETSC ERROR: --------------------- Error Message ------------------------------------</div><div>[0]PETSC ERROR: Error in external library!</div>

<div>[0]PETSC ERROR: ViennaCL error: ViennaCL: FATAL ERROR: CL_MEM_OBJECT_ALLOCATION_FAILURE </div><div> ViennaCL could not allocate memory on the device. Most likely the device simply ran out of memory.</div></div><div><br>

</div><div>To reproduce the error, I only had to turn on the ViennaCL vecs. The error occurs irrespective of which residual evaluation function I use. I also checked whether CPU memory usage increased during the run, but it did not.</div>
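<div><br></div><div>For reference, this is roughly how such a check can be scripted instead of watching top. It is a minimal sketch, assuming Linux and the VmRSS field of /proc/[pid]/status. The function names and the sample readings are made up, and it only sees host memory, not what the OpenCL driver allocates on the device:</div>

```python
def parse_vmrss_kib(status_text):
    """Extract the resident set size (KiB) from /proc/[pid]/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def read_vmrss_kib(pid):
    """Read the current resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kib(f.read())


def steady_growth(samples):
    """True if every reading is larger than the one before it."""
    return all(b > a for a, b in zip(samples, samples[1:]))


# Made-up status text and readings, only to show the intended use.
sample = "Name:\tmyapp\nVmRSS:\t  204800 kB\n"
rss = parse_vmrss_kib(sample)
flat = steady_growth([204800, 204800, 204812, 204800])
```

<div>Calling read_vmrss_kib once per time step and passing the readings to steady_growth would flag a host-side leak. A flat profile like the one I saw says nothing about device memory.</div>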

<div><br></div><div>Cheers,</div><div>Mani</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Jan 28, 2014 at 10:12 AM, Karl Rupp <span dir="ltr">&lt;<a href="mailto:rupp@mcs.anl.gov" target="_blank">rupp@mcs.anl.gov</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Mani,<div class="im"><br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Thanks for the explanation. Do you think it will help if I use a GPU

which is capable of doing double precision arithmetic?<br>

</blockquote>

<br></div>

Your GPU must support double precision; otherwise the JIT compiler would fail. However, your GPU might emulate double-precision arithmetic poorly.<div class="im"><br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I am using an NVIDIA Quadro FX 1800M. It has 1 GB of global memory.<br>

</blockquote>

<br></div>

For production runs you definitely want to use a discrete GPU to get the benefits of higher bandwidth (through higher heat dissipation...)<div class="im"><br>

<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Unfortunately, NVIDIA's Visual Profiler does not seem to work with its<br>

OpenCL implementation. The code does not crash when I run it on the CPU<br>

using Intel's OpenCL.<br>

</blockquote>

<br></div>

Let me put it this way: This ridiculous move of taking OpenCL debugging capabilities out when releasing CUDA 5.0 is not based on scientific reasoning...<div class="im"><br>

<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I meant to say that the code does not crash with either<br>

ComputeResidualViennaCL or ComputeResidual with the normal PETSc<br>

Vec/Mats, but it does crash when either of them is used with the<br>

ViennaCL vecs.<br>

</blockquote>

<br></div>

I'm wondering how ComputeResidualViennaCL will work if the vector type is not VECVIENNACL?<div class="im"><br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Do you think there is memory allocation at every time<br>

step? I thought all the memory would be allocated during initialization.<br>

</blockquote>

<br></div>

From your description, a memory allocation at every time step seems to be the case, yes, but I still need to verify this. Did you see a similar increase in memory consumption (check with e.g. top) with Intel's OpenCL SDK?<br>
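<br>

To see why an allocation at every time step would exhaust a 1 GB device quickly, here is a back-of-the-envelope sketch in Python. The grid size and the number of unknowns per point are hypothetical and not taken from your code, and the helper names are made up:<br>

```python
def vec_bytes(n_entries, bytes_per_entry=8):
    """Size in bytes of one vector of double-precision values."""
    return n_entries * bytes_per_entry


def max_vectors(device_bytes, n_entries, bytes_per_entry=8):
    """How many such vectors fit into the given amount of device memory."""
    return device_bytes // vec_bytes(n_entries, bytes_per_entry)


GIB = 1024 ** 3
n = 256 * 256 * 8               # hypothetical: 256 x 256 grid, 8 unknowns per point
per_vec = vec_bytes(n)          # bytes needed for a single vector
fits = max_vectors(1 * GIB, n)  # upper bound on vectors in 1 GiB
```

With these made-up numbers each vector takes 4 MiB, so one vector allocated per time step and never released would fill 1 GiB after at most 256 steps, and sooner in practice because other buffers share the device.<br>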

<br>

Best regards,<br>

Karli<br>

<br>

<br>

</blockquote></div><br></div>