<div dir="ltr">Note that the error occurs in the middle of the run, after a certain number of time steps. The time step at which it fails differs between the two machines on which I tested, and it only happens when I use the GPU. The OpenCL backend running on a CPU with Intel's OpenCL works fine.<div>
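<div>Here is a back-of-the-envelope way to see why the failing step could differ between machines. It assumes, without verification, that every time step leaks a fixed amount of device memory. Allocation then fails once the leaked total exceeds what was free after setup, and that free amount depends on the card and on whatever else is using it (a display, for example). The helpers below are hypothetical and are not part of PETSc or ViennaCL.</div>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes occupied by one vector of n doubles. */
static size_t vec_bytes(size_t n)
{
    assert(n <= SIZE_MAX / sizeof(double)); /* guard against overflow */
    return n * sizeof(double);
}

/* Hypothetical leak model: free_bytes are available on the device after
 * setup, and each time step leaks leak_per_step bytes that are never
 * released. Returns the 1-based index of the first step whose allocation
 * cannot be satisfied, or 0 if nothing leaks. */
static size_t first_failing_step(size_t free_bytes, size_t leak_per_step)
{
    if (leak_per_step == 0)
        return 0;
    /* Steps 1..k succeed as long as k * leak_per_step <= free_bytes. */
    return free_bytes / leak_per_step + 1;
}
```

<div>For example, take vectors of 2^20 doubles (8 MiB each) with one vector leaked per step. With 800 MiB free, the first 100 steps succeed and step 101 fails. With only 600 MiB free, step 76 fails.</div>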
<br></div><div>Cheers,<br>Mani</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Jan 28, 2014 at 11:49 AM, Mani Chandra <span dir="ltr"><<a href="mailto:mc0710@gmail.com" target="_blank">mc0710@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi Karl,<div><br></div><div>It actually works even if I use ComputeResidualViennaCL without using ViennaCL vecs! Isn't that because there is a consistency check of where the data resides whenever VecViennaCLGetArray is called?</div>
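<div>The consistency check Mani describes can be illustrated with the lazy host/device synchronization pattern. Each vector keeps a flag recording which copy (host, device, or both) is current, and an accessor copies data only when the requested copy is stale. The names below (MirroredVec, get_device_array, and so on) are hypothetical, and plain host memory stands in for a GPU buffer. This is a sketch of the idea, similar in spirit to what PETSc's GPU vector types do, and not PETSc's actual VecViennaCLGetArray implementation.</div>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Which copy of the data is current. */
typedef enum { VALID_HOST, VALID_DEVICE, VALID_BOTH } Validity;

/* Hypothetical mirrored vector; "device" is ordinary memory standing in
 * for a GPU buffer. */
typedef struct {
    double  *host;
    double  *device;
    size_t   n;
    Validity valid;
    int      transfers; /* number of copies performed, for illustration */
} MirroredVec;

/* Return the device array, copying host->device only if the device copy is stale. */
static double *get_device_array(MirroredVec *v)
{
    assert(v && v->host && v->device);
    if (v->valid == VALID_HOST) {
        memcpy(v->device, v->host, v->n * sizeof(double));
        v->transfers++;
        v->valid = VALID_BOTH;
    }
    return v->device;
}

/* Return the host array, copying device->host only if the host copy is stale. */
static double *get_host_array(MirroredVec *v)
{
    assert(v && v->host && v->device);
    if (v->valid == VALID_DEVICE) {
        memcpy(v->host, v->device, v->n * sizeof(double));
        v->transfers++;
        v->valid = VALID_BOTH;
    }
    return v->host;
}

/* Call after writing through the device pointer: the host copy is now stale. */
static void mark_device_modified(MirroredVec *v) { v->valid = VALID_DEVICE; }

/* Call after writing through the host pointer: the device copy is now stale. */
static void mark_host_modified(MirroredVec *v) { v->valid = VALID_HOST; }
```

<div>With this design, a residual routine that asks for the device array still gets correct data even if the vector was last written on the host. The cost is an extra transfer, not a wrong answer.</div>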
<div><br></div><div>I reran the code on another machine with an NVIDIA GeForce 550 Ti with 1 GB of global memory. It still fails with the following error:</div><div><br></div><div><div class="im"><div>ViennaCL: FATAL ERROR: Kernel start failed for 'vec_mul'.</div>
<div>ViennaCL: Smaller work sizes could not solve the problem. </div><div>[0]PETSC ERROR: --------------------- Error Message ------------------------------------</div><div>[0]PETSC ERROR: Error in external library!</div>
<div>[0]PETSC ERROR: ViennaCL error: ViennaCL: FATAL ERROR: CL_MEM_OBJECT_ALLOCATION_FAILURE </div></div><div> ViennaCL could not allocate memory on the device. Most likely the device simply ran out of memory.</div></div>
<div><br>
</div><div>To reproduce the error, I only had to turn on the ViennaCL vecs. The error occurs irrespective of which residual evaluation function I use. I also checked whether host (CPU) memory usage increased during the run, but it did not.</div>
<div><br></div><div>Cheers,</div><div>Mani</div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Jan 28, 2014 at 10:12 AM, Karl Rupp <span dir="ltr"><<a href="mailto:rupp@mcs.anl.gov" target="_blank">rupp@mcs.anl.gov</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Mani,<div><br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Thanks for the explanation. Do you think it will help if I use a GPU<br>
which is capable of doing double precision arithmetic?<br>
</blockquote>
<br></div>
Your GPU must support double precision; otherwise the JIT compiler will fail. However, your GPU may emulate double-precision arithmetic poorly.<div><br>
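<div>Whether a device exposes double precision at all can be checked before any kernel is compiled. In OpenCL 1.x, double support is advertised through the cl_khr_fp64 extension in the device's space-separated CL_DEVICE_EXTENSIONS string (some older AMD devices reported cl_amd_fp64 instead). The minimal token check below is hypothetical. It assumes the extension string has already been obtained with clGetDeviceInfo, and it uses plain strings so it runs without an OpenCL installation.</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if name appears as a whole space-separated token in exts.
 * In a real program exts would come from
 * clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, ...). */
static bool has_extension(const char *exts, const char *name)
{
    assert(exts && name);
    size_t len = strlen(name);
    if (len == 0)
        return false;
    const char *p = exts;
    while ((p = strstr(p, name)) != NULL) {
        /* Reject partial matches such as "cl_khr_fp64x". */
        bool starts = (p == exts) || (p[-1] == ' ');
        bool ends = (p[len] == ' ') || (p[len] == '\0');
        if (starts && ends)
            return true;
        p += len;
    }
    return false;
}
```

<div>This check only tells you whether doubles are available. It says nothing about double-precision throughput, which is the concern raised above.</div>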
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
I am using an NVIDIA Quadro FX 1800M. It has 1 GB of global memory.<br>
</blockquote>
<br></div>
For production runs you definitely want to use a discrete GPU to get the benefits of higher bandwidth (through higher heat dissipation...)<div><br>
<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Unfortunately, NVIDIA's Visual Profiler does not seem to work with its<br>
OpenCL implementation. The code does not crash when I run it on the CPU<br>
using Intel's OpenCL.<br>
</blockquote>
<br></div>
Let me put it this way: the ridiculous move of removing the OpenCL debugging capabilities with the release of CUDA 5.0 was not based on scientific reasoning...<div><br>
<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
I meant to say that the code does not crash with either<br>
ComputeResidualViennaCL or ComputeResidual when using the normal PETSc<br>
Vecs/Mats, but it does crash when either of them is used with the<br>
ViennaCL vecs.<br>
</blockquote>
<br></div>
I'm wondering how ComputeResidualViennaCL can work if the vector type is not VECVIENNACL.<div><br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Do you think there is memory allocation at every time<br>
step? I thought all the memory would be allocated during initialization.<br>
</blockquote>
<br></div>
From your description, an allocation at every time step does seem to be the case, but I still need to verify this. Did you see a similar increase in memory consumption (use e.g. top) with Intel's OpenCL SDK?<br>
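<div>Whether anything is allocated at every time step is hard to see from outside, because top reports host memory, not device memory. One generic technique is to route buffer creation through a counting wrapper and record the live byte count after each step. Steady growth then points to a per-step leak. The wrapper below is hypothetical: plain malloc stands in for device buffer creation, and nothing here is part of PETSc or ViennaCL.</div>

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Total bytes currently allocated through track_alloc. */
static size_t live_bytes = 0;

/* Header stored in front of each block; the union keeps the payload
 * suitably aligned for any type. */
typedef union { size_t size; max_align_t align; } Header;

static void *track_alloc(size_t size)
{
    Header *h = malloc(sizeof(Header) + size);
    if (!h)
        return NULL;
    h->size = size;
    live_bytes += size;
    return h + 1; /* payload starts right after the header */
}

static void track_free(void *p)
{
    if (!p)
        return;
    Header *h = (Header *)p - 1;
    assert(live_bytes >= h->size);
    live_bytes -= h->size;
    free(h);
}

/* Return 1 if the sampled live-byte count grew on every step, else 0. */
static int grows_every_step(const size_t *samples, size_t n)
{
    for (size_t i = 1; i < n; ++i)
        if (samples[i] <= samples[i - 1])
            return 0;
    return n > 1;
}
```

<div>Calling grows_every_step on the per-step samples after a short run would distinguish "allocated once during initialization" (flat samples) from "allocated every step and never released" (strictly increasing samples).</div>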
<br>
Best regards,<br>
Karli<br>
<br>
<br>
</blockquote></div><br></div>
</div></div></blockquote></div><br></div>