[petsc-dev] Possible bugs when using TS with ViennaCL (continued)

Tue Jan 28 11:51:33 CST 2014

Note that the error occurs in the middle of the run, after a certain number
of time steps. The time step at which it fails differs between the two
machines I tested on, and it only happens when I use the GPU. The OpenCL
backend running on a CPU with Intel's OpenCL works fine.
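
As a rough illustration of why the failing step could differ between
machines, here is a minimal sketch. It assumes, and this is only a
hypothesis that has not been verified, that a fixed amount of device memory
is leaked at every time step. The function name and its parameters are made
up for illustration and are not part of PETSc or ViennaCL.

```c
#include <stdint.h>

/* Hypothetical helper, not a PETSc or ViennaCL function.
 * Assumes usage after step k is baseline + k * leak_per_step (all in bytes)
 * and that an allocation fails once usage exceeds capacity.
 * Returns the first (1-based) step at which that happens,
 * 0 if the baseline already fills the device,
 * and UINT64_MAX if nothing is leaked per step. */
uint64_t steps_until_exhaustion(uint64_t capacity,
                                uint64_t baseline,
                                uint64_t leak_per_step)
{
    if (baseline >= capacity) {
        return 0;            /* no room left before the first step */
    }
    if (leak_per_step == 0) {
        return UINT64_MAX;   /* no growth, so the device never fills up */
    }
    /* Largest k with baseline + k * leak <= capacity, plus one. */
    return (capacity - baseline) / leak_per_step + 1;
}
```

Under this assumption, two GPUs with the same 1GB of global memory but a
different amount already in use (display, other contexts) would fail at
different steps, which would be consistent with what I see.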

Cheers,
Mani

On Tue, Jan 28, 2014 at 11:49 AM, Mani Chandra <mc0710 at gmail.com> wrote:

> Hi Karl,
>
> It actually works even if I use ComputeResidualViennaCL without using
> ViennaCL vecs! Isn't it because there is a consistency check to see where
> the data resides whenever there is a call to VecViennaCLGetArray?
>
> I reran the code on another machine with an NVIDIA GeForce 550 Ti with 1GB
> of global memory. It still fails with the following error:
>
> ViennaCL: FATAL ERROR: Kernel start failed for 'vec_mul'.
> ViennaCL: Smaller work sizes could not solve the problem.
> [0]PETSC ERROR: --------------------- Error Message
> ------------------------------------
> [0]PETSC ERROR: Error in external library!
> [0]PETSC ERROR: ViennaCL error: ViennaCL: FATAL ERROR:
> CL_MEM_OBJECT_ALLOCATION_FAILURE
>  ViennaCL could not allocate memory on the device. Most likely the device
> simply ran out of memory.
>
> To reproduce the error, I only had to turn on the ViennaCL vecs. The error
> occurs irrespective of which residual evaluation function I use. I checked
> whether CPU memory usage increased while the run was going on, but it did
> not.
>
> Cheers,
> Mani
>
>
> On Tue, Jan 28, 2014 at 10:12 AM, Karl Rupp <rupp at mcs.anl.gov> wrote:
>
>> Hi Mani,
>>
>>
>>> Thanks for the explanation. Do you think it will help if I use a GPU
>>> which is capable of doing double precision arithmetic?
>>>
>>
>> Your GPU must support double precision, otherwise the jit-compiler
>> will fail. However, your GPU might emulate double precision arithmetic
>> poorly.
>>
>>
>>> I am using an NVIDIA Quadro FX 1800M. It has 1GB of global memory.
>>>
>>
>> For production runs you definitely want to use a discrete GPU to get the
>> benefits of higher bandwidth (through higher heat dissipation...)
>>
>>
>>
>>> Unfortunately NVIDIA's visual profiler does not seem to work with its
>>> OpenCL implementation. The code does not crash when I run it on the CPU
>>> using Intel's OpenCL.
>>>
>>
>> Let me put it this way: This ridiculous move of taking OpenCL debugging
>> capabilities out when releasing CUDA 5.0 is not based on scientific
>> reasoning...
>>
>>
>>
>>> I meant to say that the code does not crash with either
>>> ComputeResidualViennaCL or ComputeResidual when using the normal PETSc
>>> Vec/Mats, but does indeed crash when either of them is used with the
>>> ViennaCL vecs.
>>>
>>
>> I'm wondering how ComputeResidualViennaCL will work if the vector type is
>> not AIJVIENNACL?
>>
>>
>>> Do you think there is memory allocation at every time
>>> step? I thought all the memory would be allocated during initialization.
>>>
>>
>> From your description a memory allocation seems to be the case, yes, but
>> I still need to verify this. Did you see a similar increase in memory
>> consumption (use e.g. top) with Intel's OpenCL SDK?
>>
>> Best regards,
>> Karli
>>
>>
>>
>