[petsc-dev] Possible bugs when using TS with ViennaCL (continued)

Karl Rupp rupp at mcs.anl.gov
Tue Jan 28 03:09:42 CST 2014


Hi Mani,

> I've been testing the code that uses TS with ViennaCL further, and there
> are a couple of things I wanted to point out:
>
> 1) When using ComputeResidualViennaCL with either the normal PETSc
> Vecs/Mats or Vec/MatViennaCL, and running on the GPU, the nonlinear
> convergence is very different from using an OpenCL CPU backend or just
> the regular PETSc code.
>
> a) Using NVIDIA OpenCL to run on the GPU to compute the residual and
> using either normal PETSc Vec/Mat or ViennaCL Vec/Mat:
>
> 0 TS dt 10 time 0
>      0 SNES Function norm 4.789374470711e-01
>      1 SNES Function norm 5.491749197245e-02
>      2 SNES Function norm 6.542412564158e-03
>      3 SNES Function norm 7.800844032317e-04
>      4 SNES Function norm 9.349243191537e-05
>      5 SNES Function norm 1.120692741097e-05
> 1 TS dt 10 time 10
>
> b) Using Intel OpenCL to run on the CPU to compute the residual and
> using either normal PETSc Vec/Mat or ViennaCL Vec/Mat:
>
> 0 TS dt 10 time 0
>      0 SNES Function norm 3.916582465172e-02
>      1 SNES Function norm 4.990998832000e-07
>
> c) Using ComputeResidual (which runs on the CPU) with the normal PETSc
> Vec/Mat:
>
> 0 TS dt 10 time 0
>      0 SNES Function norm 3.916582465172e-02
>      1 SNES Function norm 4.990998832000e-07
> 1 TS dt 10 time 10
>
> You see that b) and c) match perfectly but a) is quite different. Why
> could this be?

The reason is the different arithmetic units. Your OpenCL kernel contains
    dx_dt[INDEX_GLOBAL(i,j,var)] -
        (x[INDEX_GLOBAL(i+1,j,var)] -
         x[INDEX_GLOBAL(i,j,var)])/DX1 -
        (x[INDEX_GLOBAL(i,j+1,var)] -
         x[INDEX_GLOBAL(i,j,var)])/DX2
so you are subtracting values of about the same magnitude multiple
times. Each such subtraction cancels the leading digits, so differences
in the last bits of the operands become much larger relative to the
result. You get consistent results on the CPU because the same arithmetic
units get used irrespective of OpenCL-based or 'native' execution. The
NVIDIA GPU has different round-off behavior. You are likely to see
similar effects with AMD GPUs. There is nothing we can do to change this.
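
To illustrate the cancellation effect, here is a minimal sketch in Python
(3.9 or later, for math.nextafter). It is not your kernel: the function
forward_difference and the values of x_here, x_next and dx are made up for
illustration, and nudging one input by a single unit in the last place is
only a stand-in for the differing round-off of another arithmetic unit.

```python
import math


def forward_difference(x_next, x_here, dx):
    """One term of the residual: (x[i+1] - x[i]) / DX."""
    return (x_next - x_here) / dx


# Hypothetical values: two neighbouring grid values that agree to about
# nine significant digits, and an arbitrary grid spacing.
x_here = 1.0
x_next = 1.0 + 1.0e-9
dx = 1.0e-2

reference = forward_difference(x_next, x_here, dx)

# Stand-in for a different arithmetic unit: move one input to the next
# representable double, i.e. change only its last bit.
x_next_perturbed = math.nextafter(x_next, math.inf)
perturbed = forward_difference(x_next_perturbed, x_here, dx)

# Compare the relative size of the input change with that of the output change.
input_rel_change = (x_next_perturbed - x_next) / x_next
output_rel_change = abs(perturbed - reference) / abs(reference)
amplification = output_rel_change / input_rel_change

print(f"relative change in input : {input_rel_change:.3e}")
print(f"relative change in output: {output_rel_change:.3e}")
print(f"amplification            : {amplification:.3e}")
```

With IEEE 754 double precision the amplification comes out on the order
of 1e9, roughly x_next / (x_next - x_here). A last-bit change in an input
turns into a change in the difference quotient that is many orders of
magnitude larger in relative terms, which is the kind of sensitivity
described above.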


> 2) When I try using either ComputeResidual or ComputeResidualViennaCL
> with the ViennaCL Vec/Mats, the GPU run crashes at a late time because
> of a routine in ViennaCL.
>
> ViennaCL: FATAL ERROR: Kernel start failed for 'vec_mul'.
> ViennaCL: Smaller work sizes could not solve the problem.
> [0]PETSC ERROR: --------------------- Error Message
> ------------------------------------
> [0]PETSC ERROR: Error in external library!
> [0]PETSC ERROR: ViennaCL error: ViennaCL: FATAL ERROR:
> CL_MEM_OBJECT_ALLOCATION_FAILURE
>
> I have attached the full crash log. The crash occurs late into the run,
> in this case at the 80th time step. I thought all memory allocation
> occurs at the beginning of the run, so I don't quite understand why it's
> failing.

Okay, this sounds like the GPU ultimately runs out of memory. Which GPU 
do you use? How much memory does it have? Do you also see an increase in 
memory consumption with the Intel OpenCL SDK?


> Note that the code works if I use ComputeResidualViennaCL with
> the normal PETSc Vec/Mats.

You mean ComputeResidual(), don't you?

Best regards,
Karli



