<p>Hi Karl, </p>

<p>I already attached the code in my last email; it is the last attachment. The memory leak has been fixed. Thanks! </p>

<p>Cheers, <br>

Mani </p>

<div class="gmail_quote">On Feb 23, 2014 5:57 AM, "Karl Rupp" &lt;<a href="mailto:rupp@mcs.anl.gov">rupp@mcs.anl.gov</a>&gt; wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi Mani,<br>

<br>

thanks for the quick feedback.<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I tested the updated implementation of the viennacl bindings in<br>

petsc-dev/next and I get rather poor performance when using viennacl on<br>

either cpu or gpu. I am using the TS module (type:theta) with a simple<br>

advection equation in 2D with resolution 256x256 and 8 variables.<br>

</blockquote>

<br>

Good, this has about 500k unknowns, so OpenCL kernel launch overhead should not be a show-stopper.<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I tested with the following cases:<br>

<br>

1) Single cpu with petsc's old aij mat and vec implementation<br>

2) Viennacl mat and vec and using VecViennaCLGetArrayRead/Write in the<br>

residual evaluation function on an intel cpu with intel's opencl.<br>

3) Viennacl mat and vec and using VecViennaCLGetArrayRead/Write in the<br>

residual evaluation function on an nvidia gpu.<br>

<br>

The first case is the fastest and the other cases are 2-3 times slower.<br>

Attached are the log summaries for each case and the code I used to<br>

test with. I am running using the following command:<br>

<br>

time ./petsc_opencl -ts_monitor -snes_monitor -ts_dt 0.01 -ts_max_steps 10 -ts_type theta -log_summary<br>

</blockquote>

<br>

As Matt already noted, the bottleneck here is the frequent copying to and from the device. I see 90% of the time spent in MatFDColorApply, so this is where we need to look. Is there any chance you can send me the code to reproduce this directly? Is it the same code you sent me back in January?<br>


<br>

Btw: Mani, does the memory still get filled up on the GPU for larger time steps?<br>

<br>

Best regards,<br>

Karli<br>

<br>

</blockquote></div>
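
<p>For reference, the "about 500k unknowns" figure in the quoted message follows directly from the 256x256 grid with 8 variables per cell. The short sketch below reproduces that count and, assuming double-precision scalars (8 bytes each, which is an assumption and not stated in the thread), estimates the size of one full solution vector. The function names are illustrative only.</p>

```python
# Problem size taken from the quoted message: 256x256 grid, 8 variables per cell.
NX, NY, NVARS = 256, 256, 8

# Assumption (not stated in the thread): scalars are double precision, 8 bytes each.
BYTES_PER_SCALAR = 8


def num_unknowns(nx, ny, nvars):
    """Total number of unknowns for an nx-by-ny grid with nvars variables per cell."""
    return nx * ny * nvars


def vector_bytes(n, bytes_per_scalar=BYTES_PER_SCALAR):
    """Size in bytes of one dense vector holding n scalars."""
    return n * bytes_per_scalar


n = num_unknowns(NX, NY, NVARS)
print("unknowns:", n)
print("vector size (MiB):", vector_bytes(n) / 2**20)
```

<p>Under that assumption, every host-to-device or device-to-host copy of a full vector moves roughly 4 MiB, which gives a sense of scale for why frequent copies during the finite-difference Jacobian assembly can dominate the run time.</p>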