[petsc-dev] Performance of Petsc + ViennaCL 1.5.1 (branch:petsc-dev/next)

Karl Rupp rupp at mcs.anl.gov
Sun Feb 23 05:57:35 CST 2014


Hi Mani,

thanks for the quick feedback.

> I tested the updated implementation of the viennacl bindings in
> petsc-dev/next and I get rather poor performance when using viennacl on
> either cpu or gpu. I am using the TS module (type:theta) with a simple
> advection equation in 2D with resolution 256x256 and 8 variables.

Good, this has about 500k unknowns, so OpenCL kernel launch overhead 
should not be a show-stopper.
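
For reference, the count works out as below. This is only a quick sketch: the variable names are made up for illustration, and 8-byte double-precision values are assumed.

```python
# Size of the problem described above: a 256x256 grid with 8 variables per point.
nx, ny = 256, 256      # grid resolution from the report
dof = 8                # variables per grid point from the report

unknowns = nx * ny * dof
vec_bytes = unknowns * 8   # assumption: 8-byte doubles

print(unknowns)            # 524288
print(vec_bytes / 2**20)   # 4.0 (MiB per solution-sized vector)
```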

> I tested with the following cases:
>
> 1) Single cpu with petsc's old aij mat and vec implementation
> 2) Viennacl mat and vec and using VecViennaCLGetArrayRead/Write in the
> residual evaluation function on an intel cpu with intel's opencl.
> 3) Viennacl mat and vec and using VecViennaCLGetArrayRead/Write in the
> residual evaluation function on an nvidia gpu.
>
> The first case is the fastest and the other cases are 2-3 times slower.
> Attached are the log summaries for each cases and the code I used to
> test with. I am running using the following command:
>
> time ./petsc_opencl -ts_monitor -snes_monitor -ts_dt 0.01 -ts_max_steps
> 10 -ts_type theta -log_summary

As Matt already noted, the bottleneck here is the frequent copying from/to
the device. I see 90% of the time spent in MatFDColorApply, so this is
where we need to look. Is there any chance you can send me the code
to reproduce this directly? Is it the same code you sent me back in January?
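
To get a rough feeling for why such copies can dominate, here is a back-of-the-envelope model. It assumes that every residual evaluation inside the finite-difference coloring loop moves one vector to the device and one back. The function name and the number of colors (40) are hypothetical and not taken from the logs.

```python
def fd_coloring_traffic(n_unknowns, n_colors, bytes_per_value=8, copies_per_eval=2):
    """Bytes moved between host and device for one colored finite-difference Jacobian.

    Assumes one residual evaluation per color, and that each evaluation
    triggers `copies_per_eval` full-vector transfers (one in, one out).
    """
    vec_bytes = n_unknowns * bytes_per_value
    return n_colors * copies_per_eval * vec_bytes


# 256 x 256 grid with 8 variables; 40 colors is a made-up illustrative value.
traffic = fd_coloring_traffic(256 * 256 * 8, n_colors=40)
print(traffic / 2**20)  # 320.0 (MiB per Jacobian under these assumptions)
```

Under these assumptions the traffic per Jacobian grows linearly with the number of colors, so keeping the vectors on the device between residual evaluations would remove most of it.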

Btw: Mani, does the memory still get filled up on the GPU for larger 
time steps?

Best regards,
Karli



