[petsc-dev] Performance of Petsc + ViennaCL 1.5.1 (branch:petsc-dev/next)
Karl Rupp
rupp at mcs.anl.gov
Sun Feb 23 05:57:35 CST 2014
Hi Mani,
thanks for the quick feedback.
> I tested the updated implementation of the viennacl bindings in
> petsc-dev/next and I get rather poor performance when using viennacl on
> either cpu or gpu. I am using the TS module (type:theta) with a simple
> advection equation in 2D with resolution 256x256 and 8 variables.
Good, this has about 500k unknowns, so OpenCL kernel launch overhead
should not be a show-stopper.
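For reference, the unknown count follows directly from the grid size and variable count quoted above. The sketch below also estimates the size of one vector, assuming double-precision (8-byte) scalars; that precision is an assumption, not something stated in the message.

```python
# Grid dimensions and variable count are taken from the quoted message.
nx, ny, nvars = 256, 256, 8

# Total degrees of freedom in one solution vector.
unknowns = nx * ny * nvars

# Assumption: 8-byte (double-precision) scalars.
bytes_per_vec = unknowns * 8

print(unknowns)                # 524288, i.e. about 500k
print(bytes_per_vec / 2**20)   # 4.0 (MiB per vector)
```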
> I tested with the following cases:
>
> 1) Single cpu with petsc's old aij mat and vec implementation
> 2) Viennacl mat and vec and using VecViennaCLGetArrayRead/Write in the
> residual evaluation function on an intel cpu with intel's opencl.
> 3) Viennacl mat and vec and using VecViennaCLGetArrayRead/Write in the
> residual evaluation function on an nvidia gpu.
>
> The first case is the fastest and the other cases are 2-3 times slower.
> Attached are the log summaries for each cases and the code I used to
> test with. I am running using the following command:
>
> time ./petsc_opencl -ts_monitor -snes_monitor -ts_dt 0.01 -ts_max_steps 10 -ts_type theta -log_summary
As Matt already noted, the bottleneck here is the frequent copying from/to
the device. I see 90% of the time spent in MatFDColorApply, so this is
where we need to look. Is there any chance you can send me the code
to reproduce this directly? Is it the same code you sent me back in January?
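To get a feel for why the copies can dominate, here is a rough back-of-the-envelope sketch. A finite-difference Jacobian with coloring evaluates the residual once per color; the model below assumes that every such evaluation moves two full vectors (input and output) between host and device. The helper name, the number of colors, and the two-copies-per-evaluation figure are all hypothetical and chosen only for illustration; the real values depend on the stencil and on where the data actually lives.

```python
def fd_coloring_transfer_bytes(n_unknowns, n_colors,
                               bytes_per_scalar=8, copies_per_eval=2):
    """Estimate host<->device traffic for one finite-difference Jacobian.

    Assumes each residual evaluation (one per color) copies
    `copies_per_eval` full vectors of `n_unknowns` scalars.
    """
    return n_colors * copies_per_eval * n_unknowns * bytes_per_scalar


n_unknowns = 256 * 256 * 8   # from the problem description
n_colors = 40                # hypothetical; depends on the stencil

traffic = fd_coloring_transfer_bytes(n_unknowns, n_colors)
print(traffic / 2**20)       # 320.0 (MiB per Jacobian under these assumptions)
```

Under these assumptions the traffic scales linearly with the number of colors, so keeping the vectors resident on the device across residual evaluations would be the first thing to check.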
Btw: Mani, does the memory still get filled up on the GPU for larger
time steps?
Best regards,
Karli