[petsc-dev] Performance of Petsc + ViennaCL 1.5.1 (branch:petsc-dev/next)

Karl Rupp rupp at mcs.anl.gov
Sun Feb 23 16:08:39 CST 2014


Hi Mani,

you're absolutely right, I must have had tomatoes on my eyes... :-/

Thanks and best regards,
Karli


On 02/23/2014 05:48 PM, Mani Chandra wrote:
> I forgot to mention that it is indeed the code I sent in January, but I
> also attached it in the first email in this thread.
>
> Cheers,
> Mani
>
> On Feb 23, 2014 10:42 AM, "Mani Chandra" <mc0710 at gmail.com> wrote:
>
>     Hi Karl,
>
>     I have attached the code already in my last email. It is the last
>     attachment. The memory leak has been fixed. Thanks!
>
>     Cheers,
>     Mani
>
>     On Feb 23, 2014 5:57 AM, "Karl Rupp" <rupp at mcs.anl.gov> wrote:
>
>         Hi Mani,
>
>         thanks for the quick feedback.
>
>             I tested the updated implementation of the ViennaCL bindings in
>             petsc-dev/next and I get rather poor performance when using
>             ViennaCL on either CPU or GPU. I am using the TS module
>             (type:theta) with a simple advection equation in 2D with
>             resolution 256x256 and 8 variables.
>
>
>         Good, this has about 500k unknowns, so OpenCL kernel launch
>         overhead should not be a show-stopper.
>
>             I tested with the following cases:
>
>             1) Single CPU with PETSc's old aij mat and vec implementation
>             2) ViennaCL mat and vec, using VecViennaCLGetArrayRead/Write
>                in the residual evaluation function on an Intel CPU with
>                Intel's OpenCL.
>             3) ViennaCL mat and vec, using VecViennaCLGetArrayRead/Write
>                in the residual evaluation function on an NVIDIA GPU.
>
>             The first case is the fastest and the other cases are 2-3
>             times slower.
>             Attached are the log summaries for each case and the code I
>             used to test with. I am running using the following command:
>
>             time ./petsc_opencl -ts_monitor -snes_monitor -ts_dt 0.01 -ts_max_steps 10 -ts_type theta -log_summary
>
>
>         As Matt already noted, the bottleneck here is the frequent copy
>         from/to the device. I see 90% of the time spent in
>         MatFDColorApply, so this is where we need to look. Is there
>         any chance you can send me the code to reproduce this directly?
>         Is it the same you sent me back in January?
>
>         Btw: Mani, does the memory still get filled up on the GPU for
>         larger time steps?
>
>         Best regards,
>         Karli
>
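
The problem size and the copy overhead discussed above can be put into
rough numbers. The following is a minimal Python sketch, not code from
the thread: the grid size and variable count come from Mani's
description, while the 8-byte scalar size, the number of colors (40),
and the assumption of two full-vector transfers per residual evaluation
are hypothetical values chosen only for illustration.

```python
# Back-of-the-envelope estimate of problem size and host<->device traffic.
# Grid and variable counts are from the thread; everything else is assumed.

NX, NY, NVARS = 256, 256, 8   # resolution and variables per grid point
BYTES_PER_SCALAR = 8          # assumption: double-precision scalars


def num_unknowns(nx, ny, nvars):
    """Total number of unknowns on an nx-by-ny grid with nvars fields."""
    return nx * ny * nvars


def vector_bytes(n_unknowns, bytes_per_scalar=BYTES_PER_SCALAR):
    """Size in bytes of one full solution-sized vector."""
    return n_unknowns * bytes_per_scalar


def fd_coloring_traffic(n_unknowns, n_colors, copies_per_eval=2):
    """Bytes moved for one finite-difference Jacobian.

    Assumes one residual evaluation per color and `copies_per_eval`
    full-vector transfers (for example one upload and one download)
    per evaluation. Both are modelling assumptions, not measurements.
    """
    return n_colors * copies_per_eval * vector_bytes(n_unknowns)


if __name__ == "__main__":
    n = num_unknowns(NX, NY, NVARS)
    hypothetical_colors = 40  # placeholder; the real count depends on the stencil
    print("unknowns:", n)
    print("bytes per vector:", vector_bytes(n))
    print("bytes per Jacobian:", fd_coloring_traffic(n, hypothetical_colors))
```

With these inputs the unknown count is 256 * 256 * 8 = 524288, which
matches the "about 500k unknowns" estimate. The traffic figure scales
linearly with the number of colors, so the actual value should be taken
from the coloring used in the run rather than from the placeholder.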



