[petsc-dev] cuSparse vector performance issue
Junchao Zhang
junchao.zhang at gmail.com
Sat Sep 25 19:12:34 CDT 2021
On Sat, Sep 25, 2021 at 4:45 PM Mark Adams <mfadams at lbl.gov> wrote:
> I am testing my Landau code, which is MPI serial, but with many
> independent MPI processes driving each GPU, in an MPI parallel harness code
> (Landau ex2).
>
> Vector operations with Kokkos Kernels and cuSparse cost about the same (KK
> is faster) and are a bit expensive with one process / GPU: about the same as
> my Jacobian construction, which is expensive but optimized on the GPU. (I am
> using the arkimex adaptive TS. I am guessing that it does a lot of vector
> ops, because I see a lot of them.)
>
> With 14 or 15 processes, all doing the same MPI serial problem, cuSparse
> is about 2.5x more expensive than KK. KK does degrade by about 15% from the
> one-process case. So KK is doing fine, but something bad is
> happening with cuSparse.
>
Perhaps AIJKOKKOS and AIJCUSPARSE use different algorithms? I don't know. To
find out exactly, the best approach is to ask Peng at NVIDIA to help profile
the code.
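
As a rough reading of the numbers quoted above, here is a small sketch that normalizes everything to the KK vector-op cost with one process per GPU. The variable names are illustrative only, and the sketch assumes that cuSparse and KK cost exactly the same in the one-process case (the message only says "about the same"), so the result is an estimate, not a measurement.

```python
# Relative costs derived from the figures quoted in the message above.
# Everything is normalized to the Kokkos Kernels (KK) vector-op cost
# with one process per GPU; these are ratios, not measured times.
kk_one_proc = 1.0            # normalization baseline
kk_degradation = 0.15        # KK slows by about 15% with 14-15 processes
cusparse_vs_kk_many = 2.5    # cuSparse is about 2.5x KK with 14-15 processes

# KK cost with many processes sharing a GPU.
kk_many = kk_one_proc * (1.0 + kk_degradation)

# cuSparse cost with many processes, relative to the same baseline.
cusparse_many = kk_many * cusparse_vs_kk_many

# ASSUMPTION: cuSparse and KK cost the same with one process per GPU.
cusparse_one_proc = kk_one_proc
cusparse_slowdown = cusparse_many / cusparse_one_proc

print(f"KK, many processes (relative):       {kk_many:.3f}")
print(f"cuSparse, many processes (relative): {cusparse_many:.3f}")
print(f"cuSparse slowdown vs. one process:   {cusparse_slowdown:.3f}")
```

Under that assumption, cuSparse vector ops would cost a bit under 3x their one-process cost, while KK costs about 1.15x, which is the gap a profile would need to explain.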
>
> Anyone have any thoughts on this?
>
> Thanks,
> Mark
>
>