[petsc-dev] cuSparse vector performance issue

Mark Adams mfadams at lbl.gov
Sat Sep 25 16:44:56 CDT 2021

I am testing my Landau code, which is MPI serial, inside an MPI parallel
harness code (Landau ex2) that has many independent MPI processes driving
each GPU.

With one process per GPU, vector operations with Kokkos Kernels (KK) and
cuSparse cost about the same (KK is faster) and are a bit expensive: about
the same as my Jacobian construction, which is expensive but optimized on
the GPU. (I am using the arkimex adaptive TS. I am guessing that it does a
lot of vector ops, because there are a lot.)

With 14 or 15 processes, all doing the same MPI serial problem, cuSparse is
about 2.5x more expensive than KK. KK does degrade by about 15% from the
one-process case. So KK is doing fine, but something bad is happening with
cuSparse.
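
To put rough numbers on those ratios, here is a minimal sketch of the
arithmetic. It assumes the one-process costs of KK and cuSparse are exactly
equal (they are only "about the same", with KK faster), and it normalizes
everything to the one-process KK cost. The variable names are made up for
illustration. These are not timings.

```python
# Rough arithmetic on the ratios quoted above; not a measurement.
kk_one_proc = 1.0        # normalized KK vector-op cost, one process per GPU
cusparse_one_proc = 1.0  # ASSUMPTION: "about the same" taken as equal (KK is faster)

kk_many = kk_one_proc * 1.15   # KK degrades by about 15% with 14 or 15 processes
cusparse_many = 2.5 * kk_many  # cuSparse is about 2.5x more expensive than KK

kk_slowdown = kk_many / kk_one_proc
cusparse_slowdown = cusparse_many / cusparse_one_proc

print(f"KK slowdown vs one process:       {kk_slowdown:.2f}x")
print(f"cuSparse slowdown vs one process: {cusparse_slowdown:.2f}x")
```

Under that assumption cuSparse vector ops get roughly 2.9x more expensive
going from one process to 14 or 15, versus 1.15x for KK.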

Anyone have any thoughts on this?
