[petsc-dev] Using PETSC with GPUs

Mark Adams mfadams at lbl.gov
Fri Jan 14 15:43:33 CST 2022


There are a few things:
* GPUs have higher latencies, so you basically need a large enough problem
to see a GPU speedup
* I assume you are assembling the matrix on the CPU. Copying that data to
the GPU takes time, so you really should be creating the matrix on the GPU
* I agree with Barry: roughly 1M unknowns per GPU is around where you start
seeing a win, but this depends on a lot of things.
* There are startup costs, like the CPU-GPU copy. It is best to run one
mat-vec (or whatever operation you are measuring) first, then push a new
logging stage and run the benchmark. The timing for this new stage will be
reported separately in the -log_view data; look at that (see the sketch
after this list).
 - You can approximate this by running your benchmark many times to amortize
any setup costs.
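
To make the stage suggestion concrete, here is a minimal sketch (untested,
assuming a fairly recent PETSc with the PetscCall macro; older versions would
use ierr/CHKERRQ, and the function name and repetition count are just for
illustration):

#include <petscmat.h>

/* Warm up once so the CPU->GPU copy and other setup costs happen outside the
   timed region, then time repeated MatMults in their own log stage so that
   -log_view reports them separately. */
PetscErrorCode BenchmarkMatMult(Mat A, Vec x, Vec y)
{
  PetscLogStage stage;
  PetscInt      i, nreps = 100; /* repeat to amortize any remaining setup */

  PetscFunctionBeginUser;
  PetscCall(MatMult(A, x, y)); /* warm-up; pays the copy cost */
  PetscCall(PetscLogStageRegister("MatMult benchmark", &stage));
  PetscCall(PetscLogStagePush(stage));
  for (i = 0; i < nreps; i++) PetscCall(MatMult(A, x, y));
  PetscCall(PetscLogStagePop());
  PetscFunctionReturn(0);
}

Run with something like `-mat_type aijcusparse -vec_type cuda -log_view` and
look at the "MatMult benchmark" stage in the output.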

On Fri, Jan 14, 2022 at 4:27 PM Rohan Yadav <rohany at alumni.cmu.edu> wrote:

> Hi,
>
> I'm looking to use PETSc with GPUs to do some linear algebra operations,
> like SpMV, SpMM, etc. Building PETSc with `--with-cuda=1` and running with
> `-mat_type aijcusparse -vec_type cuda` gives me a large slowdown from the
> same code running on the CPU. This is not entirely unexpected, as things
> like data transfer costs across PCIe might erroneously be included in
> my timing. Are there some examples of benchmarking GPU computations with
> PETSc, or just the proper way to write code in PETSc that will work for
> CPUs and GPUs?
>
> Rohan
>

